Duplicated from Bedovyy/Anima-GGUF

	---
	license: other
	license_name: circlestone-labs-non-commercial-license
	license_link: https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md
	base_model:
	- circlestone-labs/Anima
	base_model_relation: quantized
	---

	# GGUF models of ANIMA

	## How to use

	- Update ComfyUI to v0.14.1 or later. (No custom script is needed.)
	- Download the GGUF file you want. (Q5_K_M or higher is recommended.)
	- Use the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node to load the model.




	## Generation speed

	Tested with:
	- RTX 5090 (400 W), ComfyUI with the `--fast` option and the `Patch Sage Attention KJ` node (AUTO).
	- 832x1216, CFG 5.0, 50 steps

	| Quant  | it/s | Time (s) | Speed vs BF16 (%) |
	| ------ | ---- | -------- | ----------------- |
	| BF16   | 4.65 | 11.70    | 0.00%             |
	| Q8_0   | 4.46 | 12.07    | -4.09%            |
	| Q6_K   | 3.60 | 14.91    | -22.58%           |
	| Q5_K_S | 3.35 | 15.94    | -28.03%           |
	| Q5_K_M | 3.41 | 15.67    | -26.67%           |
	| Q5_1   | 3.42 | 15.24    | -26.45%           |
	| Q5_0   | 3.40 | 15.73    | -26.88%           |
	| Q4_K_S | 3.55 | 15.12    | -23.66%           |
	| Q4_K_M | 3.59 | 14.98    | -22.80%           |
	| Q4_1   | 4.01 | 13.46    | -13.76%           |
	| Q4_0   | 3.97 | 13.50    | -14.62%           |
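
The "Speed vs BF16 (%)" column appears to be the relative change in it/s against the BF16 row. The sketch below assumes that formula (it is not stated in the original) and recomputes a few rows; `speed_vs_baseline` is a hypothetical helper name.

```python
def speed_vs_baseline(it_per_s: float, baseline_it_per_s: float) -> float:
    """Relative throughput change versus the baseline, in percent.

    Assumed formula: (it/s of the quant / it/s of BF16 - 1) * 100.
    """
    return (it_per_s / baseline_it_per_s - 1.0) * 100.0


# it/s values copied from the table above.
BF16_IT_PER_S = 4.65
for name, it_per_s in [("Q8_0", 4.46), ("Q6_K", 3.60), ("Q4_1", 4.01)]:
    print(f"{name}: {speed_vs_baseline(it_per_s, BF16_IT_PER_S):+.2f}%")
```

Because the it/s values in the table are rounded to two decimals, a recomputed percentage can differ from the listed one in the last digits for some rows (Q5_K_S, for example).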

	## Sample

	### anima-preview3-base

	![26-04-09-Anima_00007_](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/BSEQSYZniROqpfqiivPmB.webp)

	### anima-preview2

	![26-03-12-Anima_00031_](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/XXtcZh7Au0RO8JC1A0R6A.webp)

	### anima-preview

	![Anima_GGUF_Comparison](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/mTWE0WdRsuvoJEwRMeEWr.webp)


	## How to reproduce

	1. Convert the BF16 model to FP32.

	```python
	import sys

	import safetensors.torch
	import torch


	def convert_to_fp32(input_path, output_path):
	    """Cast every tensor in a safetensors file to FP32 and save the result."""
	    state_dict = safetensors.torch.load_file(input_path)

	    new_state_dict = {}
	    for key, tensor in state_dict.items():
	        print(f"{key} ({tensor.dtype}) -> torch.float32")
	        new_state_dict[key] = tensor.to(torch.float32)

	    safetensors.torch.save_file(new_state_dict, output_path)
	    print(f"output_path: {output_path}")


	if __name__ == "__main__":
	    # Expect exactly two arguments: the source and target file paths.
	    if len(sys.argv) != 3:
	        sys.exit(f"usage: {sys.argv[0]} SOURCE TARGET")
	    input_path, output_path = sys.argv[1:3]

	    convert_to_fp32(input_path, output_path)
	```

	2. Read [this manual](https://github.com/city96/ComfyUI-GGUF/blob/main/tools/README.md).
	3. Make an F32 GGUF with [convert.py](https://github.com/city96/ComfyUI-GGUF/blob/main/tools/convert.py#L258).
	4. Run `llama-quantize` on the F32 GGUF to produce the quantized files.
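
Steps 3 and 4 can be sketched as a small script that only prints the commands to run. This is a rough outline under assumptions: the `--src`/`--dst` flags and the `llama-quantize INPUT OUTPUT TYPE` argument order are taken from the ComfyUI-GGUF tools manual and llama.cpp as I understand them, so check them against the manual before use. The file name `anima-fp32.safetensors`, the output naming scheme, and the helper `build_commands` are hypothetical.

```python
import shlex
from pathlib import Path


def build_commands(fp32_safetensors: str, quant_types: list[str]) -> list[list[str]]:
    """Build the argv lists for the F32 conversion and one quantization per type."""
    src = Path(fp32_safetensors)
    # Hypothetical naming scheme: <stem>-F32.gguf and <stem>-<QUANT>.gguf.
    f32_gguf = src.with_name(f"{src.stem}-F32.gguf")

    # Step 3: convert the FP32 safetensors file to an F32 GGUF
    # (assumed convert.py flags: --src and --dst).
    commands = [
        ["python", "ComfyUI-GGUF/tools/convert.py",
         "--src", str(src), "--dst", str(f32_gguf)]
    ]

    # Step 4: one llama-quantize call per target quantization type.
    for quant in quant_types:
        out = src.with_name(f"{src.stem}-{quant}.gguf")
        commands.append(["llama-quantize", str(f32_gguf), str(out), quant])
    return commands


# Print the commands instead of running them, so they can be reviewed first.
for cmd in build_commands("anima-fp32.safetensors", ["Q8_0", "Q5_K_M"]):
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch free of side effects; the manual linked in step 2 describes how to build the `llama-quantize` binary that these commands expect.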