---
license: apache-2.0
base_model:
- THUDM/CogView4-6B
base_model_relation: quantized
tags:
- quanto
---
## Quantization settings

- `vae.`: `torch.bfloat16`. No quantization.
- `text_encoder.layers.`:
  - Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
  - Target layers: `["q_proj", "k_proj", "v_proj", "o_proj", "mlp.down_proj", "mlp.gate_up_proj"]`
- `diffusion_model.`:
  - Int8 with [Optimum Quanto](https://github.com/huggingface/optimum-quanto)
  - Target layers: `["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]`
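For intuition, int8 weight quantization stores each weight as an 8-bit integer plus a shared floating-point scale. A minimal pure-Python sketch of the idea (illustrative only, not Quanto's actual implementation, which uses per-channel scales and optimized kernels):

```python
# Sketch of symmetric int8 quantization (the idea behind qint8 weights).
# Illustrative only; assumes the weights have a nonzero maximum.

def quantize_int8(weights):
    """Map floats to int8 values sharing one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Every stored value fits in int8, and the round-trip error is bounded
# by half a quantization step (scale / 2).
assert all(-128 <= x <= 127 for x in q)
assert all(abs(a - b) <= scale / 2 + 1e-12 for a, b in zip(weights, restored))
```

Storage per targeted layer drops from 16 bits per weight (bfloat16) to roughly 8 bits plus the scales, which is where the VRAM savings below come from.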
## VRAM consumption

- Text encoder (`text_encoder.`): about 11 GB
- Denoiser (`diffusion_model.`): about 10 GB
- VAE (`vae.`): about 1.5 GB
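A rough budget check from the figures above (a sketch: actual usage also varies with resolution, batch size, and activation memory, and the offloaded peak shown is only an approximate lower bound):

```python
# Approximate per-component VRAM figures from the list above, in GB.
components = {"text_encoder": 11.0, "diffusion_model": 10.0, "vae": 1.5}

# Keeping the whole pipeline resident needs roughly the sum of the parts.
total = sum(components.values())

# If components are offloaded so only one sits on the GPU at a time,
# the peak is dominated by the largest component (approximate: activations
# and intermediates add overhead on top of this).
peak_offloaded = max(components.values())

print(f"resident: ~{total} GB, offloaded peak: ~{peak_offloaded} GB")
```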