Update README.md
README.md
CHANGED
@@ -17,4 +17,37 @@ It is optimized to significantly reduce VRAM usage while maintaining high-qualit
## Quantization Tool
This model was quantized using the following open-source tool:
* **Quantizer**: [comfy-dit-quantizer](https://github.com/bedovyy/comfy-dit-quantizer)
There are two models: FP8 and FP8-balanced.
- FP8 (2.4GB): (***recommended***) maximizes generation speed while preserving quality as much as possible.
- FP8-balanced: (***personal preference***) keeps the prefix and suffix blocks unquantized and quantizes only the self-attention and MLP layers in the remaining blocks. As a result, its output quality is remarkably close to the original BF16 model.
## Quantized layers
### fp8
```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "blocks.1."] },
    { "policy": "float8_e4m3fn", "match": ["q_proj", "k_proj", "v_proj", "o_proj", "output_proj", ".mlp"] },
    { "policy": "nvfp4", "match": [] }
  ]
}
```
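
The trailing dots in the `match` patterns matter if patterns are compared as plain substrings of the layer name. That matching behavior is an assumption here, not something confirmed from the quantizer's source, and both the layer names and the 28-block count below are hypothetical (the count is inferred from `blocks.27.` appearing as a suffix block in the fp8-balanced config). A minimal sketch of the effect:

```python
# Assumption: a pattern matches when it occurs as a substring of the layer name.
def matches(layer_name: str, patterns: list[str]) -> bool:
    return any(p in layer_name for p in patterns)

# Hypothetical layer names using the "net.blocks.<index>." prefix from block_names.
names = [f"net.blocks.{i}.self_attn.q_proj.weight" for i in range(28)]

with_dot = [n for n in names if matches(n, ["blocks.1."])]
without_dot = [n for n in names if matches(n, ["blocks.1"])]
block_zero = [n for n in names if matches(n, ["blocks.0"])]

print(len(with_dot))     # 1: only block 1
print(len(without_dot))  # 11: block 1 plus blocks 10 to 19
print(len(block_zero))   # 1: no other block index starts with 0
```

Under this assumption, `"blocks.0"` without a trailing dot still selects only block 0, while dropping the dot from `"blocks.1."` would also keep blocks 10 through 19.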
### fp8-balanced
```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0.", "blocks.1.", "blocks.26.", "blocks.27."] },
    { "policy": "float8_e4m3fn", "match": ["self_attn.", ".mlp"] },
    { "policy": "nvfp4", "match": [] }
  ]
}
```
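
To see how the two configs differ in practice, the sketch below resolves a few layer names against each rule list. It is only an illustration under stated assumptions: rules are tried in order and the first match wins, patterns match as substrings, a rule with an empty `match` list never matches, and a layer that no rule matches is left unquantized. `resolve_policy` and the layer names (including `cross_attn` and `mlp.layer1`) are hypothetical and not taken from the quantizer.

```python
# The rule lists mirror the two JSON configs above as (policy, patterns) pairs.
FP8 = [
    ("keep", ["blocks.0", "blocks.1."]),
    ("float8_e4m3fn", ["q_proj", "k_proj", "v_proj", "o_proj", "output_proj", ".mlp"]),
    ("nvfp4", []),
]
FP8_BALANCED = [
    ("keep", ["blocks.0.", "blocks.1.", "blocks.26.", "blocks.27."]),
    ("float8_e4m3fn", ["self_attn.", ".mlp"]),
    ("nvfp4", []),
]

def resolve_policy(layer_name, rules, default="keep"):
    """Return the policy of the first rule with a pattern found in layer_name."""
    for policy, patterns in rules:
        # any() over an empty list is False, so an empty match list never fires.
        if any(p in layer_name for p in patterns):
            return policy
    return default  # assumption: unmatched layers keep their original precision

# Hypothetical layer names, for illustration only.
layers = [
    "net.blocks.0.self_attn.q_proj.weight",
    "net.blocks.5.self_attn.q_proj.weight",
    "net.blocks.5.cross_attn.q_proj.weight",
    "net.blocks.5.mlp.layer1.weight",
    "net.blocks.27.self_attn.q_proj.weight",
]

for name in layers:
    print(name, resolve_policy(name, FP8), resolve_policy(name, FP8_BALANCED))
```

Under these assumptions the configs disagree in two places: fp8 quantizes the cross-attention projections and the last blocks, while fp8-balanced leaves both untouched.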