Text-to-Image
Commit ac7a7c4 by Ronysoc (verified) · 1 parent: a2dd0de

Update README.md

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -17,4 +17,37 @@ It is optimized to significantly reduce VRAM usage while maintaining high-qualit
  ## Quantization Tool
 
  This model was quantized using the following open-source tool:
- * **Quantizer**: [comfy-dit-quantizer](https://github.com/bedovyy/comfy-dit-quantizer)
+ * **Quantizer**: [comfy-dit-quantizer](https://github.com/bedovyy/comfy-dit-quantizer)
+
+ There are two models: FP8 and FP8-balanced.
+
+ - FP8 (2.4GB): (***recommended***) maximizes generation speed while preserving quality as much as possible.
+ - FP8-balanced: (***personal preference***) keeps the prefix and suffix blocks (0, 1, 26, and 27) intact and quantizes only the Self-Attention and MLP layers. As a result, its results are remarkably close to the original BF16 model.
+
+ ## Quantized layers
+
+ ### fp8
+ ```json
+ {
+   "format": "comfy_quant",
+   "block_names": ["net.blocks."],
+   "rules": [
+     { "policy": "keep", "match": ["blocks.0", "blocks.1."] },
+     { "policy": "float8_e4m3fn", "match": ["q_proj", "k_proj", "v_proj", "o_proj", "output_proj", ".mlp"] },
+     { "policy": "nvfp4", "match": [] }
+   ]
+ }
+ ```
+
+ ### fp8-balanced
+ ```json
+ {
+   "format": "comfy_quant",
+   "block_names": ["net.blocks."],
+   "rules": [
+     { "policy": "keep", "match": ["blocks.0.", "blocks.1.", "blocks.26.", "blocks.27."] },
+     { "policy": "float8_e4m3fn", "match": ["self_attn.", ".mlp"] },
+     { "policy": "nvfp4", "match": [] }
+   ]
+ }
+ ```
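
To make the difference between the two rule sets concrete, here is a minimal Python sketch of how such rules might be resolved for a single tensor name. It is not taken from comfy-dit-quantizer. It assumes that rules are checked in order, that each `match` entry is a plain substring test, that the first matching rule wins, that a rule with an empty `match` list selects nothing, and that tensors outside `block_names` (or matching no rule) stay unquantized. The function `resolve_policy` and the tensor names in the loop are hypothetical and only for illustration.

```python
# Rule sets copied from the two configs in this commit.
FP8 = {
    "format": "comfy_quant",
    "block_names": ["net.blocks."],
    "rules": [
        {"policy": "keep", "match": ["blocks.0", "blocks.1."]},
        {"policy": "float8_e4m3fn",
         "match": ["q_proj", "k_proj", "v_proj", "o_proj", "output_proj", ".mlp"]},
        {"policy": "nvfp4", "match": []},
    ],
}

FP8_BALANCED = {
    "format": "comfy_quant",
    "block_names": ["net.blocks."],
    "rules": [
        {"policy": "keep",
         "match": ["blocks.0.", "blocks.1.", "blocks.26.", "blocks.27."]},
        {"policy": "float8_e4m3fn", "match": ["self_attn.", ".mlp"]},
        {"policy": "nvfp4", "match": []},
    ],
}


def resolve_policy(config, tensor_name):
    """Return the policy for one tensor under the ASSUMED matching semantics."""
    # Assumption: only tensors under a listed block prefix are candidates.
    if not any(tensor_name.startswith(prefix) for prefix in config["block_names"]):
        return "keep"
    # Assumption: first rule with any substring match wins.
    for rule in config["rules"]:
        if any(pattern in tensor_name for pattern in rule["match"]):
            return rule["policy"]
    # Assumption: unmatched tensors are left in their original precision.
    return "keep"


if __name__ == "__main__":
    # Hypothetical tensor names; the real checkpoint may name layers differently.
    for name in [
        "net.blocks.0.self_attn.q_proj.weight",
        "net.blocks.5.self_attn.q_proj.weight",
        "net.blocks.5.cross_attn.q_proj.weight",
        "net.blocks.27.mlp.layer1.weight",
    ]:
        print(name, resolve_policy(FP8, name), resolve_policy(FP8_BALANCED, name))
```

Under these assumptions, the `fp8` rules also convert projections outside self-attention (any name containing `q_proj`, `k_proj`, and so on) and leave only blocks 0 and 1 untouched, while the `fp8-balanced` rules additionally keep blocks 26 and 27 and limit conversion to names containing `self_attn.` or `.mlp`.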