There are two models - FP8 and NVFP4Mixed.

- FP8 (2.4GB): (***recommended***) maximizes generation speed while preserving quality as much as possible.
- NVFP4Mixed (2.0GB): (***marginal quality***) a mixture of FP8 and NVFP4.

To use `torch.compile`, use the `TorchCompileModelAdvanced` node from KJNodes, set the mode to `max-autotune-no-cudagraphs`, and make sure `dynamic` is set to `false`.
## Generation speed

Tested on

- RTX 5090 (400W), ComfyUI with the `--fast` option, torch 2.10.0+cu130
- Generates 832x1216, 30 steps, cfg 4.0, er sde, simple

| quant | none | sage+torch.compile |
|----------|------|--------------------|
| nvfp4mix | 6.37s/4.71it/s (+12%) | 4.99s/6.01it/s (+43%) |
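
As a sanity check, the total time and it/s columns are consistent with each other: the per-image time is just the 30-step count divided by the reported iteration rate.

```python
STEPS = 30  # step count from the test settings above

def seconds_per_image(it_per_s: float) -> float:
    # total sampling time = steps / iterations-per-second
    return round(STEPS / it_per_s, 2)

# nvfp4mix row from the table above
print(seconds_per_image(4.71))  # 6.37 s without sage/torch.compile
print(seconds_per_image(6.01))  # 4.99 s with sage + torch.compile
```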
## Sample
| quant | sample |
|------------|----------------------|
| **bf16** |  |
| **fp8** |  |
| **nvfp4mixed** |  |
## Quantized layers