Update README.md
README.md (changed)
@@ -52,7 +52,7 @@ FP8 quantized versions of the [LTX-2.3 22B](https://huggingface.co/Lightricks/LT
 - **Method:** Static per-tensor W8A8 quantization
 - **Scope:** Transformer blocks 1–42 (block 0 and last 5 blocks kept in BF16)
 - **Targets:** All linear projection weight matrices in `attn1`, `attn2`, `audio_attn1`, `audio_attn2`, `audio_to_video_attn`, `video_to_audio_attn`, `ff.net`, `audio_ff.net` — specifically `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` and their audio equivalents
-- **Scale:** Per-tensor `
+- **Scale:** Per-tensor `weight_scale = max(|W|) / 448` stored as F32 scalar alongside each weight. Static `input_scale = 1.0` placeholder matching the source model format
 - **Non-quantized:** Biases, norms, scale_shift_tables, gate_logits kept as BF16/F32
 - **Quantized tensors:** 1176 / 5947 total (28 patterns × 42 blocks)
 - **Output size:** ~29.94 GB (down from ~46 GB BF16)
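For reference, the scale rule in the added line can be sketched in NumPy. This is a minimal sketch under stated assumptions: 448 is the largest finite value in the FP8 E4M3 format, the actual bit-level FP8 rounding/cast is omitted, and the function names are illustrative, not from the repository.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def quantize_weight_per_tensor(w: np.ndarray):
    """Static per-tensor weight quantization sketch (FP8 cast itself omitted)."""
    # weight_scale = max(|W|) / 448, stored as an F32 scalar alongside the weight
    weight_scale = np.float32(np.abs(w).max() / FP8_E4M3_MAX)
    # After dividing by the scale, the weight fits inside the FP8 E4M3 range
    w_scaled = np.clip(w / weight_scale, -FP8_E4M3_MAX, FP8_E4M3_MAX).astype(np.float32)
    # Static input_scale = 1.0 placeholder, matching the source model format
    input_scale = np.float32(1.0)
    return w_scaled, weight_scale, input_scale


def dequantize(w_scaled: np.ndarray, weight_scale: np.float32) -> np.ndarray:
    return w_scaled * weight_scale


w = np.array([[-896.0, 224.0], [112.0, -56.0]], dtype=np.float32)
w_scaled, weight_scale, input_scale = quantize_weight_per_tensor(w)
# weight_scale == 896 / 448 == 2.0, and max(|w_scaled|) == 448
```

Because the scale maps the tensor's absolute maximum exactly onto the FP8 range limit, dequantizing with `weight_scale` recovers the original values up to FP8 rounding (exact in this sketch, which skips the cast).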