drbaph committed
Commit c777f58 · verified · 1 Parent(s): 90b6769

Update README.md

Files changed (1): README.md (+1, -1)
README.md CHANGED
@@ -52,7 +52,7 @@ FP8 quantized versions of the [LTX-2.3 22B](https://huggingface.co/Lightricks/LT
  - **Method:** Static per-tensor W8A8 quantization
  - **Scope:** Transformer blocks 1–42 (block 0 and last 5 blocks kept in BF16)
  - **Targets:** All linear projection weight matrices in `attn1`, `attn2`, `audio_attn1`, `audio_attn2`, `audio_to_video_attn`, `video_to_audio_attn`, `ff.net`, `audio_ff.net` — specifically `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2` and their audio equivalents
- - **Scale:** Per-tensor `input_scale = max(|W|) / 448` stored as F32 scalar (the reconstruction scale: `real_W = fp8_weight × input_scale`). Static `weight_scale = 1.0` matches Lightricks' own fp8 convention exactly
+ - **Scale:** Per-tensor `weight_scale = max(|W|) / 448` stored as F32 scalar alongside each weight. Static `input_scale = 1.0` placeholder matching the source model format
  - **Non-quantized:** Biases, norms, scale_shift_tables, gate_logits kept as BF16/F32
  - **Quantized tensors:** 1176 / 5947 total (28 patterns × 42 blocks)
  - **Output size:** ~29.94 GB (down from ~46 GB BF16)
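
The `weight_scale` convention in the corrected **Scale** bullet can be sketched in a few lines. This is an illustrative simulation only, not the actual conversion script: 448 is the largest finite value of FP8 E4M3, and the fake-quantization step here clamps to that range without reproducing true E4M3 rounding.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def compute_weight_scale(w: np.ndarray) -> float:
    # Per-tensor scale: map max(|W|) onto the FP8 E4M3 max of 448,
    # so every scaled weight fits the representable range.
    return float(np.abs(w).max()) / FP8_E4M3_MAX

def fake_quantize(w: np.ndarray):
    # Simulate FP8 storage: divide by the scale, clamp to the E4M3 range.
    # (Rounding to the actual e4m3 value grid is omitted in this sketch.)
    scale = compute_weight_scale(w)
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruction at load time: real_W = fp8_weight * weight_scale
    return q * scale
```

With this convention the F32 `weight_scale` scalar stored next to each FP8 weight is all a loader needs to reconstruct the BF16 values, while `input_scale = 1.0` remains an unused placeholder.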