drbaph committed · Commit 5366915 · verified · Parent(s): e976563

Update README.md

Files changed (1): README.md (+33 −1)
@@ -34,4 +34,36 @@ tags:
  - lightricks
pinned: true
demo: https://app.ltx.studio/ltx-2-playground/i2v
---

# LTX-2.3 FP8 Quantized

FP8-quantized versions of the [LTX-2.3 22B](https://huggingface.co/Lightricks/LTX-2.3) models by Lightricks.

![LTX-2 Open Source](ltx2.3-open.png)

## Quantized Checkpoints

| Name | Original | Size |
|------|----------|------|
| ltx-2.3-22b-dev-fp8.safetensors | ltx-2.3-22b-dev | ~30 GB |
| ltx-2.3-22b-distilled-fp8.safetensors | ltx-2.3-22b-distilled | ~30 GB |

## Quantization Details

- **Format:** `float8_e4m3fn` (E4M3, max = 448)
- **Method:** static per-tensor W8A8 quantization
- **Scope:** transformer blocks 1–42 (block 0 and the last 5 blocks kept in BF16)
- **Targets:** all linear projection weight matrices in `attn1`, `attn2`, `audio_attn1`, `audio_attn2`, `audio_to_video_attn`, `video_to_audio_attn`, `ff.net`, and `audio_ff.net` (specifically `to_q`, `to_k`, `to_v`, `to_out.0`, `ff.net.0.proj`, `ff.net.2`, and their audio equivalents)
- **Scale:** per-tensor `weight_scale = max(|W|) / 448`, stored as an F32 scalar alongside each weight; a static `input_scale = 1.0` placeholder matches the source model format
- **Non-quantized:** biases, norms, `scale_shift_table`s, and `gate_logits` kept in BF16/F32
- **Quantized tensors:** 1176 of 5947 total (28 patterns × 42 blocks)
- **Output size:** ~29.94 GB (down from ~46 GB in BF16)

## Original Model

This is a quantized derivative of [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). All original model details, usage instructions, and license terms apply.

> LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model.

## Citation

```bibtex
@article{hacohen2025ltx2,
  title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
  author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and Richardson, Eitan and Shiran, Guy and Chachy, Itay and Chetboun, Jonathan and Finkelson, Michael and Kupchick, Michael and Zabari, Nir and Guetta, Nitzan and Kotler, Noa and Bibi, Ofir and Gordon, Ori and Panet, Poriya and Benita, Roi and Armon, Shahar and Kulikov, Victor and Inger, Yaron and Shiftan, Yonatan and Melumian, Zeev and Farbman, Zeev},
  journal={arXiv preprint arXiv:2601.03233},
  year={2025}
}
```