---
license: mit
library_name: mlx
pipeline_tag: text-to-video
tags:
- mlx
- apple-silicon
- video-generation
- text-to-video
- image-to-video
- video-continuation
- longcat
- flow-matching
- block-sparse-attention
- quantized
- 8-bit
base_model:
- mlx-community/LongCat-Video-bf16
language:
- en
- zh
---

Part of the [LongCat-Video — MLX](https://huggingface.co/collections/mlx-community/longcat-video-mlx-6a216a3576c098e83c1cc167) collection.

# LongCat-Video-q8 (MLX)

8-bit quantized variant of [mlx-community/LongCat-Video-bf16](https://huggingface.co/mlx-community/LongCat-Video-bf16). Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive), same `cfg_step_lora` + `refinement_lora` files; the only change is that the DiT Linears are quantized to 8-bit via `mlx.nn.quantize`.

Compared with q4, the 8-bit variant gives up some disk savings in exchange for **near-bf16 quality**. If you have RAM headroom for ~31 GB of weights but not 42 GB, q8 is the right pick.

## TL;DR

| | |
|---|---|
| **DiT** | 8-bit quantized (`group_size=64`, skip `final_layer.linear` + embedders + AdaLN) |
| **DiT size** | ~15 GB (4 shards; 1.7× smaller than bf16's 26 GB) |
| **VAE / umT5 / LoRAs** | bf16 (unchanged from the bf16 variant) |
| **Total disk** | ~31 GB (vs 42 GB bf16) |
| **Min unified memory** | ~48 GB recommended for 480p |
| **Inference** | 50-step baseline OR 8-step with `cfg_step_lora` (fast) |
| **License** | MIT |

## Quantization details

Same skip pattern as q4; see the q4 card for full notes on why each pattern is excluded (L11 + L42 in the [skill-lessons](https://github.com/xocialize/longcat-video-mlx/blob/main/docs/development/skill-lessons.md)). The only difference vs q4 is `bits=8` in the `quantization` config block.

## Quick start

```bash
# 1. Pull weights (~31 GB)
hf download mlx-community/LongCat-Video-q8 --local-dir ./weights

# 2. Set up inference
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"

# 3. Run text-to-video (pass --variant q8)
.venv/bin/python scripts/run_t2v.py \
  --weights ./weights/.. \
  --variant q8 \
  --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
  --num-frames 93 \
  --out output_t2v.mp4
```

## Choosing between bf16, q4, q8

| Variant | Disk | Min RAM | Quality | Pick when |
|---|---|---|---|---|
| **bf16** | 42 GB | 64 GB | reference | Best output, you have the RAM headroom |
| **q4** | 25 GB | 32 GB | minor degradation | RAM is tight (32 GB Mac) |
| **q8** | 31 GB | 48 GB | very close to bf16 | Best balance: modest savings, near-bf16 quality |

## License

MIT, matching the upstream [LongCat-Video](https://github.com/meituan-longcat/LongCat-Video) license.
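
The DiT sizes quoted in this card (~15 GB for q8 against 26 GB for bf16) can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes group-wise affine quantization stores one 16-bit scale and one 16-bit bias per group of `group_size` weights, and it treats every DiT weight as quantized. The helper names are hypothetical and not part of the repository. Because `final_layer.linear`, the embedders, and AdaLN stay in bf16, the real checkpoint should land somewhat above this lower bound.

```python
def bits_per_weight(bits: int, group_size: int,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage per weight, in bits.

    Assumption: each group of `group_size` weights carries one scale
    and one bias, each stored in 16 bits.
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_size_gb(bf16_size_gb: float, bits: int, group_size: int) -> float:
    """Size if every bf16 weight were quantized (a lower bound, since
    the skipped layers remain in bf16)."""
    return bf16_size_gb * bits_per_weight(bits, group_size) / 16


BF16_DIT_GB = 26.0  # bf16 DiT size, taken from the TL;DR table

q8_lower_bound = quantized_size_gb(BF16_DIT_GB, bits=8, group_size=64)
q4_lower_bound = quantized_size_gb(BF16_DIT_GB, bits=4, group_size=64)

print(f"q8 lower bound: {q8_lower_bound:.1f} GB")  # 13.8 GB
print(f"q4 lower bound: {q4_lower_bound:.1f} GB")  # 7.3 GB
```

Under these assumptions q8 costs 8.5 bits per weight, giving a floor of roughly 13.8 GB, which is consistent with the ~15 GB figure once the unquantized layers are added back.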