license: mit
library_name: mlx
pipeline_tag: text-to-video
tags:
- mlx
- apple-silicon
- video-generation
- text-to-video
- image-to-video
- video-continuation
- longcat
- flow-matching
- block-sparse-attention
- quantized
- 8-bit
base_model:
- mlx-community/LongCat-Video-bf16
language:
- en
- zh
Part of the LongCat-Video — MLX collection.
LongCat-Video-q8 (MLX)
8-bit quantized variant of mlx-community/LongCat-Video-bf16.
Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive),
same cfg_step_lora + refinement_lora files — just with the DiT Linears
quantized to 8-bit via mlx.nn.quantize.
The 8-bit variant gives up some of the disk savings of 4-bit in exchange for near-bf16 quality. If you have the RAM headroom to hold ~31 GB of weights but not 42 GB, q8 is the right pick.
TL;DR
| Item | Details |
|---|---|
| DiT | 8-bit quantized (group_size=64, skip final_layer.linear + embedders + AdaLN) |
| DiT size | ~15 GB (4 shards; 1.7× smaller than bf16's 26 GB) |
| VAE / umT5 / LoRAs | bf16 (unchanged from bf16-variant) |
| Total disk | ~31 GB (vs 42 GB bf16) |
| Min unified memory | ~48 GB recommended for 480p |
| Inference | 50-step baseline OR 8-step with cfg_step_lora (fast) |
| License | MIT |
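The sizes in the table can be sanity-checked with a little arithmetic. The sketch below uses only the figures quoted above, plus one assumption that this card does not state: that each quantization group stores one 16-bit scale and one 16-bit bias alongside its packed weights.

```python
# Back-of-envelope check of the sizes quoted in the TL;DR table.
# Assumption (not from this card): every group of `group_size` weights
# carries one 16-bit scale and one 16-bit bias.

def bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Effective storage cost per weight, including per-group scale and bias."""
    return bits + overhead_bits / group_size

BF16_DIT_GB = 26     # bf16 DiT size, from the table
Q8_DIT_GB = 15       # q8 DiT size, from the table
BF16_TOTAL_GB = 42   # bf16 total disk, from the table

q8_bpw = bits_per_weight(bits=8, group_size=64)
ideal_ratio = 16 / q8_bpw                    # if every DiT weight were quantized
observed_ratio = BF16_DIT_GB / Q8_DIT_GB     # what the table reports

non_dit_gb = BF16_TOTAL_GB - BF16_DIT_GB     # VAE, umT5, LoRAs stay in bf16
q8_total_gb = non_dit_gb + Q8_DIT_GB

print(f"effective bits per weight: {q8_bpw}")
print(f"ideal ratio: {ideal_ratio:.2f}, observed ratio: {observed_ratio:.2f}")
print(f"estimated q8 total: {q8_total_gb} GB")
```

Under that assumption, the observed ratio sits a little below the ideal one, which is what you would expect when some layers are skipped and stay in bf16.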
Quantization details
Same skip pattern as q4 — see the q4 card for full notes on why each pattern is excluded (L11 + L42 in the skill-lessons).
The only difference vs q4 is bits=8 in the quantization config block.
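As a rough illustration of how such a skip pattern can be expressed, the sketch below writes a predicate in the `(path, module) -> bool` shape that `mlx.nn.quantize` accepts through its `class_predicate` argument. The substrings in `SKIP_SUBSTRINGS` are hypothetical stand-ins for the real module paths, and the fake module classes exist only so the example runs without MLX; this is not the conversion script used for this repo.

```python
# Sketch of a skip-pattern predicate for mlx.nn.quantize(class_predicate=...).
# SKIP_SUBSTRINGS holds hypothetical path fragments, not the real module names.
SKIP_SUBSTRINGS = ("final_layer.linear", "embedder", "adaln")

def should_quantize(path: str, module: object) -> bool:
    """Quantize only quantizable modules whose path matches no skip pattern."""
    if not hasattr(module, "to_quantized"):   # norms, activations, etc.
        return False
    lowered = path.lower()
    return not any(fragment in lowered for fragment in SKIP_SUBSTRINGS)

# Stand-ins for real modules so the example runs without MLX installed.
class FakeLinear:
    def to_quantized(self, group_size: int = 64, bits: int = 8):
        return self

class FakeNorm:
    pass

for name, mod in [
    ("blocks.0.attn.qkv", FakeLinear()),
    ("blocks.0.adaLN_modulation.1", FakeLinear()),
    ("final_layer.linear", FakeLinear()),
    ("blocks.0.norm1", FakeNorm()),
]:
    print(f"{name}: quantize={should_quantize(name, mod)}")

# With MLX installed, the call would look roughly like:
#   import mlx.nn as nn
#   nn.quantize(dit, group_size=64, bits=8, class_predicate=should_quantize)
```

Switching between q4 and q8 in this sketch would only mean changing `bits`, which matches the note above that the skip pattern itself is shared.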
Quick start
# 1. Pull weights (~31 GB)
hf download mlx-community/LongCat-Video-q8 --local-dir ./weights
# 2. Set up inference
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"
# 3. Run text-to-video — pass --variant q8
.venv/bin/python scripts/run_t2v.py \
  --weights ./weights/.. \
  --variant q8 \
  --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
  --num-frames 93 \
  --out output_t2v.mp4
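Since step 1 pulls roughly 31 GB, it can be worth checking free disk space before starting the download. The following is a minimal standard-library sketch; the helper names and the 2 GB safety margin are illustrative assumptions and are not part of the longcat-video-mlx repo.

```python
import shutil

REQUIRED_GB = 31  # approximate download size quoted in step 1

def free_gb(path: str = ".") -> float:
    """Free space on the filesystem containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9

def has_room(path: str = ".", required_gb: float = REQUIRED_GB,
             margin_gb: float = 2.0) -> bool:
    """True if `path` has space for the weights plus a (hypothetical) margin."""
    return free_gb(path) >= required_gb + margin_gb

print(f"free: {free_gb('.'):.1f} GB, enough for q8 weights: {has_room('.')}")
```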
Choosing between bf16, q4, q8
| Variant | Disk | Min RAM | Quality | Pick when |
|---|---|---|---|---|
| bf16 | 42 GB | 64 GB | reference | Best output, you have the RAM headroom |
| q4 | 25 GB | 32 GB | minor degradation | RAM is tight (32 GB Mac) |
| q8 | ~31 GB | 48 GB | very close to bf16 | Best balance: small savings, near-bf16 quality |
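The table can be read as a simple decision rule: take the highest-quality variant whose minimum RAM fits your machine. Here is a small sketch of that rule, using the thresholds from the table; the function name and the quality ordering are assumptions made for illustration.

```python
from typing import Optional

# (variant, minimum unified memory in GB), ordered from highest quality down.
# Thresholds are the "Min RAM" column of the table above.
VARIANTS = [("bf16", 64), ("q8", 48), ("q4", 32)]

def pick_variant(unified_memory_gb: float) -> Optional[str]:
    """Return the highest-quality variant that fits, or None if none does."""
    for name, min_ram_gb in VARIANTS:
        if unified_memory_gb >= min_ram_gb:
            return name
    return None

for ram in (128, 64, 48, 36, 16):
    print(f"{ram} GB -> {pick_variant(ram)}")
```

The result can be passed as the `--variant` value in the quick start, provided the matching weights have been downloaded.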
License
MIT — matches the upstream LongCat-Video license.