---
license: mit
library_name: mlx
pipeline_tag: text-to-video
tags:
- mlx
- apple-silicon
- video-generation
- text-to-video
- image-to-video
- video-continuation
- longcat
- flow-matching
- block-sparse-attention
- quantized
- 4-bit
base_model:
- mlx-community/LongCat-Video-bf16
language:
- en
- zh
---

Part of the [LongCat-Video - MLX](https://huggingface.co/collections/mlx-community/longcat-video-mlx-6a216a3576c098e83c1cc167) collection.

# LongCat-Video-q4 (MLX)

4-bit quantized variant of [mlx-community/LongCat-Video-bf16](https://huggingface.co/mlx-community/LongCat-Video-bf16).
Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive),
same `cfg_step_lora` + `refinement_lora` files, just with the DiT Linear layers
quantized to 4-bit via `mlx.nn.quantize` for smaller-RAM Macs.

## TL;DR

| | |
|---|---|
| **DiT** | 4-bit quantized (`group_size=64`, skipping `final_layer.linear` + embedders + AdaLN) |
| **DiT size** | ~9 GB (2 shards; 2.85× smaller than bf16's 26 GB) |
| **VAE / umT5 / LoRAs** | bf16 (unchanged from the bf16 variant) |
| **Total disk** | ~25 GB (vs 42 GB for bf16) |
| **Min unified memory** | ~32 GB recommended for 480p |
| **Inference** | 50-step baseline OR 8-step with `cfg_step_lora` (fast) |
| **License** | MIT |
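
The size figures in the table can be sanity-checked with a little arithmetic. The sketch below assumes that group-wise 4-bit quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights; that storage layout is an assumption, not something this card states, and the function and variable names are illustrative only.

```python
def bits_per_weight(bits: int, group_size: int,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage cost per weight, including per-group scale and bias.

    Assumption: one scale and one bias are stored for every `group_size` weights.
    """
    return bits + (scale_bits + bias_bits) / group_size


# 4-bit weights with group_size=64, as listed in the table.
q4_bits = bits_per_weight(bits=4, group_size=64)   # 4.5 bits per weight
ideal_ratio = 16 / q4_bits                         # vs. 16-bit bf16 weights

# Disk totals, using only the figures quoted in the table (in GB).
BF16_TOTAL_GB = 42
BF16_DIT_GB = 26
Q4_DIT_GB = 9
other_gb = BF16_TOTAL_GB - BF16_DIT_GB   # VAE + umT5 + LoRAs, unchanged
q4_total_gb = Q4_DIT_GB + other_gb       # matches the ~25 GB in the table

print(f"bits per weight: {q4_bits}")
print(f"ideal compression on quantized layers: {ideal_ratio:.2f}x")
print(f"non-DiT components: {other_gb} GB, q4 total: {q4_total_gb} GB")
```

Under these assumptions the ideal ratio for a fully quantized layer is about 3.56×, somewhat higher than the 2.85× reported for the whole DiT, which is what one would expect when some layers are kept at bf16.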

## Quantization details

- **Method:** `mlx.nn.quantize(bits=4, group_size=64)` (the MLX-LM convention)
- **What's quantized:** every `nn.Linear` in the 48-block DiT except those matching the skip patterns below
- **Skip patterns** (kept at bf16):
  - `final_layer.linear`: Meituan's documented skip
  - `t_embedder.`: TimestepEmbedder MLP (small and sensitive; it feeds `adaLN_modulation`, whose output would otherwise be corrupted; see L42 in [skill-lessons.md](https://github.com/xocialize/longcat-video-mlx/blob/main/docs/development/skill-lessons.md))
  - `y_embedder.`: CaptionEmbedder MLP (small and sensitive)
  - `adaLN_modulation.`: per-block AdaLN-Zero modulation (**must stay floating-point**; quantizing it causes a silent accumulation bug, L11)
- **What's NOT quantized:** VAE, umT5, and both LoRAs. They are small contributors to total disk, and quantizing them would degrade output more than it would save space.
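
The skip rules above can be sketched as a small path-matching predicate. This is a minimal illustration, not the repository's actual code: the substring-matching strategy, the function name `should_quantize`, and the example module paths are all assumptions.

```python
# Patterns copied from the list above; modules whose path contains any of
# them are kept at bf16.
SKIP_PATTERNS = (
    "final_layer.linear",
    "t_embedder.",
    "y_embedder.",
    "adaLN_modulation.",
)


def should_quantize(path: str, is_linear: bool) -> bool:
    """Return True if the module at `path` should be quantized to 4-bit.

    Assumption: a module is skipped when its dotted path contains one of
    SKIP_PATTERNS as a substring. The real pipeline may match differently.
    """
    if not is_linear:
        return False
    return not any(pattern in path for pattern in SKIP_PATTERNS)


# Hypothetical module paths, for illustration only.
for path in ("blocks.0.attn.qkv", "blocks.0.adaLN_modulation.1",
             "t_embedder.mlp.0", "final_layer.linear"):
    print(path, "->", "q4" if should_quantize(path, True) else "bf16")

# With MLX installed, a predicate like this could be handed to the
# quantizer via its `class_predicate` argument, for example:
#   import mlx.nn as nn
#   nn.quantize(dit, group_size=64, bits=4,
#               class_predicate=lambda p, m: should_quantize(p, isinstance(m, nn.Linear)))
```

Keeping the rule in one pure function makes it easy to check which layers would be affected before touching any weights.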

The runtime pipeline (`longcat_video` package) auto-detects the
`quantization` block in `dit/config.json` and applies `nn.quantize`
*before* `load_weights`. No user-facing API change vs. the bf16 variant.
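
The detection step described above might look roughly like the following sketch. It is not the `longcat_video` implementation: the helper names and the keys inside the `quantization` block (`bits`, `group_size`) are assumptions modelled on the MLX-LM convention.

```python
import json
from typing import Optional


def parse_quantization(config_text: str) -> Optional[dict]:
    """Return the `quantization` block of a DiT config, or None if absent."""
    return json.loads(config_text).get("quantization")


def load_order(config_text: str) -> list[str]:
    """List the loading steps in order for a given config.

    Quantization is applied before the weights are loaded so that the
    module tree already contains quantized layers whose parameter names
    and shapes match the tensors stored in the checkpoint.
    """
    steps = ["build_model"]
    if parse_quantization(config_text) is not None:
        steps.append("quantize")
    steps.append("load_weights")
    return steps


# Hypothetical config contents, for illustration only.
q4_config = '{"quantization": {"bits": 4, "group_size": 64}}'
bf16_config = '{}'

print(load_order(q4_config))
print(load_order(bf16_config))

# With MLX, the "quantize" step would correspond to something like
#   nn.quantize(model, **parse_quantization(config_text))
# followed by model.load_weights(...) on the quantized module tree.
```

Because the decision is driven entirely by the config file, the same calling code can serve both the bf16 and q4 checkpoints.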

## Quick start

```bash
# 1. Pull weights (~25 GB)
hf download mlx-community/LongCat-Video-q4 --local-dir ./weights

# 2. Set up inference (Python 3.12)
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"

# 3. Run text-to-video (pass --variant q4)
.venv/bin/python scripts/run_t2v.py \
  --weights ./weights/.. \
  --variant q4 \
  --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
  --num-frames 93 \
  --out output_t2v.mp4

# 4. Fast mode: adding --cfg-step-lora reduces sampling from 50 steps to 8
.venv/bin/python scripts/run_t2v.py \
  --weights ./weights/.. \
  --variant q4 --cfg-step-lora \
  --prompt "A cat surfing on a wave at sunset..." \
  --num-frames 93 \
  --out output_t2v_fast.mp4
```
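
When generating several clips, it can be convenient to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository; it only uses the flags shown in the quick start, and the default interpreter path assumes the virtual environment created in step 2.

```python
def build_t2v_command(weights: str, prompt: str, out: str,
                      num_frames: int = 93, fast: bool = False,
                      python: str = ".venv/bin/python") -> list[str]:
    """Build the argv list for scripts/run_t2v.py with the q4 variant.

    `fast=True` adds --cfg-step-lora, the 8-step mode from step 4 above.
    """
    cmd = [python, "scripts/run_t2v.py",
           "--weights", weights,
           "--variant", "q4"]
    if fast:
        cmd.append("--cfg-step-lora")
    cmd += ["--prompt", prompt,
            "--num-frames", str(num_frames),
            "--out", out]
    return cmd


cmd = build_t2v_command("./weights/..",
                        "A cat surfing on a wave at sunset, cinematic, 8k",
                        "output_t2v_fast.mp4", fast=True)
print(" ".join(cmd))

# To actually launch generation from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when prompts contain commas, quotes, or other shell metacharacters.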

## Choosing between bf16, q4, q8

| Variant | Disk | Min RAM | Quality | Pick when |
|---|---|---|---|---|
| **bf16** | 42 GB | 64 GB | reference | You want the best output and have the RAM headroom |
| **q4** | 25 GB | 32 GB | minor degradation | RAM is tight; you'd rather have q4 than not run at all |
| **q8** | 30 GB | 48 GB | very close to bf16 | Best of both: small disk savings, near-bf16 quality |

For batch generation / API serving, **bf16 is the right choice**, because
quality regression compounds. For exploration / personal use on a
32-64 GB Mac, **q4 is the sweet spot**.
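
The Min RAM column of the table can be turned into a simple lookup. This sketch encodes only those thresholds (64, 48, and 32 GB); the function name is illustrative, and it deliberately ignores the softer guidance above about workload type.

```python
from typing import Optional

# (variant, minimum unified memory in GB), ordered from highest quality down.
VARIANTS = [("bf16", 64), ("q8", 48), ("q4", 32)]


def pick_variant(ram_gb: int) -> Optional[str]:
    """Return the highest-quality variant whose Min RAM fits, else None."""
    for name, min_ram in VARIANTS:
        if ram_gb >= min_ram:
            return name
    return None


for ram in (16, 32, 48, 64, 128):
    print(f"{ram} GB -> {pick_variant(ram)}")
```

A result of `None` means the machine is below the 32 GB listed for q4, the smallest variant in the table.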

## License

MIT, matching the upstream
[LongCat-Video](https://github.com/meituan-longcat/LongCat-Video) license.