---
license: mit
library_name: mlx
pipeline_tag: text-to-video
tags:
  - mlx
  - apple-silicon
  - video-generation
  - text-to-video
  - image-to-video
  - video-continuation
  - longcat
  - flow-matching
  - block-sparse-attention
  - quantized
  - 8-bit
base_model:
  - mlx-community/LongCat-Video-bf16
language:
  - en
  - zh
---

Part of the [LongCat-Video — MLX](https://huggingface.co/collections/mlx-community/longcat-video-mlx-6a216a3576c098e83c1cc167) collection.


# LongCat-Video-q8 (MLX)

8-bit quantized variant of [mlx-community/LongCat-Video-bf16](https://huggingface.co/mlx-community/LongCat-Video-bf16).
Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive),
same `cfg_step_lora` + `refinement_lora` files — just with the DiT Linears
quantized to 8-bit via `mlx.nn.quantize`.

The 8-bit variant trades a small disk-savings improvement (vs 4-bit) for
**near-bf16 quality**. If you have the RAM headroom for 30 GB but not 42 GB,
q8 is the right pick.

## TL;DR

| | |
|---|---|
| **DiT** | 8-bit quantized (`group_size=64`, skip `final_layer.linear` + embedders + AdaLN) |
| **DiT size** | ~15 GB (4 shards; 1.7× smaller than bf16's 26 GB) |
| **VAE / umT5 / LoRAs** | bf16 (unchanged from bf16-variant) |
| **Total disk** | ~31 GB (vs 42 GB bf16) |
| **Min unified memory** | ~48 GB recommended for 480p |
| **Inference** | 50-step baseline OR 8-step with `cfg_step_lora` (fast) |
| **License** | MIT |

## Quantization details

Same skip pattern as q4 — see the q4 card for full notes on why each
pattern is excluded (L11 + L42 in the
[skill-lessons](https://github.com/xocialize/longcat-video-mlx/blob/main/docs/development/skill-lessons.md)).

The only difference vs q4 is `bits=8` in the `quantization` config block.

## Quick start

```bash
# 1. Pull weights (~31 GB)
hf download mlx-community/LongCat-Video-q8 --local-dir ./weights

# 2. Set up inference
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"

# 3. Run text-to-video — pass --variant q8
.venv/bin/python scripts/run_t2v.py \
    --weights ./weights/.. \
    --variant q8 \
    --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
    --num-frames 93 \
    --out output_t2v.mp4
```

## Choosing between bf16, q4, q8

| Variant | Disk | Min RAM | Quality | Pick when |
|---|---|---|---|---|
| **bf16** | 42 GB | 64 GB | reference | Best output, you have the RAM headroom |
| **q4** | 25 GB | 32 GB | minor degradation | RAM is tight (32 GB Mac) |
| **q8** | 30 GB | 48 GB | very close to bf16 | Best balance — small savings, near-bf16 quality |

## License

MIT — matches the upstream
[LongCat-Video](https://github.com/meituan-longcat/LongCat-Video) license.