---
license: mit
library_name: mlx
pipeline_tag: text-to-video
tags:
  - mlx
  - apple-silicon
  - video-generation
  - text-to-video
  - image-to-video
  - video-continuation
  - longcat
  - flow-matching
  - block-sparse-attention
  - quantized
  - 8-bit
base_model:
  - mlx-community/LongCat-Video-bf16
language:
  - en
  - zh
---

Part of the LongCat-Video — MLX collection.

LongCat-Video-q8 (MLX)

8-bit quantized variant of mlx-community/LongCat-Video-bf16. Same model, same six task variants (T2V / I2V / Continuation / Refinement / Long-Video / Interactive), and the same cfg_step_lora and refinement_lora files. The only change is that the DiT's Linear layers are quantized to 8-bit via mlx.nn.quantize.

Compared with q4, the 8-bit variant gives up some of the disk savings in exchange for near-bf16 quality. If you can spare ~31 GB of disk and 48 GB of unified memory, but not the 42 GB of disk and 64 GB of memory that bf16 needs, q8 is the right pick.

TL;DR

| | |
|---|---|
| DiT | 8-bit quantized (group_size=64, skip final_layer.linear + embedders + AdaLN) |
| DiT size | ~15 GB (4 shards; 1.7× smaller than bf16's 26 GB) |
| VAE / umT5 / LoRAs | bf16 (unchanged from bf16-variant) |
| Total disk | ~31 GB (vs 42 GB bf16) |
| Min unified memory | ~48 GB recommended for 480p |
| Inference | 50-step baseline OR 8-step with cfg_step_lora (fast) |
| License | MIT |
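
The DiT size figures can be sanity-checked with a little arithmetic. The sketch below assumes group-wise affine quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights (8.5 bits per weight in total) and that the skipped layers stay at 16 bits. The function names are illustrative, and because the card's sizes are rounded, the result is only approximate.

```python
def bits_per_weight(bits, group_size, scale_bits=16, bias_bits=16):
    """Effective storage per weight for group-wise affine quantization.

    Assumes one scale and one bias per group, each stored at 16 bits.
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_fraction(bf16_gb, quant_gb, bits=8, group_size=64):
    """Estimate the share of the bf16 checkpoint that was quantized.

    Solves bf16_gb * (f * ratio + (1 - f)) = quant_gb for f, where
    ratio is the quantized size relative to bf16.
    """
    ratio = bits_per_weight(bits, group_size) / 16.0
    return (1.0 - quant_gb / bf16_gb) / (1.0 - ratio)


print(bits_per_weight(8, 64))                # 8.5
print(round(quantized_fraction(26, 15), 2))  # 0.9
```

Under these assumptions roughly 90% of the DiT's bytes were quantized, which would explain why the observed ratio is 1.7× rather than the 16 / 8.5 ≈ 1.9× a fully quantized model would give.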

Quantization details

Same skip pattern as q4 — see the q4 card for full notes on why each pattern is excluded (L11 + L42 in the skill-lessons).

The only difference vs q4 is bits=8 in the quantization config block.
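
As a rough illustration of what a skip pattern like this might look like in code, here is a minimal sketch built around the `class_predicate` argument of `mlx.nn.quantize`. The name fragments in `SKIP_FRAGMENTS` are assumptions chosen to mirror the TL;DR entry; the real module paths in the DiT may differ, and this is not the conversion script used to produce these weights.

```python
# Hypothetical name fragments for the layers the card says stay in bf16.
# The actual module paths in the checkpoint may differ; adjust as needed.
SKIP_FRAGMENTS = ("final_layer.linear", "embedder", "adaln")

# Same group size as q4; only `bits` changes between the two variants.
QUANT_CONFIG = {"group_size": 64, "bits": 8}


def should_quantize(path, module):
    """class_predicate-style filter: True means 'quantize this module'."""
    # Only modules that can quantize themselves (e.g. nn.Linear) qualify.
    if not hasattr(module, "to_quantized"):
        return False
    lowered = path.lower()
    return not any(fragment in lowered for fragment in SKIP_FRAGMENTS)


def quantize_dit(dit):
    """Quantize `dit` in place. Requires MLX to be installed."""
    import mlx.nn as nn  # lazy import so the predicate is usable without MLX

    nn.quantize(dit, class_predicate=should_quantize, **QUANT_CONFIG)
    return dit
```

Keeping the predicate separate from the `nn.quantize` call makes it easy to print which paths would be skipped before committing to a full conversion.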

Quick start

# 1. Pull weights (~31 GB)
hf download mlx-community/LongCat-Video-q8 --local-dir ./weights

# 2. Set up inference
git clone https://github.com/xocialize/longcat-video-mlx
cd longcat-video-mlx
python3.12 -m venv .venv
.venv/bin/pip install -e ".[parity]"

# 3. Run text-to-video — pass --variant q8
.venv/bin/python scripts/run_t2v.py \
    --weights ./weights/.. \
    --variant q8 \
    --prompt "A cat surfing on a wave at sunset, cinematic, 8k" \
    --num-frames 93 \
    --out output_t2v.mp4
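
Before pulling ~31 GB of weights it can be worth confirming that the target disk has room. A minimal sketch using only the Python standard library follows; the function names and the 2 GB safety margin are illustrative assumptions, not part of the longcat-video-mlx repo, and sizes are treated as decimal gigabytes.

```python
import shutil

REQUIRED_GB = 31  # approximate total download size stated on this card


def free_gb(path="."):
    """Free space on the filesystem containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path=".", required_gb=REQUIRED_GB, margin_gb=2):
    """True if `path` has space for the weights plus a small margin."""
    return free_gb(path) >= required_gb + margin_gb


if __name__ == "__main__":
    status = "ok" if has_room(".") else "not enough space"
    print(f"{free_gb('.'):.1f} GB free: {status}")
```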

Choosing between bf16, q4, q8

| Variant | Disk | Min RAM | Quality | Pick when |
|---|---|---|---|---|
| bf16 | 42 GB | 64 GB | reference | Best output, you have the RAM headroom |
| q4 | 25 GB | 32 GB | minor degradation | RAM is tight (32 GB Mac) |
| q8 | ~31 GB | 48 GB | very close to bf16 | Best balance: small savings, near-bf16 quality |
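
The choice can also be expressed as a small lookup. This sketch simply encodes the Min RAM column, preferring the highest-quality variant that fits; `pick_variant` is a hypothetical helper, not something shipped with the repo.

```python
# (variant, minimum unified memory in GB), ordered from highest quality down.
VARIANTS = [("bf16", 64), ("q8", 48), ("q4", 32)]


def pick_variant(unified_memory_gb):
    """Return the highest-quality variant whose minimum RAM is satisfied."""
    for name, min_ram_gb in VARIANTS:
        if unified_memory_gb >= min_ram_gb:
            return name
    return None  # below the smallest documented minimum


print(pick_variant(48))  # q8
```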

License

MIT — matches the upstream LongCat-Video license.