---
license: apache-2.0
base_model: Wan-AI/Wan2.2-T2V-A14B-Diffusers
pipeline_tag: text-to-video
library_name: mlx-gen
tags:
- mlx
- mlx-gen
- mflux
- apple-silicon
- 8-bit
- mixed-q8-bf16
- wan
- wan2.2
- video-generation
- text-to-video
- wan-a14b
---
# wan2.2-t2v-a14b-diffusers-8bit

This repository contains mixed q8/BF16 MLX-Gen saved weights for
[`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
It is designed for local Apple Silicon inference with
[`mlx-gen`](https://github.com/lpalbou/mlx-gen).

It uses the mflux/MLX saved-weight layout with MLX quantization tensors. It is not a Diffusers or Transformers
`from_pretrained()` checkpoint.

## Source Model

Original model: [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).

This quantized derivative follows the Apache 2.0 license of the source model.

## Quantization

This is a mixed q8/BF16 checkpoint:

- q8 for quantizable Wan transformer block attention and feed-forward linears.
- BF16 for the Wan VAE.
- BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, norms, convolutions, and other non-quantizable parameters.
- Scheduler metadata and tokenizer files are not quantized.

This mixed policy is used because fully quantizing sensitive Wan A14B paths produced invalid or low-quality video in local validation.
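
As a rough illustration, the policy above can be read as a predicate over parameter paths. This is a minimal sketch, not mlx-gen's code: the component names, the path fragments, and the `should_quantize` helper are all hypothetical and chosen only to mirror the bullets above.

```python
# Hypothetical sketch of the mixed q8/BF16 selection rule described above.
# Component names and path fragments are illustrative assumptions, not the
# actual mlx-gen parameter layout.

Q8_FRAGMENTS = (".attn", ".ffn")  # attention and feed-forward sub-modules


def should_quantize(component: str, path: str, is_linear: bool) -> bool:
    """Return True if a parameter would be stored as q8 under this sketch."""
    if component != "transformer":
        return False  # VAE and text encoder stay BF16
    if not is_linear:
        return False  # norms, convolutions, and similar stay BF16
    if not path.startswith("blocks."):
        return False  # conditioning and output projections stay BF16
    return any(fragment in path for fragment in Q8_FRAGMENTS)


if __name__ == "__main__":
    examples = [
        ("transformer", "blocks.0.attn1.to_q", True),
        ("transformer", "proj_out", True),
        ("vae", "decoder.conv_in", False),
    ]
    for component, path, is_linear in examples:
        dtype = "q8" if should_quantize(component, path, is_linear) else "bf16"
        print(f"{component}:{path} -> {dtype}")
```

Keeping the rule as an allow-list means anything not explicitly matched falls back to BF16, which is the conservative direction given the quality issues noted above.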

## Validation

Measured on 2026-06-04 with `mlx-gen 0.18.9` on Apple Silicon. The upstream Diffusers source snapshot measured about 118 GiB in the local Hugging Face cache before preparing these packages. The table below reports prepared-package generation from model init through MP4 save and post-save video-health validation.

Validation profile: `384x224`, 33 frames, 12 denoising steps, guidance `4`, guidance-2 `3`, 8 fps, seed `4242`, `--low-ram`.

| Package | Disk | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Video Health |
|---|---:|---:|---:|---:|---:|---|
| BF16 package | 64.3 GiB | 33.0 GiB | 31.8 GiB | 27.7 GiB | 152.7 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.3 |
| This mixed q8/BF16 package | 39.7 GiB | 20.7 GiB | 19.5 GiB | 15.5 GiB | 154.8 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.4 |

Compared with the BF16 prepared package at the same validation profile, this mixed q8/BF16 package reduces disk usage by about 38% and full-process physical peak memory by about 37%. Total time was about 1% slower in this run.
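
The percentages above follow directly from the table. A small sketch to reproduce them, with the figures copied from the two table rows (the dictionary keys and the `percent_change` helper are names chosen here for illustration):

```python
# Figures copied from the validation table above.
bf16 = {"disk_gib": 64.3, "phys_peak_gib": 33.0, "total_s": 152.7}
mixed = {"disk_gib": 39.7, "phys_peak_gib": 20.7, "total_s": 154.8}


def percent_change(new: float, old: float) -> float:
    """Signed change of `new` relative to `old`, in percent."""
    return (new - old) / old * 100.0


changes = {key: percent_change(mixed[key], bf16[key]) for key in bf16}

if __name__ == "__main__":
    for key, value in changes.items():
        print(f"{key}: {value:+.1f}%")
```

Negative values are reductions relative to the BF16 package; the positive value for `total_s` is the small slowdown.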

Physical peak is Darwin `ri_phys_footprint` sampled for the full process. The validation is intentionally small and repeatable; it is not a claim that every full-size `1280x720`, 81-frame, 40-step job has the same memory or timing profile.
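
To give a sense of how much smaller the validation profile is than the full-size job mentioned above, the sketch below compares a crude work proxy: pixels times frames times denoising steps. This proxy is an assumption made for illustration only. It ignores attention scaling, VAE cost, and memory behavior, so it should not be read as a time or memory prediction.

```python
# Crude workload proxy: width * height * frames * steps.
# This is an illustrative assumption, not a model of actual runtime or memory.

def work_proxy(width: int, height: int, frames: int, steps: int) -> int:
    return width * height * frames * steps


validation = work_proxy(width=384, height=224, frames=33, steps=12)
full_size = work_proxy(width=1280, height=720, frames=81, steps=40)
ratio = full_size / validation

if __name__ == "__main__":
    print(f"validation proxy: {validation:,}")
    print(f"full-size proxy:  {full_size:,}")
    print(f"ratio:            {ratio:.1f}x")
```

Even under this simplistic proxy the full-size job is well over an order of magnitude larger, which is why the table should be treated as a comparison between packages rather than a sizing guide.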

## Usage

```bash
python -m pip install -U mlx-gen

mlxgen download --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit

mlxgen generate \
  --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
  --task text-to-video \
  --prompt "A cinematic scene of a scientist working on agentic AI through the night, monitors glowing, papers shifting in a slow dolly shot." \
  --width 384 \
  --height 224 \
  --frames 33 \
  --steps 12 \
  --guidance 4 \
  --guidance-2 3 \
  --fps 8 \
  --seed 4242 \
  --low-ram \
  --metadata \
  --output video.mp4
```
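
For scripted runs, the same command can be assembled from Python and passed to `subprocess`. This is a minimal sketch that only wraps the CLI invocation shown above; the `build_generate_command` and `run_generate` helpers are hypothetical names, and no mlx-gen Python API is assumed.

```python
import shlex
import subprocess

MODEL = "AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit"


def build_generate_command(prompt: str, output: str = "video.mp4") -> list[str]:
    """Build the argv list for the validation-profile command shown above."""
    return [
        "mlxgen", "generate",
        "--model", MODEL,
        "--task", "text-to-video",
        "--prompt", prompt,
        "--width", "384",
        "--height", "224",
        "--frames", "33",
        "--steps", "12",
        "--guidance", "4",
        "--guidance-2", "3",
        "--fps", "8",
        "--seed", "4242",
        "--low-ram",
        "--metadata",
        "--output", output,
    ]


def run_generate(prompt: str, output: str = "video.mp4") -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_generate_command(prompt, output), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to execute.
    print(shlex.join(build_generate_command("A slow dolly shot of a lab at night.")))
```

Passing the arguments as a list avoids shell quoting problems with long prompts.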

## Compatibility

Requires `mlx-gen >= 0.18.9`.

Generated with `mlx-gen 0.18.9`.

Use the `mlxgen` command and Python import path for new MLX-Gen projects.

## Attribution

MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original mflux contributors.

Quantized and contributed by [@lpalbou](https://huggingface.co/lpalbou).