@@ -9,6 +9,7 @@ tags:
 - mflux
 - apple-silicon
 - 8-bit
 - wan
 - wan2.2
 - video-generation
@@ -17,56 +18,47 @@ tags:
 ---
 # wan2.2-t2v-a14b-diffusers-8bit
-This repository contains MLX-Gen saved weights for `Wan-AI/Wan2.2-T2V-A14B-Diffusers`. The checkpoint is designed for local Apple Silicon inference with [`mlx-gen`](https://github.com/lpalbou/mlx-gen).
-It uses the mflux/MLX saved-weight layout. Quantized checkpoints include MLX quantization tensors. It is not a Diffusers or Transformers `from_pretrained()` checkpoint.
 ## Source Model
 Original model: [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
-## License and Access
 This quantized derivative follows the Apache 2.0 license of the source model.
 ## Quantization
-This is an MLX q8 checkpoint for Wan2.2 A14B. MLX-Gen uses 8-bit quantization for Wan modules where MLX supports quantization:
-- q8 for quantizable Wan transformer attention and feed-forward modules.
 - BF16 for the Wan VAE.
 - BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, scheduler metadata, tokenizer files, norms, convolutions, and other non-quantizable parameters.
-Wan q4 quality and any possible mixed q4/q8 policy are still under validation. Prefer q8 for publishable Wan checkpoints until the q4 policy is documented.
-See the [MLX-Gen quantization docs](https://github.com/lpalbou/mlx-gen/blob/main/docs/quantization.md) for compatibility notes.
-## Local Validation
-These measurements are validation-sized release checks for this uploaded package. They verify package loading, video integrity, and prompt influence for this profile only; they do not claim full-size `1280x720`, 81-frame, 40-step readiness.
-| Measurement | Value |
-|---|---:|
-| Package disk usage | 39.5 GiB |
-| Validation profile | 384x224, 33 frames, 12 steps, 8.0 fps, seed 4242, `--low-ram` |
-| Prompt pair | scientist scene / red car scene |
-| Video health | 33 / 33 frames decoded, 8.0 fps, nonblank |
-| Mean temporal delta | 5.6 / 3.2 luma |
-| Prompt delta | 102.0 mean abs RGB |
-| Generation time | 162.2 s / 319.6 s |
-## Compatibility
-Requires `mlx-gen >= 0.18.9`.
-Generated with `mlx-gen 0.18.9`.
-Use the `mlxgen` command and Python import path for new MLX-Gen projects.
 ## Usage
-The q8 A14B example below is intentionally validation-sized. Do not use this card to claim full-size `1280x720`, 81-frame, 40-step readiness until that exact path has passed video integrity and quality validation.
 ```bash
 python -m pip install -U mlx-gen
@@ -75,7 +67,7 @@ mlxgen download --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit
 mlxgen generate \
   --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
   --task text-to-video \
-  --prompt "Your video prompt here" \
   --width 384 \
   --height 224 \
   --frames 33 \
@@ -84,12 +76,21 @@ mlxgen generate \
   --guidance-2 3 \
   --fps 8 \
   --seed 4242 \
   --metadata \
   --output video.mp4
 ```
 ## Attribution
-MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original mflux contributors. This model card is generated by MLX-Gen so derived checkpoints keep that attribution visible.
 Quantized and contributed by [@lpalbou](https://huggingface.co/lpalbou).

 - mflux
 - apple-silicon
 - 8-bit
+- mixed-q8-bf16
 - wan
 - wan2.2
 - video-generation
 ---
 # wan2.2-t2v-a14b-diffusers-8bit
+This repository contains mixed q8/BF16 MLX-Gen saved weights for
+[`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
+It is designed for local Apple Silicon inference with
+[`mlx-gen`](https://github.com/lpalbou/mlx-gen).
+It uses the mflux/MLX saved-weight layout with MLX quantization tensors. It is not a Diffusers or Transformers
+`from_pretrained()` checkpoint.
 ## Source Model
 Original model: [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
 This quantized derivative follows the Apache 2.0 license of the source model.
 ## Quantization
+This is a mixed q8/BF16 checkpoint:
+- q8 for quantizable Wan transformer block attention and feed-forward linears.
 - BF16 for the Wan VAE.
 - BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, scheduler metadata, tokenizer files, norms, convolutions, and other non-quantizable parameters.
+This mixed policy is used because fully quantizing sensitive Wan A14B paths produced invalid or low-quality video in local validation.
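The mixed policy listed above can be pictured as a per-parameter decision. The sketch below is illustrative only and is not MLX-Gen's actual code: the parameter paths, the component names (`vae`, `text_encoder`), and the `attn`/`ffn` markers are hypothetical stand-ins chosen to mirror the bullet list.

```python
# Hypothetical names throughout: real MLX-Gen module paths may differ.
BF16_COMPONENTS = ("vae", "text_encoder")   # components kept entirely in BF16
Q8_BLOCK_SUBMODULES = ("attn", "ffn")       # attention / feed-forward inside transformer blocks


def precision_for(path: str, is_linear: bool) -> str:
    """Return "q8" or "bf16" for a parameter path under the mixed policy."""
    parts = path.split(".")
    if not is_linear:
        # Norms, convolutions, and other non-quantizable parameters.
        return "bf16"
    if parts[0] in BF16_COMPONENTS:
        return "bf16"
    # Only linears inside transformer blocks' attention / feed-forward get q8.
    if "blocks" in parts and any(p.startswith(Q8_BLOCK_SUBMODULES) for p in parts):
        return "q8"
    # Conditioning and output projection linears stay in BF16.
    return "bf16"


for example in ("transformer.blocks.0.attn1.to_q", "transformer.proj_out"):
    print(example, "->", precision_for(example, is_linear=True))
```

A predicate of this shape is the kind of rule a quantization pass would consult per module; how MLX-Gen actually selects modules is not shown in this card.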
+## Validation
+Measured on 2026-06-04 with `mlx-gen 0.18.9` on Apple Silicon. The upstream Diffusers source snapshot measured about 118 GiB in the local Hugging Face cache before preparing these packages. The table below reports prepared-package generation from model init through MP4 save and post-save video-health validation.
+Validation profile: `384x224`, 33 frames, 12 denoising steps, guidance `4`, guidance-2 `3`, 8 fps, seed `4242`, `--low-ram`.
+| Package | Disk | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Video Health |
+|---|---:|---:|---:|---:|---:|---|
+| BF16 package | 64.3 GiB | 33.0 GiB | 31.8 GiB | 27.7 GiB | 152.7 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.3 |
+| This mixed q8/BF16 package | 39.7 GiB | 20.7 GiB | 19.5 GiB | 15.5 GiB | 154.8 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.4 |
+Compared with the BF16 prepared package at the same validation profile, this mixed q8/BF16 package reduces disk usage by about 38% and full-process physical peak memory by about 37%. Total time was about 1% slower in this run.
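The percentages quoted above follow directly from the table. As a quick check, the short script below recomputes the relative change for each column; the dictionary keys are labels invented here for readability, and the numbers are copied from the two table rows.

```python
# Figures copied from the validation table (GiB for sizes, seconds for time).
bf16 = {"disk": 64.3, "physical_peak": 33.0, "max_rss": 31.8, "mlx_peak": 27.7, "total_time": 152.7}
mixed = {"disk": 39.7, "physical_peak": 20.7, "max_rss": 19.5, "mlx_peak": 15.5, "total_time": 154.8}


def percent_change(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


for key in bf16:
    # Negative values are reductions relative to the BF16 package.
    print(f"{key}: {percent_change(bf16[key], mixed[key]):+.1f}%")
```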
+Physical peak is the Darwin `ri_phys_footprint` value sampled for the full process. The validation profile is intentionally small and repeatable; it does not claim that a full-size `1280x720`, 81-frame, 40-step job has the same memory or timing profile.
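To give a rough sense of how far the validation profile sits from a full-size job, the sketch below compares raw pixel-frame counts. This is only a crude proxy chosen for illustration: it ignores the step count (12 vs. 40), VAE compression, and attention scaling, so it says nothing precise about memory or time.

```python
def pixel_frames(width: int, height: int, frames: int) -> int:
    """Total number of output pixels across all frames of a clip."""
    return width * height * frames


validation = pixel_frames(384, 224, 33)    # validation profile from this card
full_size = pixel_frames(1280, 720, 81)    # full-size profile named in this card
ratio = full_size / validation

clip_seconds = 33 / 8  # 33 frames played back at 8 fps

print(f"validation pixel-frames: {validation:,}")
print(f"full-size pixel-frames:  {full_size:,}")
print(f"full-size / validation:  {ratio:.1f}x")
print(f"validation clip length:  {clip_seconds} s")
```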
 ## Usage
 ```bash
 python -m pip install -U mlx-gen
 mlxgen generate \
   --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
   --task text-to-video \
+  --prompt "A cinematic scene of a scientist working on agentic AI through the night, monitors glowing, papers shifting in a slow dolly shot." \
   --width 384 \
   --height 224 \
   --frames 33 \
   --guidance-2 3 \
   --fps 8 \
   --seed 4242 \
+  --low-ram \
   --metadata \
   --output video.mp4
 ```
+## Compatibility
+Requires `mlx-gen >= 0.18.9`.
+Generated with `mlx-gen 0.18.9`.
+Use the `mlxgen` command and Python import path for new MLX-Gen projects.
 ## Attribution
+MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original mflux contributors.
 Quantized and contributed by [@lpalbou](https://huggingface.co/lpalbou).