lpalbou committed on
Commit 39ee5f1 · verified · 1 Parent(s): 161bb7a

Add files using upload-large-folder tool

Files changed (1)
  1. README.md +30 -29
README.md CHANGED
@@ -9,6 +9,7 @@ tags:
9
  - mflux
10
  - apple-silicon
11
  - 8-bit
 
12
  - wan
13
  - wan2.2
14
  - video-generation
@@ -17,56 +18,47 @@ tags:
17
  ---
18
  # wan2.2-t2v-a14b-diffusers-8bit
19
 
20
- This repository contains MLX-Gen saved weights for `Wan-AI/Wan2.2-T2V-A14B-Diffusers`. The checkpoint is designed for local Apple Silicon inference with [`mlx-gen`](https://github.com/lpalbou/mlx-gen).
 
 
 
21
 
22
- It uses the mflux/MLX saved-weight layout. Quantized checkpoints include MLX quantization tensors. It is not a Diffusers or Transformers `from_pretrained()` checkpoint.
 
23
 
24
  ## Source Model
25
 
26
  Original model: [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
27
 
28
- ## License and Access
29
-
30
  This quantized derivative follows the Apache 2.0 license of the source model.
31
 
32
  ## Quantization
33
 
34
- This is an MLX q8 checkpoint for Wan2.2 A14B. MLX-Gen uses 8-bit quantization for Wan modules where MLX supports quantization:
35
 
36
- - q8 for quantizable Wan transformer attention and feed-forward modules.
37
  - BF16 for the Wan VAE.
38
  - BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, scheduler metadata, tokenizer files, norms, convolutions, and other non-quantizable parameters.
39
 
40
- Wan q4 quality and any possible mixed q4/q8 policy are still under validation. Prefer q8 for publishable Wan checkpoints until the q4 policy is documented.
41
-
42
- See the [MLX-Gen quantization docs](https://github.com/lpalbou/mlx-gen/blob/main/docs/quantization.md) for compatibility notes.
43
 
44
- ## Local Validation
45
 
46
- These measurements are validation-sized release checks for this uploaded package. They verify package loading, video integrity, and prompt influence for this profile only; they do not claim full-size `1280x720`, 81-frame, 40-step readiness.
47
 
48
- | Measurement | Value |
49
- |---|---:|
50
- | Package disk usage | 39.5 GiB |
51
- | Validation profile | 384x224, 33 frames, 12 steps, 8.0 fps, seed 4242, `--low-ram` |
52
- | Prompt pair | scientist scene / red car scene |
53
- | Video health | 33 / 33 frames decoded, 8.0 fps, nonblank |
54
- | Mean temporal delta | 5.6 / 3.2 luma |
55
- | Prompt delta | 102.0 mean abs RGB |
56
- | Generation time | 162.2 s / 319.6 s |
57
-
58
- ## Compatibility
59
 
60
- Requires `mlx-gen >= 0.18.9`.
 
 
 
61
 
62
- Generated with `mlx-gen 0.18.9`.
63
 
64
- Use the `mlxgen` command and Python import path for new MLX-Gen projects.
65
 
66
  ## Usage
67
 
68
- The q8 A14B example below is intentionally validation-sized. Do not use this card to claim full-size `1280x720`, 81-frame, 40-step readiness until that exact path has passed video integrity and quality validation.
69
-
70
  ```bash
71
  python -m pip install -U mlx-gen
72
 
@@ -75,7 +67,7 @@ mlxgen download --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit
75
  mlxgen generate \
76
  --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
77
  --task text-to-video \
78
- --prompt "Your video prompt here" \
79
  --width 384 \
80
  --height 224 \
81
  --frames 33 \
@@ -84,12 +76,21 @@ mlxgen generate \
84
  --guidance-2 3 \
85
  --fps 8 \
86
  --seed 4242 \
 
87
  --metadata \
88
  --output video.mp4
89
  ```
90
 
 
 
 
 
 
 
 
 
91
  ## Attribution
92
 
93
- MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original mflux contributors. This model card is generated by MLX-Gen so derived checkpoints keep that attribution visible.
94
 
95
  Quantized and contributed by [@lpalbou](https://huggingface.co/lpalbou).
 
9
  - mflux
10
  - apple-silicon
11
  - 8-bit
12
+ - mixed-q8-bf16
13
  - wan
14
  - wan2.2
15
  - video-generation
 
18
  ---
19
  # wan2.2-t2v-a14b-diffusers-8bit
20
 
21
+ This repository contains mixed q8/BF16 MLX-Gen saved weights for
22
+ [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
23
+ It is designed for local Apple Silicon inference with
24
+ [`mlx-gen`](https://github.com/lpalbou/mlx-gen).
25
 
26
+ It uses the mflux/MLX saved-weight layout with MLX quantization tensors. It is not a Diffusers or Transformers
27
+ `from_pretrained()` checkpoint.
28
 
29
  ## Source Model
30
 
31
  Original model: [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).
32
 
 
 
33
  This quantized derivative follows the Apache 2.0 license of the source model.
34
 
35
  ## Quantization
36
 
37
+ This is a mixed q8/BF16 checkpoint:
38
 
39
+ - q8 for quantizable Wan transformer block attention and feed-forward linears.
40
  - BF16 for the Wan VAE.
41
  - BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, scheduler metadata, tokenizer files, norms, convolutions, and other non-quantizable parameters.
42
 
43
+ This mixed policy is used because fully quantizing sensitive Wan A14B paths produced invalid or low-quality video in local validation.
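
The mixed policy described above amounts to a per-module decision: quantize only linears inside transformer blocks that belong to attention or feed-forward paths, and leave everything else in BF16. The following is a minimal sketch of such a rule. The module paths and patterns (`blocks.N.attn*`, `blocks.N.ffn`) and the helper `target_precision` are hypothetical illustrations, not the names MLX-Gen actually uses.

```python
import re

# Hypothetical module-path patterns, relative to the Wan transformer.
# The real MLX-Gen parameter names may differ.
Q8_PATTERNS = (
    r"^blocks\.\d+\.attn\d*\.",  # transformer block attention linears
    r"^blocks\.\d+\.ffn\.",      # transformer block feed-forward linears
)


def target_precision(module_path: str, is_linear: bool) -> str:
    """Return "q8" for quantizable block linears and "bf16" for everything else."""
    if is_linear and any(re.match(pattern, module_path) for pattern in Q8_PATTERNS):
        return "q8"
    return "bf16"


if __name__ == "__main__":
    examples = [
        ("blocks.0.attn1.to_q", True),           # attention linear
        ("blocks.12.ffn.net.0.proj", True),      # feed-forward linear
        ("condition_embedder.time_proj", True),  # conditioning linear stays BF16
        ("proj_out", True),                      # output projection stays BF16
        ("blocks.0.norm1", False),               # norms are not quantized
    ]
    for path, is_linear in examples:
        print(f"{path}: {target_precision(path, is_linear)}")
```

A path-based rule like this keeps the sensitive conditioning and output projections out of the q8 set by default, because anything that does not match an explicit pattern falls back to BF16.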
 
 
44
 
45
+ ## Validation
46
 
47
+ Measured on 2026-06-04 with `mlx-gen 0.18.9` on Apple Silicon. The upstream Diffusers source snapshot measured about 118 GiB in the local Hugging Face cache before these packages were prepared. For each prepared package, the table below reports one full generation run, from model init through MP4 save and post-save video-health validation.
48
 
49
+ Validation profile: `384x224`, 33 frames, 12 denoising steps, guidance `4`, guidance-2 `3`, 8 fps, seed `4242`, `--low-ram`.
 
 
 
 
 
 
 
 
 
 
50
 
51
+ | Package | Disk | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Video Health |
52
+ |---|---:|---:|---:|---:|---:|---|
53
+ | BF16 package | 64.3 GiB | 33.0 GiB | 31.8 GiB | 27.7 GiB | 152.7 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.3 |
54
+ | This mixed q8/BF16 package | 39.7 GiB | 20.7 GiB | 19.5 GiB | 15.5 GiB | 154.8 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.4 |
55
 
56
+ Compared with the BF16 prepared package at the same validation profile, this mixed q8/BF16 package reduces disk usage by about 38% and full-process physical peak memory by about 37%. Total time was about 1% slower in this run.
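
The percentages quoted above can be re-derived from the table values. This short sketch only repeats that arithmetic; the dictionary keys and the helper `percent_change` are names chosen here for illustration.

```python
def percent_change(baseline: float, value: float) -> float:
    """Signed change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Values copied from the validation table.
bf16 = {"disk_gib": 64.3, "phys_peak_gib": 33.0, "total_s": 152.7}
mixed = {"disk_gib": 39.7, "phys_peak_gib": 20.7, "total_s": 154.8}

changes = {key: percent_change(bf16[key], mixed[key]) for key in bf16}

for key, pct in changes.items():
    print(f"{key}: {pct:+.1f}%")
# disk_gib: -38.3%
# phys_peak_gib: -37.3%
# total_s: +1.4%
```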
57
 
58
+ Physical peak is Darwin `ri_phys_footprint` sampled for the full process. The validation is intentionally small and repeatable; it is not a claim that every full-size `1280x720`, 81-frame, 40-step job has the same memory or timing profile.
59
 
60
  ## Usage
61
 
 
 
62
  ```bash
63
  python -m pip install -U mlx-gen
64
 
 
67
  mlxgen generate \
68
  --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
69
  --task text-to-video \
70
+ --prompt "A cinematic scene of a scientist working on agentic AI through the night, monitors glowing, papers shifting in a slow dolly shot." \
71
  --width 384 \
72
  --height 224 \
73
  --frames 33 \
 
76
  --guidance-2 3 \
77
  --fps 8 \
78
  --seed 4242 \
79
+ --low-ram \
80
  --metadata \
81
  --output video.mp4
82
  ```
83
 
84
+ ## Compatibility
85
+
86
+ Requires `mlx-gen >= 0.18.9`.
87
+
88
+ Generated with `mlx-gen 0.18.9`.
89
+
90
+ Use the `mlxgen` command and Python import path for new MLX-Gen projects.
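
One way to check the version requirement above before loading the checkpoint is to read the installed distribution's metadata. This is a sketch that assumes the distribution is named `mlx-gen`, as in the `pip install` command, and that versions are plain dotted releases; the helper names are illustrative.

```python
from importlib import metadata
from typing import Optional, Tuple

MIN_VERSION = "0.18.9"


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as "0.18.9" (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare versions numerically, so that 0.18.10 ranks above 0.18.9."""
    return parse_version(installed) >= parse_version(minimum)


def installed_mlx_gen_version() -> Optional[str]:
    """Return the installed mlx-gen version, or None if it is not installed."""
    try:
        return metadata.version("mlx-gen")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_mlx_gen_version()
    if version is None:
        print("mlx-gen is not installed")
    elif meets_minimum(version):
        print(f"mlx-gen {version} satisfies >= {MIN_VERSION}")
    else:
        print(f"mlx-gen {version} is older than {MIN_VERSION}")
```

Parsing into integer tuples avoids the string-comparison pitfall where `"0.18.10"` would sort before `"0.18.9"`.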
91
+
92
  ## Attribution
93
 
94
+ MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original mflux contributors.
95
 
96
  Quantized and contributed by [@lpalbou](https://huggingface.co/lpalbou).