---
license: apache-2.0
base_model: Wan-AI/Wan2.2-TI2V-5B-Diffusers
pipeline_tag: text-to-video
library_name: mlx-gen
tags:
- mlx
- mlx-gen
- mflux
- apple-silicon
- bf16
- wan
- wan2.2
- video-generation
- text-to-video
- image-to-video
---
# wan2.2-ti2v-5b-diffusers-bf16

This repository contains BF16 MLX-Gen saved weights for
[`Wan-AI/Wan2.2-TI2V-5B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers).
It is designed for local Apple Silicon inference with
[`mlx-gen`](https://github.com/lpalbou/mlx-gen).

It uses the mflux/MLX saved-weight layout. It is not a Diffusers or Transformers
`from_pretrained()` checkpoint.

## Source Model

Original model: [`Wan-AI/Wan2.2-TI2V-5B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers).

This prepared derivative follows the Apache 2.0 license of the source model.

## Precision

The upstream TI2V-5B source snapshot is not uniformly 16-bit on disk: the transformer and VAE
safetensors are FP32, while the UMT5 text encoder is BF16. MLX-Gen loads the Wan transformer and VAE
weights at BF16 runtime precision, so this prepared BF16 package reduces storage and download size
but does not lower runtime memory compared with generating from the source snapshot.

Use this package when you want a smaller reusable MLX-Gen folder that preserves source behavior.
Use the mixed q8/BF16 package when you want a smaller loaded-model footprint.

## Measurements

Measured on 2026-06-04 with `mlx-gen 0.18.10` on an Apple M5 Max with 128 GiB unified memory.

Validation profile: `1280x704`, 17 frames, 20 denoising steps, guidance `5`, 24 fps, seed `321`,
explicit empty negative prompt.

| Layout | Storage | Logical Model | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Output |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| Upstream source snapshot | 31.9 GiB | 10.6 GiB | 102.7 GiB | 13.7 GiB | 58.5 GiB | 216.2 s | [base-source.mp4](validation/ti2v5b-clean/base-source.mp4) |
| This BF16 package | 21.2 GiB | 10.6 GiB | 102.6 GiB | 14.5 GiB | 58.5 GiB | 261.6 s | [prepared-bf16.mp4](validation/ti2v5b-clean/prepared-bf16.mp4) |
| Mixed q8/BF16 package | 16.9 GiB | 6.3 GiB | 103.7 GiB | 13.8 GiB | 54.2 GiB | 243.4 s | [mixed-q8-bf16.mp4](validation/ti2v5b-clean/mixed-q8-bf16.mp4) |

The source and this BF16 package produced byte-identical decoded MP4 frames. The mixed q8/BF16
package remained visually similar, with a mean per-frame MAE of `1.66` relative to the source/BF16 output.
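
As an illustration of what a mean frame MAE measures, the sketch below averages the absolute per-sample difference within each frame and then averages across frames. It assumes frames are already decoded to raw 8-bit samples; the helper names `frame_mae` and `mean_frame_mae` are hypothetical, and the exact formula and value scale behind the `1.66` figure are not stated here, so this may differ from how `metrics.json` was produced.

```python
from typing import Sequence


def frame_mae(a: bytes, b: bytes) -> float:
    """Mean absolute difference between two equally sized 8-bit frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of samples")
    if not a:
        raise ValueError("frames must not be empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_frame_mae(ref: Sequence[bytes], test: Sequence[bytes]) -> float:
    """Average the per-frame MAE over two videos with the same frame count."""
    if len(ref) != len(test):
        raise ValueError("videos must have the same number of frames")
    if not ref:
        raise ValueError("videos must contain at least one frame")
    per_frame = [frame_mae(r, t) for r, t in zip(ref, test)]
    return sum(per_frame) / len(per_frame)


# Toy data: two tiny "frames" of four samples each (assumed 0-255 scale).
ref_frames = [bytes([10, 20, 30, 40]), bytes([0, 0, 0, 0])]
test_frames = [bytes([12, 20, 27, 40]), bytes([0, 4, 0, 0])]

identical = mean_frame_mae(ref_frames, ref_frames)  # byte-identical videos
toy_mae = mean_frame_mae(ref_frames, test_frames)
print(identical, toy_mae)
```

Under this definition, byte-identical decoded frames give an MAE of exactly zero, which is the relationship reported between the source and this BF16 package.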

`Storage` is the Hugging Face repository total. `Logical Model` is the loaded Wan transformer plus
VAE tensor footprint measured from MLX arrays. `Full-Process Physical Peak` is Darwin
`phys_footprint` sampled from model initialization through MP4 save and health validation.
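
The storage figures can be sanity-checked against the logical footprint with simple byte-width arithmetic. The sketch below assumes that the on-disk size of the transformer plus VAE in BF16 is roughly equal to the measured 10.6 GiB logical footprint, and that every other file in the repository is unchanged between the source snapshot and this package; both are assumptions for illustration, not measured facts.

```python
# Bytes per parameter for the two precisions involved.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}

# Figures copied from the Measurements table, in GiB.
source_storage = 31.9   # upstream source snapshot
bf16_storage = 21.2     # this BF16 package
logical_bf16 = 10.6     # transformer + VAE loaded as BF16


def recast_size(size_gib: float, src: str, dst: str) -> float:
    """Scale a tensor footprint from one precision to another."""
    return size_gib * BYTES_PER_PARAM[dst] / BYTES_PER_PARAM[src]


# Assumed FP32 on-disk size of the same transformer + VAE tensors.
fp32_on_disk = recast_size(logical_bf16, "bf16", "fp32")

# Saving expected from storing those tensors as BF16 instead of FP32.
expected_saving = fp32_on_disk - logical_bf16

# Saving actually observed between the two repositories.
observed_saving = source_storage - bf16_storage

print(f"expected ~{expected_saving:.1f} GiB, observed ~{observed_saving:.1f} GiB")
```

Under these assumptions the expected and observed savings agree to within the rounding of the table, which is consistent with the FP32-to-BF16 conversion of the transformer and VAE accounting for the storage reduction.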

Validation assets:

- [contact-sheet.png](validation/ti2v5b-clean/contact-sheet.png)
- [metrics.json](validation/ti2v5b-clean/metrics.json)

## Usage

```bash
python -m pip install -U mlx-gen

mlxgen download --model AbstractFramework/wan2.2-ti2v-5b-diffusers-bf16

mlxgen generate \
  --model AbstractFramework/wan2.2-ti2v-5b-diffusers-bf16 \
  --prompt "A short cinematic video of a glowing orange glass sphere floating above calm teal water, soft reflections, gentle camera movement" \
  --negative-prompt "" \
  --width 1280 \
  --height 704 \
  --frames 17 \
  --steps 20 \
  --guidance 5 \
  --fps 24 \
  --seed 321 \
  --output video.mp4
```
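
For scripted runs, the same command can be assembled from Python. This is a minimal sketch that only wraps the CLI flags shown above using the standard library; `build_generate_command` and `run_generate` are hypothetical helper names, not part of mlx-gen, and no Python API of mlx-gen is assumed.

```python
import shlex
import subprocess

MODEL = "AbstractFramework/wan2.2-ti2v-5b-diffusers-bf16"


def build_generate_command(prompt, output="video.mp4", seed=321, negative_prompt=""):
    """Build the argv list for `mlxgen generate` with the validation profile."""
    options = {
        "--model": MODEL,
        "--prompt": prompt,
        "--negative-prompt": negative_prompt,
        "--width": 1280,
        "--height": 704,
        "--frames": 17,
        "--steps": 20,
        "--guidance": 5,
        "--fps": 24,
        "--seed": seed,
        "--output": output,
    }
    cmd = ["mlxgen", "generate"]
    for flag, value in options.items():
        cmd += [flag, str(value)]
    return cmd


def run_generate(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


cmd = build_generate_command(
    "A glowing orange glass sphere floating above calm teal water"
)
# Print the shell-quoted command for inspection; call run_generate(cmd) to execute it.
print(shlex.join(cmd))
```

Passing an argv list to `subprocess.run` avoids shell quoting problems with long prompts and keeps the empty negative prompt as an explicit empty argument.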

TI2V-5B also supports first-frame image-to-video in MLX-Gen when one input image is supplied.

## Attribution

MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original
mflux contributors.

Prepared and contributed by [@lpalbou](https://huggingface.co/lpalbou).