	---
	license: apache-2.0
	base_model: Wan-AI/Wan2.2-TI2V-5B-Diffusers
	pipeline_tag: text-to-video
	library_name: mlx-gen
	tags:
	- mlx
	- mlx-gen
	- mflux
	- apple-silicon
	- bf16
	- wan
	- wan2.2
	- video-generation
	- text-to-video
	- image-to-video
	---
	# wan2.2-ti2v-5b-diffusers-bf16

	This repository contains BF16 MLX-Gen saved weights for
	[`Wan-AI/Wan2.2-TI2V-5B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers).
	It is designed for local Apple Silicon inference with
	[`mlx-gen`](https://github.com/lpalbou/mlx-gen).

	It uses the mflux/MLX saved-weight layout. It is not a Diffusers or Transformers
	`from_pretrained()` checkpoint.

	## Source Model

	Original model: [`Wan-AI/Wan2.2-TI2V-5B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers).

	This prepared derivative follows the Apache 2.0 license of the source model.

	## Precision

	The upstream TI2V-5B source snapshot is not uniformly 16-bit on disk: the transformer and VAE
	safetensors are FP32, while the UMT5 text encoder is BF16. MLX-Gen already loads the Wan
	transformer and VAE weights at BF16 runtime precision, so this prepared BF16 package reduces
	storage and download size but does not reduce runtime memory compared with generating from the
	source snapshot.
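	As a rough consistency check on this explanation, here is arithmetic on the values in the measurements table below. It is not a separate measurement and assumes the BF16 text encoder and the other files are carried over unchanged. Going from FP32 (4 bytes per weight) to BF16 (2 bytes per weight) should save about the BF16 size of the transformer and VAE, which is the `Logical Model` value $S_{\text{BF16}} = 10.6$ GiB:

	$$
	\begin{aligned}
	S_{\text{FP32}} &\approx 2 \times 10.6\ \text{GiB} = 21.2\ \text{GiB} && \text{(transformer + VAE at 4 bytes/weight)} \\
	S_{\text{FP32}} - S_{\text{BF16}} &\approx 21.2 - 10.6 = 10.6\ \text{GiB} && \text{(expected saving)} \\
	31.9 - 21.2 &= 10.7\ \text{GiB} && \text{(measured storage difference)}
	\end{aligned}
	$$

	The 0.1 GiB gap falls within the rounding of the reported values. The `Logical Model` column is 10.6 GiB for both the source and this package because the BF16 conversion happens at load time either way.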

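	Use this package when you want a smaller reusable MLX-Gen folder that preserves source behavior:
	its validation frames matched the source byte for byte. Use the mixed q8/BF16 package when you
	want the smallest download and loaded model footprint and can accept slightly different output;
	its full-process physical peak in the validation run was not lower.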

	## Measurements

	Measured on 2026-06-04 with `mlx-gen 0.18.10` on an Apple M5 Max with 128 GiB unified memory.

	Validation profile: `1280x704`, 17 frames, 20 denoising steps, guidance `5`, 24 fps, seed `321`,
	explicit empty negative prompt.

	| Layout | Storage | Logical Model | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Output |
	| --- | ---: | ---: | ---: | ---: | ---: | ---: | --- |
	| Upstream source snapshot | 31.9 GiB | 10.6 GiB | 102.7 GiB | 13.7 GiB | 58.5 GiB | 216.2 s | [base-source.mp4](validation/ti2v5b-clean/base-source.mp4) |
	| This BF16 package | 21.2 GiB | 10.6 GiB | 102.6 GiB | 14.5 GiB | 58.5 GiB | 261.6 s | [prepared-bf16.mp4](validation/ti2v5b-clean/prepared-bf16.mp4) |
	| Mixed q8/BF16 package | 16.9 GiB | 6.3 GiB | 103.7 GiB | 13.8 GiB | 54.2 GiB | 243.4 s | [mixed-q8-bf16.mp4](validation/ti2v5b-clean/mixed-q8-bf16.mp4) |

	The source and this BF16 package produced byte-identical decoded MP4 frames. The mixed q8/BF16
	package produced visually similar output, with a mean frame MAE of `1.66` against the
	source/BF16 frames.
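	For readers comparing against their own runs, here is one common definition of mean frame MAE. The card does not state the exact pixel scale or channel handling used by the validation script, so treat those details as assumptions. Let $F$ be the number of frames, $N$ the number of pixel-channel values per frame, $x$ the candidate values and $y$ the reference values:

	$$
	\mathrm{MAE}_{\text{mean}} = \frac{1}{F} \sum_{f=1}^{F} \frac{1}{N} \sum_{i=1}^{N} \left| x_{f,i} - y_{f,i} \right|
	$$

	Byte-identical frames, as between the source and this BF16 package, give exactly 0 under this definition.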

	`Storage` is the Hugging Face repository total. `Logical Model` is the loaded Wan transformer plus
	VAE tensor footprint, measured from MLX arrays. `Full-Process Physical Peak` is the highest Darwin
	`phys_footprint` value sampled from model initialization through MP4 save and health validation.
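	The trade-offs between layouts are easier to compare as relative changes. The sketch below uses a hypothetical helper, `pct_change`, which computes them with `awk` from values copied by hand out of the table above. It does not read `metrics.json`.

	```bash
	# Hypothetical helper: percentage change from $1 (baseline) to $2, one decimal place.
	pct_change() {
	  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f", (new - base) / base * 100 }'
	}

	# Values in GiB, copied by hand from the measurements table.
	echo "BF16 package vs source snapshot:"
	echo "  storage:                    $(pct_change 31.9 21.2)%"
	echo "  full-process physical peak: $(pct_change 102.7 102.6)%"
	echo "Mixed q8/BF16 vs BF16 package:"
	echo "  storage:                    $(pct_change 21.2 16.9)%"
	echo "  logical model:              $(pct_change 10.6 6.3)%"
	echo "  full-process physical peak: $(pct_change 102.6 103.7)%"
	```

	The output shows that this package cuts storage by about a third with essentially no change in peak memory. The mixed package shrinks the logical model by about 40%, while its full-process peak is about 1% higher than this package's.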

	Validation assets:

	- [contact-sheet.png](validation/ti2v5b-clean/contact-sheet.png)
	- [metrics.json](validation/ti2v5b-clean/metrics.json)

	## Usage

	```bash
	python -m pip install -U mlx-gen

	mlxgen download --model AbstractFramework/wan2.2-ti2v-5b-diffusers-bf16

	mlxgen generate \
	--model AbstractFramework/wan2.2-ti2v-5b-diffusers-bf16 \
	--prompt "A short cinematic video of a glowing orange glass sphere floating above calm teal water, soft reflections, gentle camera movement" \
	--negative-prompt "" \
	--width 1280 \
	--height 704 \
	--frames 17 \
	--steps 20 \
	--guidance 5 \
	--fps 24 \
	--seed 321 \
	--output video.mp4
	```

	TI2V-5B also supports first-frame image-to-video in MLX-Gen when one input image is supplied.

	## Attribution

	MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original
	mflux contributors.

	Prepared and contributed by [@lpalbou](https://huggingface.co/lpalbou).