Bernini-R (MLX)
Apple MLX port of ByteDance/Bernini-R — the open-sourced Renderer of ByteDance's Bernini: a Wan2.2-T2V-A14B-derived video generator/editor with Segment-Aware 3D RoPE for multi-reference / editing tasks.
Runs on Apple Silicon via MLX + the mlx-video Wan2.2 backbone.
⚠️ Scope: renderer only
Only the Renderer ("-R") is open-sourced upstream. The MLLM semantic planner (the paper's headline "latent semantic planning", a Qwen2.5-VL-7B model) is not released. This port therefore runs with UMT5 text conditioning only — the planner-feature channel is absent (and carries no weights in the released checkpoint). You get the renderer's editing / reference-to-video / subject-consistency behavior, not the full planner-guided system.
Tasks
| Task | Description |
|---|---|
| t2v / t2i | text-to-video / text-to-image |
| r2v | reference-to-video: generate a subject from up to K reference images (chained APG) |
| v2v | prompt-based video editing (source video injected as conditioning) |
| rv2v | reference + video editing |
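The table implies a different set of inputs per task. The sketch below writes that mapping down as a pre-flight check. It is hypothetical and not part of `bernini_r_mlx`: the keyword names reuse `reference_images` and `source_video` from the Usage example, `rv2v` is assumed to need both, and `max_refs` stands in for the card's unspecified K.

```python
from typing import Any

# Inputs each task consumes, as read from the Tasks table (hypothetical helper).
REQUIRED_INPUTS: dict[str, set[str]] = {
    "t2v": {"prompt"},
    "t2i": {"prompt"},
    "r2v": {"prompt", "reference_images"},
    "v2v": {"prompt", "source_video"},
    "rv2v": {"prompt", "reference_images", "source_video"},  # assumed: both
}


def check_task_inputs(task: str, max_refs: int, **inputs: Any) -> None:
    """Raise ValueError if a task is unknown, lacks an input, or has too many refs.

    max_refs is a placeholder for the card's K, which the card does not give.
    """
    if task not in REQUIRED_INPUTS:
        raise ValueError(f"unknown task: {task!r}")
    # Treat inputs passed as None as missing.
    provided = {name for name, value in inputs.items() if value is not None}
    missing = REQUIRED_INPUTS[task] - provided
    if missing:
        raise ValueError(f"{task} needs: {sorted(missing)}")
    refs = inputs.get("reference_images") or []
    if len(refs) > max_refs:
        raise ValueError(f"at most {max_refs} reference images, got {len(refs)}")


check_task_inputs("r2v", max_refs=4, prompt="the fox running", reference_images=["fox.png"])
```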
Variants
| Repo | Precision | Size per expert |
|---|---|---|
| …-bf16 | bfloat16 | 28.6 GB |
| …-int4 | 4-bit (group 64) | 8.4 GB |
The model consists of two experts (high-noise and low-noise) plus the 16-channel Wan2.2 VAE (0.5 GB) and the UMT5 text encoder (11 GB).
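The per-expert sizes and the component list can be combined into a rough total footprint. This is plain arithmetic on the card's figures, under two assumptions not stated on the card: the VAE and UMT5 are the same size in both variants, and group-64 4-bit weights carry one 16-bit scale and one 16-bit bias per group.

```python
# Figures from the Variants table and the component note (GB).
EXPERT_GB = {"bf16": 28.6, "int4": 8.4}   # size per expert
SHARED_GB = {"vae": 0.5, "umt5": 11.0}    # assumed identical in both variants
NUM_EXPERTS = 2                           # high-noise + low-noise


def total_gb(variant: str) -> float:
    """Two experts plus the shared VAE and UMT5 text encoder."""
    return NUM_EXPERTS * EXPERT_GB[variant] + sum(SHARED_GB.values())


def effective_bits(bits: int = 4, group_size: int = 64, meta_bits: int = 16) -> float:
    """Bits per weight for group-wise quantization.

    Assumes each group of `group_size` weights stores one scale and one bias
    of `meta_bits` each (hypothetical storage layout).
    """
    return bits + 2 * meta_bits / group_size


for variant in EXPERT_GB:
    print(f"{variant}: ~{total_gb(variant):.1f} GB total")

# Naive int4 expert estimate from the bf16 size (16 bits per weight).
estimate = EXPERT_GB["bf16"] * effective_bits() / 16
print(f"estimated int4 expert: ~{estimate:.1f} GB (listed: {EXPERT_GB['int4']} GB)")
```

This gives about 68.7 GB for bf16 and 28.3 GB for int4. The 4.5 bits/weight estimate predicts roughly 8.0 GB per int4 expert, slightly below the listed 8.4 GB; the card does not say what accounts for the difference.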
Usage
```python
from bernini_r_mlx import pipeline_mlx as P

# Replace "path/to/ckpt" with the local checkpoint directory.

# text-to-video
P.t2v("path/to/ckpt", "a red fox in a snowy forest", num_frames=49, output_path="out.mp4")

# reference-to-video (subject consistency)
P.r2v("path/to/ckpt", "the fox running across a field",
      reference_images=["fox.png"], output_path="r2v.mp4")

# video editing
P.v2v("path/to/ckpt", "... autumn forest ...", source_video="in.mp4", output_path="v2v.mp4")
```
Provenance & validation
- Architecture: stock Wan2.2-T2V-A14B (verified: the checkpoint keys match diffusers `WanTransformer3DModel`, with no extra tensors). The Bernini knobs (`switch_dit_boundary` 0.875, `shift` 3.0, `use_src_id_rotary_emb`) live in the wrapper config. SA-3D RoPE adds no parameters.
- Converted fp32 → bf16 from `ByteDance/Bernini-R-Diffusers`; VAE/UMT5 from `Wan-AI/Wan2.2-T2V-A14B`.
- Validated: SA-3D RoPE parity ~1e-7; VAE roundtrip MAD 2.1/255; multi-segment forward bit-exact vs t2v; int4 per-pass cosine 0.9992 vs bf16; end-to-end t2v / r2v / v2v outputs coherent.
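The two numeric knobs can be illustrated with a minimal sketch of how a Wan2.2-style sampler might use them. It assumes the standard flow-matching timestep shift σ' = s·σ / (1 + (s − 1)·σ), 1000 training timesteps, and Wan2.2's routing rule (high-noise expert while t ≥ boundary × 1000). This port's actual scheduler is not documented on the card, so the function names and the linear base grid are hypothetical; `boundary` plays the role of `switch_dit_boundary`.

```python
NUM_TRAIN_TIMESTEPS = 1000  # assumed Wan2.2 default


def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching timestep shift; with shift > 1 more steps stay near high noise."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def schedule(num_steps: int, shift: float = 3.0) -> list[float]:
    """Shifted timesteps from noisiest to cleanest on a hypothetical linear base grid."""
    base = [1.0 - i / num_steps for i in range(num_steps)]  # 1.0 down to 1/num_steps
    return [shift_sigma(s, shift) * NUM_TRAIN_TIMESTEPS for s in base]


def pick_expert(t: float, boundary: float = 0.875) -> str:
    """Wan2.2-style routing: high-noise expert while t >= boundary * 1000."""
    return "high_noise" if t >= boundary * NUM_TRAIN_TIMESTEPS else "low_noise"


for t in schedule(num_steps=8, shift=3.0):
    print(f"t = {t:6.1f} -> {pick_expert(t)}")
```

Under these assumptions, `shift` 3.0 moves the 0.875 boundary to base σ = 0.7 (since 3·0.7 / (1 + 2·0.7) = 0.875), so roughly the first 30% of steps run on the high-noise expert, versus 12.5% with no shift.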
License & attribution
Apache-2.0. Derived from ByteDance Bernini-R, Wan2.2 (Wan-AI), and mlx-video. See NOTICE.