---
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-video
tags:
- mlx
- text-to-video
- video-editing
- video-to-video
- reference-to-video
- wan2.2
- bernini
base_model: ByteDance/Bernini-R-Diffusers
---

# Bernini-R (MLX)

Apple MLX port of **[ByteDance/Bernini-R](https://huggingface.co/ByteDance/Bernini-R-Diffusers)**,
the open-sourced *Renderer* of ByteDance's Bernini: a Wan2.2-T2V-A14B-derived video
generator/editor with **Segment-Aware 3D RoPE** for multi-reference and editing tasks.
Runs on Apple Silicon via [MLX](https://github.com/ml-explore/mlx) and the
[mlx-video](https://github.com/Blaizzy/mlx-video) Wan2.2 backbone.

## ⚠️ Scope: renderer only

Only the Renderer ("-R") is open-sourced upstream. The MLLM semantic **planner** (the
paper's headline "latent semantic planning", a Qwen2.5-VL-7B model) is **not released**.
This port therefore runs with **UMT5 text conditioning only**: the planner-feature
channel is absent (and carries no weights in the released checkpoint). You get the
renderer's editing, reference-to-video, and subject-consistency behavior, not the full
planner-guided system.

## Tasks

| Task | Description |
|---|---|
| `t2v` / `t2i` | text-to-video / text-to-image |
| `r2v` | reference-to-video: generate a subject from up to K reference images (chained APG) |
| `v2v` | prompt-based video editing (source video injected as conditioning) |
| `rv2v` | reference + video editing |

## Variants

| Repo | Precision | Size per expert |
|---|---|---|
| `…-bf16` | bfloat16 | 28.6 GB |
| `…-int4` | 4-bit (group size 64) | 8.4 GB |

Each variant ships two experts (high-noise and low-noise), plus the 16-channel Wan2.2 VAE (0.5 GB) and the UMT5 text encoder (11 GB).

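
As a rough check on disk and memory requirements, the component sizes listed above can be summed per variant. This is only a sketch: it assumes the VAE and UMT5 sizes are the same for both variants, and the names `EXPERT_GB`, `SHARED_GB`, and `total_footprint_gb` are illustrative rather than part of any package.

```python
# Sizes in GB, copied from the Variants section. Treating the VAE and UMT5
# sizes as identical for both variants is an assumption, not a stated fact.
EXPERT_GB = {"bf16": 28.6, "int4": 8.4}
SHARED_GB = {"vae": 0.5, "umt5": 11.0}
NUM_EXPERTS = 2  # high-noise and low-noise


def total_footprint_gb(variant: str) -> float:
    """Approximate total weight size for one variant, in GB."""
    return NUM_EXPERTS * EXPERT_GB[variant] + sum(SHARED_GB.values())


for name in EXPERT_GB:
    print(f"{name}: ~{total_footprint_gb(name):.1f} GB")
```

Under these assumptions the totals come to roughly 68.7 GB for bf16 and 28.3 GB for int4. Peak memory during generation may be higher or lower depending on whether both experts are resident at once.
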
## Usage

```python
from bernini_r_mlx import pipeline_mlx as P

# Text-to-video: generate a 49-frame clip from a prompt.
P.t2v("path/to/ckpt", "a red fox in a snowy forest", num_frames=49, output_path="out.mp4")

# Reference-to-video: keep the subject from the reference image(s) consistent.
P.r2v("path/to/ckpt", "the fox running across a field",
      reference_images=["fox.png"], output_path="r2v.mp4")

# Video editing: the prompt describes the edit applied to the source video.
P.v2v("path/to/ckpt", "... autumn forest ...", source_video="in.mp4", output_path="v2v.mp4")
```

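
The calls above take a local checkpoint path. One way to obtain it is to download the `mlx-community/Bernini-R-int4` repository with `huggingface_hub.snapshot_download`. The helper below is a minimal sketch: `download_checkpoint` and its default directory are hypothetical, and it assumes the pipeline accepts the downloaded directory as-is.

```python
REPO_ID = "mlx-community/Bernini-R-int4"


def download_checkpoint(local_dir: str = "Bernini-R-int4") -> str:
    """Download the repo snapshot and return the local directory path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

The returned path could then stand in for `"path/to/ckpt"`, for example `P.t2v(download_checkpoint(), "a red fox in a snowy forest", output_path="out.mp4")`. Expect a large download, since the repository holds both experts plus the VAE and text encoder.
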
## Provenance & validation

- Architecture: **stock Wan2.2-T2V-A14B** (verified: the checkpoint uses diffusers `WanTransformer3DModel` keys with no extra tensors). The Bernini knobs (`switch_dit_boundary 0.875`, `shift 3.0`, `use_src_id_rotary_emb`) live in the wrapper config. SA-3D RoPE adds **no parameters**.
- Converted fp32 → bf16 from `ByteDance/Bernini-R-Diffusers`; the VAE and UMT5 come from `Wan-AI/Wan2.2-T2V-A14B`.
- Validated: SA-3D RoPE parity ~1e-7; VAE roundtrip MAD 2.1/255; multi-segment forward bit-exact vs t2v; int4 per-pass cosine 0.9992 vs bf16; end-to-end t2v / r2v / v2v outputs coherent.

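
To make the two numeric knobs in the architecture note more concrete, here is a hedged sketch of how they are commonly interpreted in Wan2.2-style samplers. It assumes that `shift` warps the sigma schedule with the usual flow-matching formula `shift * s / (1 + (shift - 1) * s)`, and that `switch_dit_boundary` picks the high-noise expert whenever the timestep is at or above `boundary * 1000`. The function names, the 1000-step constant, and the simplified linear schedule are illustrative assumptions, not taken from this port's code.

```python
NUM_TRAIN_TIMESTEPS = 1000  # assumed, following the usual Wan2.2 convention
SWITCH_DIT_BOUNDARY = 0.875
SHIFT = 3.0


def shifted_sigmas(num_steps: int, shift: float = SHIFT) -> list[float]:
    """Linear sigmas from 1 towards 0, warped by the flow-matching shift."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


def pick_expert(sigma: float, boundary: float = SWITCH_DIT_BOUNDARY) -> str:
    """Select the expert for a step from its timestep t = sigma * 1000."""
    timestep = sigma * NUM_TRAIN_TIMESTEPS
    if timestep >= boundary * NUM_TRAIN_TIMESTEPS:
        return "high_noise"
    return "low_noise"


for sigma in shifted_sigmas(num_steps=8):
    print(f"sigma={sigma:.3f} -> {pick_expert(sigma)}")
```

With this simplified 8-step schedule, the first three steps land on the high-noise expert and the rest on the low-noise one. A larger `shift` keeps more steps at high sigma, which moves the hand-off between experts later in sampling.
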
## License & attribution

Apache-2.0. Derived from ByteDance Bernini-R, Wan2.2 (Wan-AI), and mlx-video. See `NOTICE`.