---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- dit
- image-generation
- text-to-image
- flow-matching
- mvsplit
inference: true
widget:
- text: a red panda climbing a bamboo stalk
  output:
    url: demo.png
---

# MVSplit-DiT-1000L

Self-contained Diffusers checkpoint for **MVSplit-DiT** (a 1000-layer Diffusion Transformer) with a custom `MVSplitDiTPipeline` defined in `pipeline.py`.

> **Re-distribution notice:** The weights are converted from [`StableKirito/mvsplit-dit-1000l`](https://huggingface.co/StableKirito/mvsplit-dit-1000l). Original work: [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Demo

Prompt: *a red panda climbing a bamboo stalk* (256×256, 35 steps, CFG 2.0).

## Components

- `pipeline.py`: `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/`: `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/`: `FlowMatchEulerDiscreteScheduler`
- `text_encoder/`: Qwen3-0.6B (loaded with `AutoModel`)
- `tokenizer/`: Qwen3 tokenizer
- `vae/`: FLUX2 VAE (`AutoencoderKLFlux2`)

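Before loading anything, it can help to confirm that a local copy of the checkpoint contains every entry listed above. The following is a minimal sketch using only the standard library. The helper name `missing_components` is hypothetical and is not part of the repository.

```python
from pathlib import Path

# Entries taken from the Components list of this model card.
EXPECTED_ENTRIES = (
    "pipeline.py",
    "model_index.json",
    "transformer",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "vae",
)


def missing_components(model_dir):
    """Return the expected entries absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the root of the checkpoint directory.
    missing = missing_components(".")
    print("missing:", missing if missing else "none")
```

The check only looks at names, not at the contents of each subfolder, so a partially downloaded weight file would still pass.
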
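## Inference

Run the bundled demo script:

```bash
python demo_inference.py
```

This writes `demo.png` using the default prompt and the settings below. The following code loads each component from the local checkpoint directory, assembles the pipeline, and generates an image with those settings:
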
```python
from pathlib import Path
import importlib.util
import sys

import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Run from the root of the local checkpoint directory.
model_dir = Path(".").resolve()

# Import the custom transformer class from its source file in the checkpoint.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

# Import the custom pipeline class the same way.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

# Load every component in bf16 from local files only (no Hub access).
transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

# Assemble the pipeline; time_shift_alpha must match the value used in training.
pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
# Keep weights on the CPU and move submodules to the accelerator only while they run.
pipe.enable_sequential_cpu_offload()

# Fixed seed for reproducible sampling.
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```

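The snippet above repeats the same four `importlib` calls for the transformer and the pipeline modules. As a sketch, those steps can be folded into one helper. The name `load_module_from_path` is hypothetical and is not provided by the repository, Diffusers, or Transformers.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name, path):
    """Import a Python source file as a module registered under `name` (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so imports of `name` during execution resolve.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialised module behind.
        sys.modules.pop(name, None)
        raise
    return module


# Usage with the paths from the snippet above:
# module = load_module_from_path(
#     "transformer_mvsplit_dit", model_dir / "transformer" / "transformer_mvsplit_dit.py"
# )
# pipe_module = load_module_from_path("mvsplit_pipeline", model_dir / "pipeline.py")
```

Registering the module in `sys.modules` under a fixed name mirrors the original snippet, which does the same before calling `exec_module`.
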
### Recommended settings

| Parameter | Default | Notes |
| --- | ---: | --- |
| `height` / `width` | 256 | Square output resolution, in pixels |
| `num_inference_steps` | 35 | Flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance scale |
| `time_shift_alpha` | 4.0 | Time shift in the flow schedule (must match training); passed to the pipeline constructor |
| `seed` | 42 | Reproducible sampling; set through `torch.Generator(...).manual_seed(42)` and passed as `generator` |

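To give a feel for what `time_shift_alpha` and `guidance_scale` do numerically, here is a small pure-Python sketch. The shift formula is the static shift used by Diffusers' `FlowMatchEulerDiscreteScheduler`; that `MVSplitDiTPipeline` applies `time_shift_alpha` in exactly this form is an assumption, as is the uniform starting grid. The guidance formula is the standard classifier-free guidance combination. All function names are illustrative.

```python
def shift_sigma(sigma, alpha):
    """Static time shift: alpha * sigma / (1 + (alpha - 1) * sigma), for sigma in [0, 1]."""
    return alpha * sigma / (1.0 + (alpha - 1.0) * sigma)


def shifted_schedule(num_inference_steps, alpha):
    """Noise levels from 1.0 down to 0.0 after shifting a uniform grid (simplified)."""
    uniform = [1.0 - i / num_inference_steps for i in range(num_inference_steps + 1)]
    return [shift_sigma(s, alpha) for s in uniform]


def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction toward the conditional one."""
    return uncond + guidance_scale * (cond - uncond)


if __name__ == "__main__":
    # Settings from the table above.
    sigmas = shifted_schedule(35, 4.0)
    print("first noise levels:", [round(s, 3) for s in sigmas[:4]])
    print("guided value:", apply_guidance(1.0, 2.0, 2.0))
```

With `alpha > 1` the shifted grid spends more steps at high noise levels than the uniform grid does, and with `alpha = 1` the shift is the identity. Under this assumed formula, sampling with a different `alpha` than the one used in training changes which noise levels the model visits.
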
## Citation

```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```