---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- dit
- image-generation
- text-to-image
- flow-matching
- mvsplit
inference: true
widget:
- text: a red panda climbing a bamboo stalk
  output:
    url: demo.png
---
# MVSplit-DiT-1000L
Self-contained Diffusers checkpoint for **MVSplit-DiT** (1000-layer Diffusion Transformer) with a custom `MVSplitDiTPipeline` (`pipeline.py`).
> **Re-distribution notice:** weights are converted from [`StableKirito/mvsplit-dit-1000l`](https://huggingface.co/StableKirito/mvsplit-dit-1000l). Original work: [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Demo

Prompt: *a red panda climbing a bamboo stalk* — 256×256, 35 steps, CFG 2.0.
## Components
- `pipeline.py` — `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/` — `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/` — `FlowMatchEulerDiscreteScheduler`
- `text_encoder/` — Qwen3-0.6B (`AutoModel`)
- `tokenizer/` — Qwen3 tokenizer
- `vae/` — FLUX2 VAE (`AutoencoderKLFlux2`)
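Before loading anything, it can help to confirm that a local clone actually contains every entry in the list above. The following is a minimal sketch, not part of the repository: the helper name `missing_components` is hypothetical, and it only checks that the files and directories exist, not that their contents are valid.

```python
from pathlib import Path

# Entries taken from the component list above.
EXPECTED_FILES = ("pipeline.py", "model_index.json")
EXPECTED_DIRS = ("transformer", "scheduler", "text_encoder", "tokenizer", "vae")


def missing_components(model_dir):
    """Return the expected files and directories that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of a local clone.
    print(missing_components(".") or "all components present")
```

An empty list means every listed component is present; anything else names what still needs to be downloaded.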
## Inference
Run the bundled demo script:
```bash
python demo_inference.py
```
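This writes `demo.png` using the default prompt and the values listed under Recommended settings. To assemble the pipeline manually instead, run the following from the root of a local clone of this repository: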
```python
from pathlib import Path
import importlib.util
import sys

import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# The current directory is treated as the model repository root.
model_dir = Path(".").resolve()

# Import the custom transformer class from its bundled source file.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

# Import the custom pipeline class the same way.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

# Load each component from its local subfolder in bf16.
transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

# Assemble the pipeline; time_shift_alpha must match the training value.
pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
# Trade speed for lower accelerator memory use.
pipe.enable_sequential_cpu_offload()

# A fixed seed makes sampling reproducible.
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```
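The two `importlib` sequences in the snippet above follow the same four steps. As a sketch, they could be factored into one helper; the name `load_module_from_path` is hypothetical and is not provided by this repository or by Diffusers.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name, path):
    """Import the Python file at `path` and register it in sys.modules as `name`."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, in the same order as the snippet above.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Usage, with the module names and paths from the snippet above:
# module = load_module_from_path(
#     "transformer_mvsplit_dit", model_dir / "transformer" / "transformer_mvsplit_dit.py"
# )
# pipe_module = load_module_from_path("mvsplit_pipeline", model_dir / "pipeline.py")
```

The explicit `None` check only adds a clearer error message when the path cannot be turned into a module spec; the loading behavior is otherwise unchanged.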
### Recommended settings
| Parameter | Default | Notes |
| --- | ---: | --- |
| `height` / `width` | 256 | Output resolution in pixels (square) |
| `num_inference_steps` | 35 | Number of flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance scale |
| `time_shift_alpha` | 4.0 | Time shift applied to the flow schedule; set in the pipeline constructor and must match training |
| `seed` | 42 | Reproducible sampling; passed as `generator=torch.Generator(device="cpu").manual_seed(42)` |
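To illustrate how the step count, guidance scale, and time shift interact, here is a toy sketch of flow-matching Euler sampling with classifier-free guidance on a single scalar. It is not the code in `pipeline.py`. It assumes the time shift takes the form `alpha * sigma / (1 + (alpha - 1) * sigma)`, the same form as the `shift` parameter of Diffusers' `FlowMatchEulerDiscreteScheduler`; whether MVSplit-DiT applies exactly this form is not stated in this card. All function names are hypothetical.

```python
def shift_sigma(sigma, alpha):
    """Assumed time shift: alpha * sigma / (1 + (alpha - 1) * sigma)."""
    return alpha * sigma / (1.0 + (alpha - 1.0) * sigma)


def make_sigmas(num_steps, alpha):
    """Noise levels from 1.0 (pure noise) down to 0.0 (clean), after the shift."""
    linear = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, alpha) for s in linear]


def cfg_combine(v_uncond, v_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def euler_sample(x, velocity_cond, velocity_uncond,
                 num_steps=35, guidance_scale=2.0, alpha=4.0):
    """Integrate dx/dsigma = v from sigma = 1 to sigma = 0 with explicit Euler steps."""
    sigmas = make_sigmas(num_steps, alpha)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = cfg_combine(velocity_uncond(x, sigma), velocity_cond(x, sigma), guidance_scale)
        x = x + (sigma_next - sigma) * v
    return x


if __name__ == "__main__":
    # Toy stand-in for a model: the velocity that carries x toward a fixed target.
    def toward(target):
        return lambda x, sigma: (x - target) / sigma

    print(make_sigmas(35, 4.0)[:3])
    print(euler_sample(0.3, toward(1.0), toward(0.0)))
```

Under this assumed form, `alpha = 4.0` maps a midpoint of 0.5 to 0.8, so more of the 35 steps are spent at high noise levels than with an unshifted schedule.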
## Citation
```bibtex
@article{lu2026mms,
title = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
author = {Lu, Pengqi},
journal = {arXiv preprint arXiv:2605.06169},
year = {2026},
}
```