Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers
Paper: arXiv:2605.06169
Self-contained Diffusers checkpoint for MVSplit-DiT (a 1000-layer Diffusion Transformer) with a custom `MVSplitDiTPipeline` (`pipeline.py`).
Redistribution notice: the weights are converted from `StableKirito/mvsplit-dit-1000l`. Original work: Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers. License: Apache 2.0.
Sample output for the prompt "a red panda climbing a bamboo stalk" (256×256, 35 steps, CFG 2.0).
Repository contents:

- `pipeline.py`: `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/`: `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/`: `FlowMatchEulerDiscreteScheduler`
- `text_encoder/`: Qwen3-0.6B (`AutoModel`)
- `tokenizer/`: Qwen3 tokenizer
- `vae/`: FLUX2 VAE (`AutoencoderKLFlux2`)

Install the dependencies and run the bundled demo script:
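Before running anything, it can help to confirm that a local checkout contains every component listed above. The sketch below is a hypothetical helper (`missing_entries` and `EXPECTED_ENTRIES` are illustrative names, not part of the repository); the expected names come from the file list plus the `transformer/transformer_mvsplit_dit.py` file that the Python demo imports.

```python
from pathlib import Path

# Hypothetical checklist built from the repository contents listed in this card.
EXPECTED_ENTRIES = [
    "pipeline.py",
    "model_index.json",
    "transformer",
    "transformer/transformer_mvsplit_dit.py",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "vae",
]


def missing_entries(model_dir: Path) -> list[str]:
    """Return the expected files/folders that do not exist under model_dir."""
    return [name for name in EXPECTED_ENTRIES if not (model_dir / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(Path(".").resolve())
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected components found.")
```

Run it from the repository root, the same working directory the demo assumes.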
```shell
pip install -U diffusers transformers accelerate
python demo_inference.py
```
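This writes `demo.png` using the default prompt and the settings listed below. The same generation in Python (run from the repository root, since `model_dir` is `Path(".")`):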
```python
from pathlib import Path
import importlib.util
import sys

import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Repository root (the folder containing model_index.json).
model_dir = Path(".").resolve()

# Import the custom transformer class from transformer/transformer_mvsplit_dit.py.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module  # register before executing the module
spec.loader.exec_module(module)

# Import the custom MVSplitDiTPipeline from pipeline.py.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

# Load every component in bf16 from the local checkout (no Hub downloads).
transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,  # must match training
)
# Keep weights on the CPU and move submodules to the GPU only while they run
# (lower peak VRAM, slower sampling; requires accelerate).
pipe.enable_sequential_cpu_offload()

# Fixed seed for reproducible sampling.
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```
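The demo repeats the same `importlib` pattern twice. A small helper can factor it out; `load_module_from_file` below is a hypothetical convenience function, not part of the repository. Like the demo, it registers the module in `sys.modules` before executing it; unlike the demo, it also removes that entry if execution fails, so a broken import does not leave a half-initialized module behind.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_file(name: str, path: Path) -> ModuleType:
    """Import the Python file at `path` and register it as sys.modules[name]."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register first so code that imports `name` during execution finds it.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        # Drop the partially executed module before re-raising.
        sys.modules.pop(name, None)
        raise
    return module


# Hypothetical usage with the repository layout described above:
# model_dir = Path(".").resolve()
# module = load_module_from_file(
#     "transformer_mvsplit_dit", model_dir / "transformer" / "transformer_mvsplit_dit.py"
# )
# pipe_module = load_module_from_file("mvsplit_pipeline", model_dir / "pipeline.py")
```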
| Parameter | Default | Notes |
|---|---|---|
| `height` / `width` | 256 | Square output resolution |
| `num_inference_steps` | 35 | Flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance scale |
| `time_shift_alpha` | 4.0 | Time shift in the flow schedule; pipeline constructor argument (must match training) |
| `seed` | 42 | Seed for `torch.Generator`, for reproducible sampling |
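To see how the three sampling parameters interact, here is a toy scalar sketch; it is not the pipeline's actual code. It assumes that `time_shift_alpha` warps the noise levels with the shift σ' = ασ / (1 + (α − 1)σ) that Diffusers' `FlowMatchEulerDiscreteScheduler` uses, that guidance is the standard classifier-free combination v = v_u + s(v_c − v_u), and that each step is the Euler update x ← x + (σ_next − σ)·v, the same form `FlowMatchEulerDiscreteScheduler.step` applies. The uniform σ grid and the point-mass velocity fields (`point_mass_velocity`) are simplifications chosen so the result can be checked exactly.

```python
from typing import Callable, List

# A velocity field maps (x, sigma) to dx/dsigma.
Velocity = Callable[[float, float], float]


def shifted_sigmas(num_steps: int, alpha: float) -> List[float]:
    """Uniform noise levels from 1 to 0, warped by the (assumed) time shift."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    # alpha > 1 keeps more steps at high noise; 0 and 1 map to themselves.
    return [alpha * s / (1.0 + (alpha - 1.0) * s) for s in sigmas]


def sample(x: float, v_cond: Velocity, v_uncond: Velocity,
           num_steps: int = 35, guidance_scale: float = 2.0,
           alpha: float = 4.0) -> float:
    """Euler integration from sigma = 1 (noise) to sigma = 0 with CFG."""
    sigmas = shifted_sigmas(num_steps, alpha)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        vc = v_cond(x, sigma)
        vu = v_uncond(x, sigma)
        v = vu + guidance_scale * (vc - vu)  # classifier-free guidance
        x = x + (sigma_next - sigma) * v     # Euler step; sigma decreases
    return x


def point_mass_velocity(target: float) -> Velocity:
    """Exact flow velocity when all data sits at `target`: (x - target) / sigma."""
    return lambda x, sigma: (x - target) / sigma


if __name__ == "__main__":
    print([round(s, 3) for s in shifted_sigmas(5, 4.0)])
    # → [1.0, 0.941, 0.857, 0.727, 0.5, 0.0]
    print(sample(0.7, point_mass_velocity(1.0), point_mass_velocity(0.0)))
```

With a conditional target at 1.0 and an unconditional target at 0.0, the guided sampler lands at 0 + 2.0 × (1.0 − 0.0) = 2.0 (up to floating-point error), which shows how a guidance scale above 1 extrapolates past the conditional prediction.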
```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```