MVSplit-DiT-1000L

Self-contained Diffusers checkpoint for MVSplit-DiT (1000-layer Diffusion Transformer) with a custom MVSplitDiTPipeline (pipeline.py).

Re-distribution notice: the weights are converted from StableKirito/mvsplit-dit-1000l. Original work: "Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers". License: Apache 2.0.

Demo

[Image: MVSplit-DiT-1000L demo]

Prompt: a red panda climbing a bamboo stalk — 256×256, 35 steps, CFG 2.0.

Components

  • pipeline.py: MVSplitDiTPipeline
  • model_index.json
  • transformer/: MVSplitDiTTransformer2DModel (bf16, 1000 layers)
  • scheduler/: FlowMatchEulerDiscreteScheduler
  • text_encoder/: Qwen3-0.6B (AutoModel)
  • tokenizer/: Qwen3 tokenizer
  • vae/: FLUX2 VAE (AutoencoderKLFlux2)
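
Before loading anything, it can help to confirm that a local copy of the checkpoint has the layout listed above. The following is a minimal sketch using only the standard library; the helper name missing_components and the EXPECTED_ENTRIES tuple are hypothetical and simply mirror the Components list, they are not part of the repository.

```python
from pathlib import Path

# Entries copied from the Components list; a trailing "/" marks a directory.
EXPECTED_ENTRIES = (
    "pipeline.py",
    "model_index.json",
    "transformer/",
    "scheduler/",
    "text_encoder/",
    "tokenizer/",
    "vae/",
)


def missing_components(model_dir):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories and files are checked separately so that a stray file
        # named "vae" does not count as the vae/ folder.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Run from the checkpoint directory; an empty list means nothing is missing.
    print(missing_components("."))
```

An empty result only says the files and folders exist; it does not validate the weights or configs inside them.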

Inference

Run the bundled demo script:

python demo_inference.py

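This writes demo.png using the default prompt and the values listed under Recommended settings.

To load the components manually instead, run the following from the checkpoint directory (model_dir resolves to the current working directory):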

from pathlib import Path
import importlib.util
import sys
import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Assumes the script is run from the checkpoint directory.
model_dir = Path(".").resolve()

# Import the custom transformer class from its source file and register it in sys.modules.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

# Import the custom pipeline class the same way.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

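# time_shift_alpha must match the value used in training (see Recommended settings).
pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
# Keep weights on the CPU and move submodules to the GPU only while they run (slower, lower memory).
pipe.enable_sequential_cpu_offload()

# CPU generator with a fixed seed for reproducible sampling.
generator = torch.Generator(device="cpu").manual_seed(42)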
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
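
The two importlib blocks in the snippet above follow the same four-step pattern (build a spec, create the module, register it in sys.modules, execute it). As a sketch, that pattern could be factored into one helper; the name load_module_from_path is hypothetical and is not provided by the repository.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name, path):
    """Import a Python source file as a module registered under `name`."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as the snippet above does, so the module
    # can be found in sys.modules while its own code runs.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind on failure.
        sys.modules.pop(name, None)
        raise
    return module


# Usage with the paths from the snippet above (run from the checkpoint directory):
# module = load_module_from_path(
#     "transformer_mvsplit_dit",
#     Path(".") / "transformer" / "transformer_mvsplit_dit.py",
# )
# pipe_module = load_module_from_path("mvsplit_pipeline", Path(".") / "pipeline.py")
```

Loading the transformer module first mirrors the order in the original snippet; whether pipeline.py actually depends on that order is not stated here.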

Recommended settings

Parameter           | Default | Notes
height / width      | 256     | Square output resolution
num_inference_steps | 35      | Flow-matching Euler steps
guidance_scale      | 2.0     | Classifier-free guidance
time_shift_alpha    | 4.0     | Time-shift in the flow schedule (must match training)
seed                | 42      | Reproducible sampling
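
To give a feel for what time_shift_alpha and guidance_scale do numerically, here is a small standard-library sketch. It assumes the time shift takes the form commonly used in flow-matching schedulers, sigma' = alpha * sigma / (1 + (alpha - 1) * sigma), and that guidance uses the usual classifier-free combination; neither formula is confirmed by this card, and the actual behaviour is defined in pipeline.py. The function names are illustrative only.

```python
def shift_sigma(sigma, alpha=4.0):
    """Assumed time shift: alpha * sigma / (1 + (alpha - 1) * sigma)."""
    return alpha * sigma / (1.0 + (alpha - 1.0) * sigma)


def shifted_schedule(num_steps=35, alpha=4.0):
    """Noise levels from 1.0 down to 0.0 (num_steps + 1 points), after shifting."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, alpha) for s in uniform]


def cfg_combine(uncond, cond, guidance_scale=2.0):
    """Standard classifier-free guidance, shown on scalar predictions."""
    return uncond + guidance_scale * (cond - uncond)


if __name__ == "__main__":
    schedule = shifted_schedule()
    # With alpha > 1 the midpoint of the uniform grid maps above 0.5,
    # so more of the steps are spent at high noise levels.
    print(shift_sigma(0.5))
    print(schedule[:3], schedule[-3:])
    print(cfg_combine(uncond=1.0, cond=3.0))
```

Under this assumed form, alpha = 1.0 leaves the schedule uniform and guidance_scale = 1.0 returns the conditional prediction unchanged.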

Citation

@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}