---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- dit
- image-generation
- text-to-image
- flow-matching
- mvsplit
inference: true
widget:
- text: a red panda climbing a bamboo stalk
  output:
    url: demo.png
---

# MVSplit-DiT-1000L

Self-contained Diffusers checkpoint for **MVSplit-DiT** (1000-layer Diffusion Transformer) with a custom `MVSplitDiTPipeline` (`pipeline.py`).

> **Re-distribution notice:** weights are converted from [`StableKirito/mvsplit-dit-1000l`](https://huggingface.co/StableKirito/mvsplit-dit-1000l). Original work: [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Demo

![MVSplit-DiT-1000L demo](demo.png)

Prompt: *a red panda climbing a bamboo stalk* (256×256, 35 steps, CFG 2.0).

## Components

- `pipeline.py`: `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/`: `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/`: `FlowMatchEulerDiscreteScheduler`
- `text_encoder/`: Qwen3-0.6B (`AutoModel`)
- `tokenizer/`: Qwen3 tokenizer
- `vae/`: FLUX2 VAE (`AutoencoderKLFlux2`)

## Inference

Run the bundled demo script:

```bash
python demo_inference.py
```

This writes `demo.png` with the default prompt and settings below.
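
Because every component is loaded from a local directory, an incomplete checkout will fail at load time. The following is a minimal sketch, not part of the repository, that checks the working directory against the entries in the Components list above. The names `EXPECTED_ENTRIES` and `missing_components` are hypothetical helpers introduced only for this example.

```python
from pathlib import Path

# Expected top-level entries, copied from the Components list in this README.
EXPECTED_ENTRIES = [
    "pipeline.py",
    "model_index.json",
    "transformer",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "vae",
]


def missing_components(model_dir):
    """Return the expected entries that do not exist under model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_components(".")
    if missing:
        print("Missing components:", ", ".join(missing))
    else:
        print("All expected components are present.")
```

The check only confirms that the files and directories exist; it does not validate the weights or configs inside them.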

To call the pipeline from Python, load each component from the repository root:

```python
from pathlib import Path
import importlib.util
import sys

import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Run from the repository root so the relative component paths resolve.
model_dir = Path(".").resolve()

# Import the custom transformer class from its source file.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

# Import the custom pipeline class the same way.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

# Load all components in bf16 from local files only.
transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
# Offload submodules to CPU between uses to reduce GPU memory.
pipe.enable_sequential_cpu_offload()

# Fixed seed for reproducible sampling.
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```

### Recommended settings

| Parameter | Default | Notes |
| --- | ---: | --- |
| `height` / `width` | 256 | Square output resolution, in pixels |
| `num_inference_steps` | 35 | Flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance |
| `time_shift_alpha` | 4.0 | Time-shift in the flow schedule (must match training) |
| `seed` | 42 | Reproducible sampling; set through `torch.Generator(...).manual_seed(42)` |

## Citation

```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```