---
license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- dit
- image-generation
- text-to-image
- flow-matching
- mvsplit
inference: true
widget:
- text: a red panda climbing a bamboo stalk
  output:
    url: demo.png
---
# MVSplit-DiT-1000L
Self-contained Diffusers checkpoint for **MVSplit-DiT** (1000-layer Diffusion Transformer) with a custom `MVSplitDiTPipeline` (`pipeline.py`).
> **Re-distribution notice:** weights are converted from [`StableKirito/mvsplit-dit-1000l`](https://huggingface.co/StableKirito/mvsplit-dit-1000l). Original work: [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
## Demo
![MVSplit-DiT-1000L demo](demo.png)
Prompt: *a red panda climbing a bamboo stalk* — 256×256, 35 steps, CFG 2.0.
## Components
- `pipeline.py`: `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/`: `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/`: `FlowMatchEulerDiscreteScheduler`
- `text_encoder/`: Qwen3-0.6B (`AutoModel`)
- `tokenizer/`: Qwen3 tokenizer
- `vae/`: FLUX2 VAE (`AutoencoderKLFlux2`)
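Before loading, it can help to confirm that the checkpoint folder contains every entry in the list above. The following is a minimal sketch, assuming the folder layout matches that list exactly; `EXPECTED` and `missing_components` are hypothetical names and are not part of this repository.

```python
from pathlib import Path

# Entries copied from the component list; directories end with "/".
EXPECTED = [
    "pipeline.py",
    "model_index.json",
    "transformer/",
    "scheduler/",
    "text_encoder/",
    "tokenizer/",
    "vae/",
]


def missing_components(model_dir):
    """Return the expected entries that are absent from model_dir."""
    root = Path(model_dir)
    missing = []
    for entry in EXPECTED:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, files as regular files.
        found = path.is_dir() if entry.endswith("/") else path.is_file()
        if not found:
            missing.append(entry)
    return missing
```

Calling `missing_components(".")` from the repository root should return an empty list when the download is complete.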
## Inference
Run the bundled demo script:
```bash
python demo_inference.py
```
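This writes `demo.png` using the default prompt and the settings listed under Recommended settings. The following snippet loads each component from the local folder and runs the same prompt with those settings: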
```python
from pathlib import Path
import importlib.util
import sys

import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Run from the root of the downloaded checkpoint folder.
model_dir = Path(".").resolve()

# Import the custom transformer class from its bundled source file.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

# Import the custom pipeline class the same way.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

# Load every component in bf16 from local files only (no Hub access).
transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

# Assemble the pipeline from the loaded components.
pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
# Keep weights on the CPU and move submodules to the GPU only while they run.
pipe.enable_sequential_cpu_offload()

# A fixed seed makes sampling reproducible.
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```
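The two `importlib` blocks in the snippet above follow the same pattern, so they could be folded into one helper. This is only a sketch: `load_module_from_path` is a hypothetical name, not something this repository ships.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name, path):
    """Import a Python source file by path and register it under `name`."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as the snippet above does, so code that
    # looks the module up by name while it loads can find it.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module registered.
        sys.modules.pop(name, None)
        raise
    return module


# Usage, mirroring the snippet above:
# module = load_module_from_path(
#     "transformer_mvsplit_dit",
#     model_dir / "transformer" / "transformer_mvsplit_dit.py",
# )
```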
### Recommended settings
| Parameter | Default | Notes |
| --- | ---: | --- |
| `height` / `width` | 256 | Square output resolution |
| `num_inference_steps` | 35 | Flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance |
| `time_shift_alpha` | 4.0 | Time-shift in the flow schedule (must match training) |
| `seed` | 42 | Reproducible sampling; set through the `generator` argument |
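
To make the roles of `num_inference_steps`, `guidance_scale`, and `time_shift_alpha` concrete, here is a toy flow-matching Euler sampler on plain floats. It is not the code in `pipeline.py`. It assumes the commonly used shift `alpha * t / (1 + (alpha - 1) * t)`, time running from 1 (noise) to 0 (data), and the standard classifier-free guidance combination; the schedule and conventions this checkpoint actually uses may differ. All function names are hypothetical.

```python
def shift_time(t, alpha):
    """Assumed time shift: alpha * t / (1 + (alpha - 1) * t)."""
    return alpha * t / (1.0 + (alpha - 1.0) * t)


def make_timesteps(num_steps, alpha):
    """Return num_steps + 1 shifted times, descending from 1.0 to 0.0."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_time(t, alpha) for t in uniform]


def cfg_combine(v_uncond, v_cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return v_uncond + scale * (v_cond - v_uncond)


def euler_sample(x, velocity_fn, num_steps=35, guidance_scale=2.0, alpha=4.0):
    """Integrate dx/dt = v(x, t) with one Euler step per timestep pair.

    velocity_fn(x, t, conditional) is a stand-in for the transformer.
    """
    ts = make_timesteps(num_steps, alpha)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = cfg_combine(
            velocity_fn(x, t_cur, False),
            velocity_fn(x, t_cur, True),
            guidance_scale,
        )
        # t_next < t_cur, so each step moves backward in time.
        x = x + (t_next - t_cur) * v
    return x
```

Under these assumptions, `shift_time(0.5, 4.0)` evaluates to 0.8, so a larger `alpha` places more of the 35 steps near the noisy end of the trajectory.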
## Citation
```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```