---
license: apache-2.0
language:
  - en
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - diffusers
  - dit
  - image-generation
  - text-to-image
  - flow-matching
  - mvsplit
inference: true
widget:
  - text: a red panda climbing a bamboo stalk
    output:
      url: demo.png
---

# MVSplit-DiT-1000L

A self-contained Diffusers checkpoint for **MVSplit-DiT**, a 1000-layer Diffusion Transformer, bundled with a custom `MVSplitDiTPipeline` (`pipeline.py`).

> **Re-distribution notice:** weights are converted from [`StableKirito/mvsplit-dit-1000l`](https://huggingface.co/StableKirito/mvsplit-dit-1000l). Original work: [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Demo

![MVSplit-DiT-1000L demo](demo.png)

Prompt: *a red panda climbing a bamboo stalk* — 256×256, 35 steps, CFG 2.0.

## Components

- `pipeline.py` — `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/` — `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/` — `FlowMatchEulerDiscreteScheduler`
- `text_encoder/` — Qwen3-0.6B (`AutoModel`)
- `tokenizer/` — Qwen3 tokenizer
- `vae/` — FLUX2 VAE (`AutoencoderKLFlux2`)
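
Before loading anything, it can help to confirm that a local copy of the checkpoint contains every entry listed above. The following is a minimal sketch, not part of the checkpoint: `EXPECTED_COMPONENTS` and `missing_components` are hypothetical names introduced here, and the check only tests that each path exists, not that its contents are valid.

```python
from pathlib import Path

# Entries copied from the component list above (hypothetical constant name).
EXPECTED_COMPONENTS = [
    "pipeline.py",
    "model_index.json",
    "transformer",
    "scheduler",
    "text_encoder",
    "tokenizer",
    "vae",
]


def missing_components(model_dir):
    """Return the expected entries that do not exist under model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_COMPONENTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_components(".")
    print("missing:", missing if missing else "none")
```

An empty list means the layout matches the list above; anything else names the entries to download again.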

## Inference

Run the bundled demo script:

```bash
python demo_inference.py
```

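This writes `demo.png` using the default prompt and the settings listed under "Recommended settings".

To generate from Python directly, run the following from the checkpoint directory. It loads each component from its subfolder and assembles the pipeline by hand: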

```python
from pathlib import Path
import importlib.util
import sys
import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Assumes the current working directory is the checkpoint root.
model_dir = Path(".").resolve()

# The transformer and pipeline classes ship as files in this checkpoint,
# so import both modules by file path.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module  # register before executing the module
spec.loader.exec_module(module)

pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
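# Keep submodules on the CPU and move each to the GPU only while it runs
# (lower GPU memory use, slower sampling).
pipe.enable_sequential_cpu_offload()

# A seeded CPU generator makes sampling reproducible.
generator = torch.Generator(device="cpu").manual_seed(42)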
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```
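
The two `importlib` stanzas in the snippet above follow the same pattern, so they can be folded into one helper. This is a sketch using only the standard library; `load_module` is a hypothetical name and is not provided by this checkpoint or by Diffusers.

```python
import importlib.util
import sys
from pathlib import Path


def load_module(name, path):
    """Import the Python file at `path` and register it under `name`."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring the snippet above.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Usage with the paths from the snippet above:
# module = load_module("transformer_mvsplit_dit",
#                      model_dir / "transformer" / "transformer_mvsplit_dit.py")
# pipe_module = load_module("mvsplit_pipeline", model_dir / "pipeline.py")
```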

### Recommended settings

| Parameter | Default | Notes |
| --- | ---: | --- |
| `height` / `width` | 256 | Square output resolution |
| `num_inference_steps` | 35 | Flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance |
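| `time_shift_alpha` | 4.0 | Time shift in the flow schedule; passed to the `MVSplitDiTPipeline` constructor and must match training |
| `seed` | 42 | Set with `torch.Generator(device="cpu").manual_seed(42)` for reproducible sampling |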
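
To make the `time_shift_alpha` and `guidance_scale` rows concrete, the sketch below shows one common form of each. The time shift is the formula that `FlowMatchEulerDiscreteScheduler` in Diffusers applies for its `shift` parameter; whether `pipeline.py` applies exactly this form for `time_shift_alpha` is an assumption, as is the convention that t = 1 is pure noise and t = 0 is the image. The guidance formula is the standard classifier-free guidance combination. All function names are hypothetical, and scalars stand in for tensors.

```python
def shift_time(t: float, alpha: float = 4.0) -> float:
    """Shift a uniform time t in [0, 1] (assumed form, see lead-in)."""
    return alpha * t / (1.0 + (alpha - 1.0) * t)


def shifted_schedule(num_steps: int = 35, alpha: float = 4.0) -> list:
    """Return num_steps + 1 shifted times descending from 1.0 to 0.0."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_time(t, alpha) for t in uniform]


def cfg_combine(uncond: float, cond: float, guidance_scale: float = 2.0) -> float:
    """Classifier-free guidance: extrapolate from uncond toward cond."""
    return uncond + guidance_scale * (cond - uncond)


if __name__ == "__main__":
    times = shifted_schedule()
    print(len(times), times[0], times[-1])
    print(cfg_combine(uncond=1.0, cond=3.0))
```

With `alpha` greater than 1 the shifted times stay closer to 1 for longer, so more of the 35 Euler steps are spent at high noise levels. With `guidance_scale=1.0` the combination reduces to the conditional prediction.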

## Citation

```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```