---
license: apache-2.0
language:
  - en
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - diffusers
  - dit
  - image-generation
  - text-to-image
  - flow-matching
  - mvsplit
inference: true
widget:
  - text: a red panda climbing a bamboo stalk
    output:
      url: demo.png
---

# MVSplit-DiT-1000L

Self-contained Diffusers checkpoint for **MVSplit-DiT** (1000-layer Diffusion Transformer) with a custom `MVSplitDiTPipeline` (`pipeline.py`).

> **Re-distribution notice:** weights are converted from [`StableKirito/mvsplit-dit-1000l`](https://huggingface.co/StableKirito/mvsplit-dit-1000l). Original work: [Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers](https://huggingface.co/papers/2605.06169). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Demo

![MVSplit-DiT-1000L demo](demo.png)

Prompt: *a red panda climbing a bamboo stalk* — 256×256, 35 steps, CFG 2.0.

## Components

- `pipeline.py`: `MVSplitDiTPipeline`
- `model_index.json`
- `transformer/`: `MVSplitDiTTransformer2DModel` (bf16, 1000 layers)
- `scheduler/`: `FlowMatchEulerDiscreteScheduler`
- `text_encoder/`: Qwen3-0.6B (`AutoModel`)
- `tokenizer/`: Qwen3 tokenizer
- `vae/`: FLUX2 VAE (`AutoencoderKLFlux2`)
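Before loading anything, it can help to confirm that a local copy of the repository has the layout listed above. The following is a minimal sketch; `missing_components` is a hypothetical helper, not part of this repository, and it only checks that the listed files and folders exist, not that their contents are valid.

```python
from pathlib import Path

# Names are taken from the component list above; the helper itself is hypothetical.
EXPECTED_FILES = ("pipeline.py", "model_index.json")
EXPECTED_DIRS = ("transformer", "scheduler", "text_encoder", "tokenizer", "vae")


def missing_components(model_dir):
    """Return the expected files and folders that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list from `missing_components(".")` suggests the checkout is complete enough to attempt loading.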

## Inference

Run the bundled demo script:

```bash
python demo_inference.py
```

This writes `demo.png` using the default prompt and the settings listed below. To build the pipeline manually from a local copy of this repository:

```python
from pathlib import Path
import importlib.util
import sys
import torch
from diffusers import AutoencoderKLFlux2
from transformers import AutoModel, AutoTokenizer

# Resolve the local checkpoint directory (run this from the repository root).
model_dir = Path(".").resolve()

# Import the custom transformer module by file path and register it in sys.modules.
transformer_path = model_dir / "transformer" / "transformer_mvsplit_dit.py"
spec = importlib.util.spec_from_file_location("transformer_mvsplit_dit", transformer_path)
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)

# Import the custom pipeline module (pipeline.py) the same way.
pipe_spec = importlib.util.spec_from_file_location("mvsplit_pipeline", model_dir / "pipeline.py")
pipe_module = importlib.util.module_from_spec(pipe_spec)
sys.modules[pipe_spec.name] = pipe_module
pipe_spec.loader.exec_module(pipe_module)

transformer = module.MVSplitDiTTransformer2DModel.from_pretrained(
    model_dir / "transformer",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir / "tokenizer", local_files_only=True)
text_encoder = AutoModel.from_pretrained(
    model_dir / "text_encoder",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)
vae = AutoencoderKLFlux2.from_pretrained(
    model_dir / "vae",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
)

pipe = pipe_module.MVSplitDiTPipeline(
    transformer=transformer,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    time_shift_alpha=4.0,
)
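# Keep submodules on the CPU and move them to the GPU only while they run.
pipe.enable_sequential_cpu_offload()

# Fix the seed so repeated runs produce the same image.
generator = torch.Generator(device="cpu").manual_seed(42)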
image = pipe(
    prompt="a red panda climbing a bamboo stalk",
    height=256,
    width=256,
    num_inference_steps=35,
    guidance_scale=2.0,
    generator=generator,
).images[0]
image.save("demo.png")
```
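The script above repeats the same `importlib` sequence twice, once for the transformer module and once for the pipeline module. As a sketch of how that pattern could be factored out, the hypothetical helper below (`load_module_from_path` is not part of this repository) wraps the same standard-library calls:

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name, path):
    """Import a Python file by path and register it in sys.modules under `name`."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring the order used in the script above.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module
```

With it, the two imports would reduce to `load_module_from_path("transformer_mvsplit_dit", transformer_path)` and `load_module_from_path("mvsplit_pipeline", model_dir / "pipeline.py")`.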

### Recommended settings

| Parameter | Default | Notes |
| --- | ---: | --- |
| `height` / `width` | 256 | Square output resolution |
| `num_inference_steps` | 35 | Flow-matching Euler steps |
| `guidance_scale` | 2.0 | Classifier-free guidance |
| `time_shift_alpha` | 4.0 | Time-shift in the flow schedule (must match training) |
| `seed` | 42 | Passed as `generator=torch.Generator(device="cpu").manual_seed(42)` for reproducible sampling |
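To give a feel for how these settings interact, here is a toy scalar sampler. It is a sketch under stated assumptions, not the code in `pipeline.py`: the time shift uses the form `alpha * s / (1 + (alpha - 1) * s)` common in SD3-style flow-matching schedules, which this model card does not confirm for `time_shift_alpha`, and `toy_velocity` stands in for the transformer by treating the data distribution as a single point.

```python
def shifted_sigmas(num_steps, alpha):
    """Noise levels from 1.0 down to 0.0, warped by an assumed SD3-style time shift."""
    uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [alpha * s / (1.0 + (alpha - 1.0) * s) for s in uniform]


def toy_velocity(x, sigma, target):
    """Exact flow-matching velocity when the data is the single point `target`."""
    return (x - target) / sigma


def sample(x_init, cond_target, uncond_target,
           num_steps=35, guidance_scale=2.0, alpha=4.0):
    """Euler integration from sigma=1 to sigma=0 with classifier-free guidance."""
    sigmas = shifted_sigmas(num_steps, alpha)
    x = x_init
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v_cond = toy_velocity(x, sigma, cond_target)
        v_uncond = toy_velocity(x, sigma, uncond_target)
        # Classifier-free guidance: extrapolate from the unconditional prediction.
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        # Euler step along the guided velocity.
        x = x + (sigma_next - sigma) * v
    return x
```

In this toy setting, `guidance_scale=1.0` recovers the conditional target, while larger values push the sample past it, away from the unconditional target. A larger `alpha` concentrates more of the steps at high noise levels.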

## Citation

```bibtex
@article{lu2026mms,
  title   = {Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers},
  author  = {Lu, Pengqi},
  journal = {arXiv preprint arXiv:2605.06169},
  year    = {2026},
}
```