# Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

<div align="center">
<img width="1421" alt="Meissonic Banner" src="https://github.com/user-attachments/assets/703f6882-163a-42d0-8da8-3680231ca75e">

[![arXiv](https://img.shields.io/badge/arXiv-2410.08261-b31b1b.svg)](https://arxiv.org/abs/2410.08261)
[![Hugging Face](https://img.shields.io/badge/🤗%20Huggingface-Model_Meissonic-yellow)](https://huggingface.co/MeissonFlow/Meissonic)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/viiika/Meissonic)
[![YouTube](https://img.shields.io/badge/YouTube-Tutorial_EN-FF0000?logo=youtube)](https://www.youtube.com/watch?v=PlmifElhr6M)
[![YouTube](https://img.shields.io/badge/YouTube-Tutorial_JA-FF0000?logo=youtube)](https://www.youtube.com/watch?v=rJDrf49wF64)
[![Demo](https://img.shields.io/badge/Live-Demo_Meissonic-blue?logo=huggingface)](https://huggingface.co/spaces/MeissonFlow/meissonic)
[![Replicate](https://replicate.com/chenxwh/meissonic/badge)](https://replicate.com/chenxwh/meissonic)

[![Hugging Face](https://img.shields.io/badge/🤗%20Huggingface-Model_Monetico-yellow)](https://huggingface.co/Collov-Labs/Monetico)
[![Demo](https://img.shields.io/badge/Live-Demo_Monetico-blue?logo=huggingface)](https://huggingface.co/spaces/Collov-Labs/Monetico)

[![arXiv](https://img.shields.io/badge/arXiv-2411.10781-b31b1b.svg)](https://arxiv.org/abs/2411.10781)

[![arXiv](https://img.shields.io/badge/arXiv-2503.15457-b31b1b.svg)](https://arxiv.org/abs/2503.15457)
[![Hugging Face](https://img.shields.io/badge/🤗%20Huggingface-Model_DiMO-yellow)](https://huggingface.co/Yuanzhi/DiMO)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/yuanzhi-zhu/DiMO)


[![arXiv](https://img.shields.io/badge/arXiv-2505.23606-b31b1b.svg)](https://arxiv.org/abs/2505.23606)
[![Hugging Face](https://img.shields.io/badge/🤗%20Huggingface-Model_Muddit-yellow)](https://huggingface.co/MeissonFlow/Muddit)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/M-E-AGI-Lab/Muddit)
[![Demo](https://img.shields.io/badge/Live-Demo_Muddit-blue?logo=huggingface)](https://huggingface.co/spaces/MeissonFlow/muddit)

[![arXiv](https://img.shields.io/badge/arXiv-2507.04947-b31b1b.svg)](https://arxiv.org/abs/2507.04947)

[![arXiv](https://img.shields.io/badge/arXiv-2508.10684-b31b1b.svg)](https://arxiv.org/abs/2508.10684)

[![arXiv](https://img.shields.io/badge/arXiv-2509.19244-b31b1b.svg)](https://arxiv.org/abs/2509.19244)
[![arXiv](https://img.shields.io/badge/arXiv-2509.23919-b31b1b.svg)](https://arxiv.org/abs/2509.23919)
[![arXiv](https://img.shields.io/badge/arXiv-2509.25171-b31b1b.svg)](https://arxiv.org/abs/2509.25171)

[![arXiv](https://img.shields.io/badge/arXiv-2510.06308-b31b1b.svg)](https://arxiv.org/abs/2510.06308)

[![arXiv](https://img.shields.io/badge/arXiv-2510.20668-b31b1b.svg)](https://arxiv.org/abs/2510.20668) [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/M-E-AGI-Lab/Awesome-World-Models)

</div>

## ๐Ÿ“ Meissonic Updates and Family Papers

- [MaskGIT: Masked Generative Image Transformer](https://arxiv.org/abs/2202.04200) [CVPR 2022]
- [Muse: Text-To-Image Generation via Masked Generative Transformers](https://arxiv.org/abs/2301.00704) [ICML 2023]
- [🌟][Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis](https://arxiv.org/abs/2410.08261) [ICLR 2025]
- [Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer](https://arxiv.org/abs/2411.10781)
- [Di[๐™ผ]O: Distilling Masked Diffusion Models into One-step Generator](https://arxiv.org/abs/2503.15457) [ICCV 2025]
- [🌟][Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model](https://arxiv.org/abs/2505.23606)
- [DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer](https://arxiv.org/pdf/2507.04947) [ICCV 2025]
- [MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control](https://arxiv.org/abs/2508.10684)
- [Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation](https://arxiv.org/abs/2509.19244)
- [🌟][Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding](https://arxiv.org/abs/2510.06308)
- [Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models](https://arxiv.org/abs/2509.23919)
- [TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion](https://arxiv.org/abs/2509.25171)
- [OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows](https://arxiv.org/abs/2510.03506)
- [Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces](https://arxiv.org/abs/2506.07903) [ICML 2025]
- [Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy](https://arxiv.org/abs/2510.09012) [NeurIPS 2025]
- [🌟][From Masks to Worlds: A Hitchhiker's Guide to World Models](https://arxiv.org/abs/2510.20668)
- [Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings](https://arxiv.org/abs/2509.22925)

- More papers are coming soon! See the [MeissonFlow Research](https://huggingface.co/MeissonFlow) organization card for more about our vision.


![Meissonic Demos](./assets/demos.png)

## 🚀 Introduction

Meissonic is a non-autoregressive, masked image modeling (MIM) text-to-image model: rather than generating image tokens one at a time, it starts from a fully masked grid of tokens and fills in many tokens in parallel over a small number of refinement steps. It produces high-resolution images and is designed to run on consumer graphics cards.

![Architecture](./assets/architecture.png)

**Key Features:**
- ๐Ÿ–ผ๏ธ High-resolution image generation (up to 1024x1024)
- ๐Ÿ’ป Designed to run on consumer GPUs
- ๐ŸŽจ Versatile applications: text-to-image, image-to-image
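
The masked-generative decoding that Meissonic builds on (the MaskGIT/Muse lineage listed above) can be illustrated with a toy loop: start fully masked, and at each step commit the predictions the model is most confident about, while a cosine schedule shrinks the masked set. This is only a sketch of the idea, not the repository's implementation; `dummy_model` is a stand-in for the real text-conditioned transformer.

```python
import math
import torch

MASK_ID = -1  # sentinel for "still masked"

def maskgit_decode(model, seq_len, num_steps=8):
    """Toy MaskGIT-style parallel decoding: each step reveals the most
    confident token predictions; the rest stay masked for the next pass."""
    tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)
    for step in range(1, num_steps + 1):
        probs = model(tokens).softmax(dim=-1)        # (seq_len, vocab)
        conf, pred = probs.max(dim=-1)
        # already-committed tokens must never be re-masked
        conf = torch.where(tokens != MASK_ID, torch.full_like(conf, math.inf), conf)
        # cosine schedule: fraction of positions left masked shrinks to 0
        n_masked = int(seq_len * math.cos(math.pi / 2 * step / num_steps))
        keep = conf.topk(seq_len - n_masked).indices  # positions to unmask
        new_tokens = torch.full((seq_len,), MASK_ID, dtype=torch.long)
        new_tokens[keep] = torch.where(tokens[keep] != MASK_ID, tokens[keep], pred[keep])
        tokens = new_tokens
    return tokens

def dummy_model(tokens, vocab_size=16):
    # stand-in predictor: position i always favors token i % vocab_size
    n = tokens.shape[0]
    logits = torch.zeros(n, vocab_size)
    logits[torch.arange(n), torch.arange(n) % vocab_size] = 5.0
    return logits
```

Calling `maskgit_decode(dummy_model, seq_len=16)` returns a fully unmasked sequence after `num_steps` passes. Meissonic applies this kind of schedule to a 2-D grid of VQ image tokens conditioned on the text prompt, which is why it needs far fewer forward passes than an autoregressive decoder.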

## ๐Ÿ› ๏ธ Prerequisites

### Step 1: Clone the repository
```bash
git clone https://github.com/viiika/Meissonic/
cd Meissonic
```

### Step 2: Create virtual environment
```bash
conda create --name meissonic python
conda activate meissonic
pip install -r requirements.txt
```

### Step 3: Install diffusers
```bash
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
```

## 💡 Inference Usage

### Gradio Web UI

```bash
python app.py
```

### Command-line Interface

#### Text-to-Image Generation

```bash
python inference.py --prompt "Your creative prompt here"
```

#### Inpainting and Outpainting

```bash
python inpaint.py --mode inpaint --input_image path/to/image.jpg
python inpaint.py --mode outpaint --input_image path/to/image.jpg
```

### Advanced: FP8 Quantization

FP8 quantization reduces GPU memory usage at a small cost in latency (see the benchmarks below).

Requirements:
- CUDA 12.4
- PyTorch 2.4.1
- TorchAO

Note: Windows users should install TorchAO with:
```shell
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cpu
```

Command-line inference:
```shell
python inference_fp8.py --quantization fp8
```

Gradio UI for FP8 (select the quantization method under Advanced settings):
```shell
python app_fp8.py
```

#### Performance Benchmarks

| Precision (Steps=64, Resolution=1024x1024) | Batch Size=1 (Avg. Time) | Memory Usage |
|-------------------------------------------|--------------------------|--------------|
| FP32                                      | 13.32s                   | 12GB         |
| FP16                                      | 12.35s                   | 9.5GB        |
| FP8                                       | 12.93s                   | 8.7GB        |

## 🎨 Showcase

<div align="center">
  <img src="https://github.com/user-attachments/assets/b30a7912-5453-48ba-aff4-bfb547bbe626" width="320" alt="A pillow with a picture of a Husky on it.">
  <p><i>"A pillow with a picture of a Husky on it."</i></p>
</div>

<div align="center">
  <img src="https://github.com/user-attachments/assets/b23a1603-399d-40d6-8e16-c077d3d12a08" width="320" alt="A white coffee mug, a solid black background">
  <p><i>"A white coffee mug, a solid black background"</i></p>
</div>

## 🎓 Training

To train Meissonic, follow these steps:

1. Install dependencies:
   ```bash
   cd train
   pip install -r requirements.txt
   ```

2. Download the [Meissonic](https://huggingface.co/MeissonFlow/Meissonic) base model from Hugging Face.

3. Prepare your dataset:
   - Use the sample dataset: [MeissonFlow/splash](https://huggingface.co/datasets/MeissonFlow/lemon/resolve/main/0000.parquet)
   - Or prepare your own dataset and dataset class, following the format shown at line 100 of [dataset_utils.py](./train/dataset_utils.py) and lines 656-680 of [train_meissonic.py](./train/train_meissonic.py)
   - Modify [train.sh](./train/train.sh) with your dataset path

4. Start training:
   ```bash
   bash train/train.sh
   ```

Note: For custom datasets, you'll likely need to implement your own dataset class.
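
As a minimal sketch of what such a dataset class can look like (the class and field names here are illustrative; match the exact fields and transforms that `train_meissonic.py` expects), a PyTorch `Dataset` returning image/caption pairs:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class TextImageDataset(Dataset):
    """Illustrative dataset: each record pairs an image tensor with a caption.
    Adapt the field names and preprocessing to the training script's format."""

    def __init__(self, records):
        # records: list of {"image": CxHxW float tensor in [0, 1], "caption": str}
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        r = self.records[idx]
        return {"image": r["image"], "caption": r["caption"]}

# usage with a DataLoader
data = [{"image": torch.rand(3, 64, 64), "caption": f"sample {i}"} for i in range(4)]
loader = DataLoader(TextImageDataset(data), batch_size=2)
```

The default collate function batches the image tensors and gathers the captions into a list per batch, which is the shape of input a typical text-to-image training loop consumes.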


## 📚 Citation

If you find this work helpful, please consider citing:

```bibtex
@article{bai2024meissonic,
  title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
  author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2410.08261},
  year={2024}
}
```

## ๐Ÿ™ Acknowledgements

We thank the community and contributors for their invaluable support in developing Meissonic. We thank apolinario@multimodal.art for building the Meissonic [Demo](https://huggingface.co/spaces/MeissonFlow/meissonic). We thank @NewGenAI and @飛鷹しずか@自称文系プログラマの勉強 for the YouTube tutorials. We thank @pprp for the FP8 and INT4 quantization work. We thank @camenduru for the [Jupyter tutorial](https://github.com/camenduru/Meissonic-jupyter). We thank @chenxwh for the Replicate demo and API. We thank Collov Labs for reproducing [Monetico](https://huggingface.co/Collov-Labs/Monetico). We thank [Shitong et al.](https://arxiv.org/abs/2411.10781) for identifying effective design choices for enhancing visual quality.


---

<p align="center">
  <a href="https://star-history.com/#viiika/Meissonic&Date">
    <img src="https://api.star-history.com/svg?repos=viiika/Meissonic&type=Date" alt="Star History Chart">
  </a>
</p>

<p align="center">
  Made with โค๏ธ by the MeissonFlow Research
</p>