MiniT2I Transformer

The MiniT2I diffusion transformer (MMJiT backbone), packaged as a plain 🤗 transformers model. It loads with trust_remote_code=True and needs neither diffusers nor a MiniT2I install.

import torch
from transformers import AutoModel

transformer = AutoModel.from_pretrained(
    "<user>/minit2i-transformer", trust_remote_code=True, dtype=torch.bfloat16
)

The model predicts the clean image from a noised input, conditioned on flan-t5-large text embeddings; its forward signature is forward(img, context, attn_mask). This repo contains only the transformer: the flow-matching schedule, classifier-free guidance, and sampling loop are provided by the minit2i library. Pair this model with a text encoder repo (e.g. <user>/text_encoder) to generate images.
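To make the division of labor concrete, here is a minimal sketch of what a sampling loop around a clean-image predictor can look like. It is not the minit2i implementation. It assumes the interpolation z_t = t * x + (1 - t) * noise (t = 0 is pure noise, t = 1 is the clean image), plain Euler integration, and guidance applied to the clean-image prediction; the actual schedule and guidance used by minit2i may differ. The callable predict_x is hypothetical: in practice it would wrap transformer(img, context, attn_mask) with conditional or empty-prompt embeddings.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one. scale = 1.0 returns the conditional prediction."""
    return uncond + scale * (cond - uncond)


def x_pred_to_velocity(x_pred, z_t, t, eps=1e-5):
    """Turn a clean-image prediction into a velocity.

    Assumes z_t = t * x + (1 - t) * noise, so v = x - noise = (x - z_t) / (1 - t).
    t is a Python float; the denominator is clamped to avoid division by zero.
    """
    return (x_pred - z_t) / max(1.0 - t, eps)


def euler_step(z_t, v, dt):
    """One explicit Euler step along the velocity field."""
    return z_t + dt * v


def sample(predict_x, z, num_steps=50, guidance_scale=1.0):
    """Integrate from noise (t = 0) to an image (t = 1).

    predict_x(z, t, conditional=...) is a hypothetical wrapper around the
    transformer that returns its clean-image prediction.
    """
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x_cond = predict_x(z, t, conditional=True)
        x_uncond = predict_x(z, t, conditional=False)
        x_pred = cfg_combine(x_cond, x_uncond, guidance_scale)
        z = euler_step(z, x_pred_to_velocity(x_pred, z, t), dt)
    return z
```

The helpers use only arithmetic operators, so they accept torch tensors as well as plain floats. With a predictor that always returns the same target, the loop lands on that target at t = 1, which is a quick sanity check of the velocity conversion.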

Citation

@misc{minit2i2026,
  title  = {MiniT2I: A Minimalist Baseline for Text-to-Image Synthesis},
  author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
  year   = {2026},
  url    = {https://peppaking8.github.io/#/post/minit2i}
}

Model size: 0.3B params. Tensor type: F32. Format: Safetensors.