---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - diffusers
  - minit2i
  - image-generation
  - text-to-image
  - flow-matching
  - pixel-space
inference: true
widget:
  - text: A lonely astronaut standing on a quiet beach under two moons.
    output:
      url: MiniT2I-B-16/demo.png
language:
  - en
---

# BiliSakura/MiniT2I-diffusers

Self-contained MiniT2I text-to-image checkpoints for Hugging Face Diffusers. Each variant folder ships its own pipeline code, component modules, a bundled FLAN-T5-Large text encoder, and transformer weights.

Converted from [`MiniT2I/MiniT2I`](https://huggingface.co/MiniT2I/MiniT2I) using [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).

## Available checkpoints

| Subfolder | Model | Params (denoiser + text encoder) | Patch | Recommended CFG |
| --- | --- | --- | ---: | ---: |
| [`MiniT2I-B-16/`](MiniT2I-B-16/) | MiniT2I-B/16 | 258M + 341M | 16 | 2.5 |
| [`MiniT2I-L-16/`](MiniT2I-L-16/) | MiniT2I-L/16 | 912M + 341M | 16 | 6.0 |
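
As a rough guide to weight memory, the parameter counts in the table can be converted to bytes. The sketch below assumes 2 bytes per parameter (as with `bfloat16`) and counts weights only, not activations. The helper name `weight_gib` is illustrative and not part of this repo.

```python
# Parameter counts in millions, copied from the table above.
PARAMS_M = {
    "MiniT2I-B-16": {"denoiser": 258, "text_encoder": 341},
    "MiniT2I-L-16": {"denoiser": 912, "text_encoder": 341},
}

BYTES_PER_PARAM_BF16 = 2  # assumption: all weights are held in bfloat16


def weight_gib(variant: str, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate weight memory in GiB for one variant (weights only)."""
    total_params = sum(PARAMS_M[variant].values()) * 1_000_000
    return total_params * bytes_per_param / 2**30


for name, parts in PARAMS_M.items():
    print(f"{name}: {sum(parts.values())}M params, ~{weight_gib(name):.2f} GiB in bf16")
```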

## Repo layout

```text
BiliSakura/MiniT2I-diffusers/
├── README.md
├── MiniT2I-B-16/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── demo.png
│   ├── scheduler/
│   │   └── scheduler_config.json
│   ├── text_encoder/
│   ├── tokenizer/
│   └── transformer/
│       ├── config.json
│       ├── diffusion_pytorch_model.safetensors
│       └── transformer_minit2i.py
└── MiniT2I-L-16/
    └── ...
```

Each variant is self-contained: load with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. MiniT2I denoises directly in RGB pixel space (no VAE).
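
Because there is no VAE, the transformer's sequence length follows directly from the image size and the patch size. The arithmetic below is a sketch that assumes non-overlapping square patches over a 512×512 RGB image (the resolution these exports use). `patch_grid` is a hypothetical helper, not a function from the pipeline.

```python
def patch_grid(height: int, width: int, patch: int, channels: int = 3) -> tuple[int, int]:
    """Return (number of patches, raw values per patch) for non-overlapping patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    num_patches = (height // patch) * (width // patch)
    values_per_patch = patch * patch * channels
    return num_patches, values_per_patch


num_patches, values_per_patch = patch_grid(512, 512, patch=16)
print(num_patches, values_per_patch)  # 1024 768
```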

## Demo

![MiniT2I-B-16 demo](MiniT2I-B-16/demo.png)

Prompt: *"A lonely astronaut standing on a quiet beach under two moons."* Generated with MiniT2I-B/16 at 512×512, 100 steps, `guidance_scale=2.5`, seed 42.

## Load from Hugging Face

```python
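from pathlib import Path

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

# A Hub repo id has the form "namespace/name", so the variant subfolder cannot be
# appended to it. Download only that subfolder, then load it as a local directory.
repo_dir = snapshot_download(
    "BiliSakura/MiniT2I-diffusers",
    allow_patterns=["MiniT2I-B-16/*"],
)
model_dir = Path(repo_dir) / "MiniT2I-B-16"

pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")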

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

For MiniT2I-L/16, replace `MiniT2I-B-16` with `MiniT2I-L-16` and set `guidance_scale=6.0`.

## Load from a local clone

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./MiniT2I-B-16").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

Load a **variant subfolder** (e.g. `./MiniT2I-B-16`), not the repo root.
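
A quick pre-flight check can catch the wrong-directory mistake before `from_pretrained` is called. The sketch below only tests for files shown in the repo layout. `missing_variant_files` is a hypothetical helper, and treating these files as required is an assumption about what the loader needs.

```python
from pathlib import Path

# Files taken from the repo layout; treating them as required is an assumption.
EXPECTED_FILES = (
    "model_index.json",
    "pipeline.py",
    "scheduler/scheduler_config.json",
    "transformer/config.json",
    "transformer/diffusion_pytorch_model.safetensors",
)


def missing_variant_files(model_dir: Path) -> list[str]:
    """Return the expected files that are absent from a variant folder."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


missing = missing_variant_files(Path("./MiniT2I-B-16"))
if missing:
    print("Not a complete variant folder; missing:", ", ".join(missing))
```

Pointing the helper at the repo root instead of a variant folder reports every expected file as missing, which is the case the note above warns about.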

## Recommended inference settings

| Variant | Resolution | Steps | CFG scale | `torch_dtype` |
| --- | --- | ---: | ---: | --- |
| `MiniT2I-B-16` | 512×512 | 100 | 2.5 | `bfloat16` |
| `MiniT2I-L-16` | 512×512 | 100 | 6.0 | `bfloat16` |

For GenEval / DPG-Bench evaluation, upstream configs use `guidance_scale=5.0` for both B/16 and L/16.
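
The recommended and evaluation settings can be kept in one place so scripts do not hard-code them. This is a minimal sketch: `sampling_kwargs` and its `evaluation` flag are hypothetical names, and the values are copied from the table and note above.

```python
# Recommended guidance scales per variant, from the table above.
RECOMMENDED_CFG = {"MiniT2I-B-16": 2.5, "MiniT2I-L-16": 6.0}
EVAL_CFG = 5.0  # GenEval / DPG-Bench value used by the upstream configs
NUM_STEPS = 100


def sampling_kwargs(variant: str, evaluation: bool = False) -> dict:
    """Build keyword arguments for the pipeline call for a given variant."""
    if variant not in RECOMMENDED_CFG:
        raise KeyError(f"unknown variant: {variant!r}")
    scale = EVAL_CFG if evaluation else RECOMMENDED_CFG[variant]
    return {"num_inference_steps": NUM_STEPS, "guidance_scale": scale}


print(sampling_kwargs("MiniT2I-L-16"))
print(sampling_kwargs("MiniT2I-B-16", evaluation=True))
```

The result can be unpacked into the pipeline call, for example `pipe(prompt, generator=generator, **sampling_kwargs("MiniT2I-B-16"))`.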

## Interface notes

- Text conditioning uses bundled `google/flan-t5-large` (`T5EncoderModel` + `T5Tokenizer`).
- Scheduler is `FlowMatchEulerDiscreteScheduler` with 1000 training timesteps and `shift=1.0`.
- `guidance_scale > 1.0` enables classifier-free guidance with an empty-string null prompt.
- Output resolution is fixed at 512×512 for these exports.
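
The scheduler and guidance notes above can be illustrated numerically. The sketch below uses plain Python floats in place of tensors and is not the pipeline's code. `shift_sigma` follows the static time-shift formula used by `FlowMatchEulerDiscreteScheduler`, and `cfg_combine` is the standard classifier-free guidance blend, which is assumed (not confirmed from `pipeline.py`) to be what the pipeline applies.

```python
def shift_sigma(sigma: float, shift: float = 1.0) -> float:
    """Static time shift: sigma' = shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def cfg_combine(cond: float, uncond: float, guidance_scale: float) -> float:
    """Standard CFG blend of conditional and unconditional predictions."""
    return uncond + guidance_scale * (cond - uncond)


# With shift=1.0 the noise schedule is left unchanged.
assert shift_sigma(0.25, shift=1.0) == 0.25

# guidance_scale=1.0 returns the conditional prediction; larger values extrapolate past it.
print(cfg_combine(cond=1.0, uncond=0.0, guidance_scale=1.0))  # 1.0
print(cfg_combine(cond=1.0, uncond=0.0, guidance_scale=2.5))  # 2.5
```

In the real pipeline the unconditional branch would come from the empty-string prompt described above, and both predictions would be image-shaped tensors rather than scalars.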

## Regenerate bundles

From the repository root:

```bash
conda activate rsgen
python scripts/convert_minit2i_to_bilisakura.py
```

## Links

- Blog: [MiniT2I: A Minimalist Baseline for Text-to-Image Generation](https://peppaking8.github.io/#/post/minit2i)
- Upstream checkpoints: [MiniT2I/MiniT2I](https://huggingface.co/MiniT2I/MiniT2I)
- PyTorch/Diffusers source: [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers)

## Citation

```bibtex
@misc{minit2i2026,
  title  = {MiniT2I: A Minimalist Baseline for Text-to-Image Generation},
  author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
  year   = {2026},
  url    = {https://peppaking8.github.io/#/post/minit2i}
}
```