---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- minit2i
- image-generation
- text-to-image
- flow-matching
- pixel-space
inference: true
widget:
- text: A lonely astronaut standing on a quiet beach under two moons.
output:
url: MiniT2I-B-16/demo.png
language:
- en
---
# BiliSakura/MiniT2I-diffusers
Self-contained MiniT2I text-to-image checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, bundled FLAN-T5-Large text encoder, and transformer weights.
Converted from [`MiniT2I/MiniT2I`](https://huggingface.co/MiniT2I/MiniT2I) using [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).
## Available checkpoints
| Subfolder | Model | Params (denoiser + text encoder) | Patch size | Recommended CFG scale |
| --- | --- | --- | ---: | ---: |
| [`MiniT2I-B-16/`](MiniT2I-B-16/) | MiniT2I-B/16 | 258M + 341M | 16 | 2.5 |
| [`MiniT2I-L-16/`](MiniT2I-L-16/) | MiniT2I-L/16 | 912M + 341M | 16 | 6.0 |
## Repo layout
```text
BiliSakura/MiniT2I-diffusers/
├── README.md
├── MiniT2I-B-16/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── demo.png
│   ├── scheduler/
│   │   └── scheduler_config.json
│   ├── text_encoder/
│   ├── tokenizer/
│   └── transformer/
│       ├── config.json
│       ├── diffusion_pytorch_model.safetensors
│       └── transformer_minit2i.py
└── MiniT2I-L-16/
    └── ...
```
Each variant is self-contained: load with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. MiniT2I denoises directly in RGB pixel space (no VAE).
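Because there is no VAE, the patch size in the checkpoint table applies directly to RGB pixels. As a rough illustration, assuming a standard non-overlapping ViT-style patchify (an assumption about `transformer_minit2i.py`, not something verified against it), a 512×512 image with patch size 16 becomes a 32×32 grid of tokens, each holding 16×16×3 raw pixel values. The helper `patch_token_shape` is hypothetical.

```python
def patch_token_shape(height: int, width: int, patch: int, channels: int = 3) -> tuple[int, int]:
    """Return (num_tokens, token_dim) for a non-overlapping patchify of an image.

    Hypothetical helper: assumes a ViT-style patch embedding applied directly
    to pixels, with no VAE latent in between.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    num_tokens = (height // patch) * (width // patch)  # tokens in the patch grid
    token_dim = patch * patch * channels                # raw values per token
    return num_tokens, token_dim


# 512x512 RGB output, patch size 16 (both MiniT2I-B/16 and MiniT2I-L/16)
print(patch_token_shape(512, 512, 16))  # (1024, 768)
```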
## Demo
![MiniT2I-B-16 demo](MiniT2I-B-16/demo.png)
Prompt: *"A lonely astronaut standing on a quiet beach under two moons."* (MiniT2I-B/16, 512×512, 100 steps, `guidance_scale=2.5`, seed 42).
## Load from Hugging Face
```python
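from pathlib import Path

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

# A Hub repo id cannot contain a subfolder path, so download only the
# MiniT2I-B-16 variant and load it from the local snapshot.
repo_root = snapshot_download(
    repo_id="BiliSakura/MiniT2I-diffusers",
    allow_patterns=["MiniT2I-B-16/*"],
)
model_dir = Path(repo_root) / "MiniT2I-B-16"

pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    custom_pipeline=str(model_dir / "pipeline.py"),  # variant ships its own pipeline code
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")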
```
For MiniT2I-L/16, replace `MiniT2I-B-16` with `MiniT2I-L-16` and use `guidance_scale=6.0`.
## Load from a local clone
```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline
model_dir = Path("./MiniT2I-B-16").resolve()
pipe = DiffusionPipeline.from_pretrained(
str(model_dir),
local_files_only=True,
custom_pipeline=str(model_dir / "pipeline.py"),
trust_remote_code=True,
torch_dtype=torch.bfloat16,
).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
"A lonely astronaut standing on a quiet beach under two moons.",
num_inference_steps=100,
guidance_scale=2.5,
generator=generator,
).images[0]
image.save("demo.png")
```
Load a **variant subfolder** (e.g. `./MiniT2I-B-16`), not the repo root.
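A small pre-flight check can catch the common mistake of pointing `from_pretrained` at the repo root. This is a sketch that assumes only the file names shown in the repo layout (`model_index.json` and `pipeline.py` inside each variant folder); `resolve_variant_dir` is a hypothetical helper, not part of Diffusers.

```python
from pathlib import Path


def resolve_variant_dir(path: str) -> Path:
    """Return the absolute variant directory, or raise if it is not a variant folder.

    Hypothetical helper: a MiniT2I variant folder is assumed to contain both
    model_index.json and pipeline.py, as in the repo layout.
    """
    model_dir = Path(path).resolve()
    required = ("model_index.json", "pipeline.py")
    missing = [name for name in required if not (model_dir / name).is_file()]
    if missing:
        # The repo root has neither file; list any variant folders found there as a hint.
        variants = sorted(p.name for p in model_dir.glob("MiniT2I-*") if p.is_dir())
        hint = f" Try one of: {', '.join(variants)}" if variants else ""
        raise FileNotFoundError(f"{model_dir} is missing {', '.join(missing)}.{hint}")
    return model_dir
```

Usage: `model_dir = resolve_variant_dir("./MiniT2I-B-16")`, then pass `str(model_dir)` and `custom_pipeline=str(model_dir / "pipeline.py")` to `DiffusionPipeline.from_pretrained` as in the local-clone example.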
## Recommended inference settings
| Variant | Resolution | Steps | CFG scale | `torch_dtype` |
| --- | --- | ---: | ---: | --- |
| `MiniT2I-B-16` | 512×512 | 100 | 2.5 | `bfloat16` |
| `MiniT2I-L-16` | 512×512 | 100 | 6.0 | `bfloat16` |
For GenEval / DPG-Bench evaluation, upstream configs use `guidance_scale=5.0` for both B/16 and L/16.
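One way to keep these settings in one place is to encode the table as keyword arguments for the pipeline call. This is a sketch: the values are copied from this model card, while `RECOMMENDED`, `BENCHMARK_GUIDANCE`, and `call_kwargs` are hypothetical names. Resolution is not passed because it is fixed at 512×512 for these exports, and `torch_dtype` belongs to `from_pretrained`, not the call.

```python
# Hypothetical helper: recommended call settings from the table in this model card.
RECOMMENDED = {
    "MiniT2I-B-16": {"guidance_scale": 2.5},
    "MiniT2I-L-16": {"guidance_scale": 6.0},
}
BENCHMARK_GUIDANCE = 5.0  # upstream GenEval / DPG-Bench configs, both variants


def call_kwargs(variant: str, benchmark: bool = False) -> dict:
    """Keyword arguments for pipe(prompt, **call_kwargs(variant))."""
    if variant not in RECOMMENDED:
        raise KeyError(f"unknown variant {variant!r}; expected one of {sorted(RECOMMENDED)}")
    kwargs = {"num_inference_steps": 100, **RECOMMENDED[variant]}
    if benchmark:
        # Evaluation runs override the per-variant CFG scale.
        kwargs["guidance_scale"] = BENCHMARK_GUIDANCE
    return kwargs
```

Usage: `image = pipe(prompt, generator=generator, **call_kwargs("MiniT2I-L-16")).images[0]`.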
## Interface notes
- Text conditioning uses bundled `google/flan-t5-large` (`T5EncoderModel` + `T5Tokenizer`).
- Scheduler is `FlowMatchEulerDiscreteScheduler` with 1000 training timesteps and `shift=1.0`.
- `guidance_scale > 1.0` enables classifier-free guidance with an empty-string null prompt.
- Output resolution is fixed at 512×512 for these exports.
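The scheduler and guidance notes above can be sketched as a dependency-free sampling loop. This is an illustration under assumptions, not the code in `pipeline.py`: the sigma grid assumes Diffusers' `FlowMatchEulerDiscreteScheduler` defaults as understood here (linearly spaced from 1.0 down to 1/1000, then a terminal 0.0; with `shift=1.0` the timestep shift is the identity, so it is omitted), the update is the Euler step `x + (sigma_next - sigma) * v`, and `velocity_fn` is a hypothetical stand-in for the transformer, called once with the prompt and once with the empty-string null prompt. All function names are hypothetical, and samples are flat lists of floats instead of image tensors.

```python
from typing import Callable, List

Vector = List[float]


def flow_match_sigmas(num_steps: int, num_train_timesteps: int = 1000) -> Vector:
    """Linear sigma grid from 1.0 to 1/num_train_timesteps, plus a terminal 0.0."""
    sigma_max, sigma_min = 1.0, 1.0 / num_train_timesteps
    if num_steps == 1:
        return [sigma_max, 0.0]
    step = (sigma_max - sigma_min) / (num_steps - 1)
    return [sigma_max - i * step for i in range(num_steps)] + [0.0]


def cfg(v_uncond: Vector, v_cond: Vector, guidance_scale: float) -> Vector:
    """Classifier-free guidance: v_uncond + g * (v_cond - v_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(v_uncond, v_cond)]


def sample(
    x: Vector,
    velocity_fn: Callable[[Vector, float, str], Vector],
    prompt: str,
    num_steps: int = 100,
    guidance_scale: float = 2.5,
) -> Vector:
    """Euler integration from pure noise (sigma=1) to a sample (sigma=0)."""
    sigmas = flow_match_sigmas(num_steps)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma, prompt)
        if guidance_scale > 1.0:
            # The null prompt is the empty string, per the interface notes.
            v = cfg(velocity_fn(x, sigma, ""), v, guidance_scale)
        # sigma_next < sigma, so each step moves x against the velocity.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x
```

As a sanity check under the assumed convention `v = noise - x0`, a constant velocity carries the starting noise to `x0` up to floating-point error, because the step sizes sum to -1.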
## Regenerate bundles
From the repository root:
```bash
conda activate rsgen
python scripts/convert_minit2i_to_bilisakura.py
```
## Links
- Blog: [MiniT2I: A Minimalist Baseline for Text-to-Image Generation](https://peppaking8.github.io/#/post/minit2i)
- Upstream checkpoints: [MiniT2I/MiniT2I](https://huggingface.co/MiniT2I/MiniT2I)
- PyTorch/Diffusers source: [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers)
## Citation
```bibtex
@misc{minit2i2026,
title = {MiniT2I: A Minimalist Baseline for Text-to-Image Generation},
author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
year = {2026},
url = {https://peppaking8.github.io/#/post/minit2i}
}
```