---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- minit2i
- image-generation
- text-to-image
- flow-matching
- pixel-space
inference: true
widget:
- text: A lonely astronaut standing on a quiet beach under two moons.
  output:
    url: MiniT2I-B-16/demo.png
language:
- en
---

# BiliSakura/MiniT2I-diffusers

Self-contained MiniT2I text-to-image checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, bundled FLAN-T5-Large text encoder, and transformer weights.

Converted from [`MiniT2I/MiniT2I`](https://huggingface.co/MiniT2I/MiniT2I) using [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).

## Available checkpoints

| Subfolder | Model | Params (denoiser + text encoder) | Patch | Recommended CFG |
| --- | --- | --- | ---: | ---: |
| [`MiniT2I-B-16/`](MiniT2I-B-16/) | MiniT2I-B/16 | 258M + 341M | 16 | 2.5 |
| [`MiniT2I-L-16/`](MiniT2I-L-16/) | MiniT2I-L/16 | 912M + 341M | 16 | 6.0 |

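
As a rough guide to memory needs, the parameter counts in the table can be turned into a weights-only footprint. This is back-of-the-envelope arithmetic, assuming 2 bytes per parameter for `bfloat16` and ignoring activations, buffers, and framework overhead; `weights_gb` is a hypothetical helper name, not part of the repository.

```python
# Parameter counts in millions, copied from the checkpoint table:
# (denoiser, text encoder).
PARAMS_M = {
    "MiniT2I-B-16": (258, 341),
    "MiniT2I-L-16": (912, 341),
}

BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def weights_gb(variant: str, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Estimate the weights-only size in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_params = sum(PARAMS_M[variant]) * 1_000_000
    return total_params * bytes_per_param / 1e9


for name in PARAMS_M:
    print(f"{name}: ~{weights_gb(name):.2f} GB of weights in bfloat16")
```

Under these assumptions the B/16 bundle needs roughly 1.2 GB for weights and the L/16 bundle roughly 2.5 GB; actual peak memory during sampling will be higher.
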
## Repo layout

```text
BiliSakura/MiniT2I-diffusers/
├── README.md
├── MiniT2I-B-16/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── demo.png
│   ├── scheduler/
│   │   └── scheduler_config.json
│   ├── text_encoder/
│   ├── tokenizer/
│   └── transformer/
│       ├── config.json
│       ├── diffusion_pytorch_model.safetensors
│       └── transformer_minit2i.py
└── MiniT2I-L-16/
    └── ...
```

Each variant is self-contained: load with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. MiniT2I denoises directly in RGB pixel space (no VAE).

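
Before loading a local copy, it can help to confirm that a folder really is a variant directory and not, say, the repo root. The sketch below checks for the files named in the tree above. `EXPECTED_FILES` and `missing_files` are hypothetical names, and which files the loader strictly requires is an assumption; the list simply mirrors the layout shown.

```python
from pathlib import Path
from typing import List, Union

# Relative paths taken from the repo layout tree for one variant folder.
EXPECTED_FILES = (
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "transformer/config.json",
    "transformer/diffusion_pytorch_model.safetensors",
    "transformer/transformer_minit2i.py",
)


def missing_files(variant_dir: Union[str, Path]) -> List[str]:
    """Return the expected files that are absent from variant_dir."""
    root = Path(variant_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files("./MiniT2I-B-16")
    print("layout OK" if not missing else f"missing: {missing}")
```

An empty result means every listed file was found; pointing the check at the repo root would report all of them as missing.
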
## Demo

![MiniT2I-B/16 demo](MiniT2I-B-16/demo.png)

Prompt: *"A lonely astronaut standing on a quiet beach under two moons."* Generated with MiniT2I-B/16 at 512×512, 100 steps, `guidance_scale=2.5`, seed 42.

## Load from Hugging Face

```python
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True is needed because the variant ships its own pipeline code.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/MiniT2I-diffusers/MiniT2I-B-16",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Fixed seed so the demo image is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

For MiniT2I-L/16, use `MiniT2I-L-16` and `guidance_scale=6.0`.

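
Some diffusers releases expect hub identifiers of the form `namespace/name` and may reject a three-segment string such as the one above. If that happens, one possible workaround is to download only the variant subfolder with `huggingface_hub.snapshot_download` and load it from the resulting local path. This is a sketch under that assumption, not the repository's documented method; `variant_patterns` and `load_variant` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "BiliSakura/MiniT2I-diffusers"


def variant_patterns(variant: str) -> list:
    """Glob patterns that restrict a snapshot download to one variant folder."""
    return [f"{variant}/*"]


def load_variant(variant: str = "MiniT2I-B-16", device: str = "cuda"):
    """Download one variant folder and load it as a local pipeline (sketch)."""
    # Heavy imports stay inside the function so variant_patterns can be
    # imported and tested without torch or diffusers installed.
    import torch
    from diffusers import DiffusionPipeline
    from huggingface_hub import snapshot_download

    # Fetch only the files under the chosen variant subfolder.
    repo_root = snapshot_download(REPO_ID, allow_patterns=variant_patterns(variant))
    model_dir = Path(repo_root) / variant

    pipe = DiffusionPipeline.from_pretrained(
        str(model_dir),
        custom_pipeline=str(model_dir / "pipeline.py"),
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )
    return pipe.to(device)
```

Calling `load_variant("MiniT2I-L-16")` would fetch the larger variant instead; the returned pipeline is then called with a prompt in the same way as in the example above.
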
## Load from a local clone

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Point at a variant subfolder inside the clone, not the repo root.
model_dir = Path("./MiniT2I-B-16").resolve()

pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

Load a **variant subfolder** (e.g. `./MiniT2I-B-16`), not the repo root.

## Recommended inference settings

| Variant | Resolution | Steps | CFG scale | `torch_dtype` |
| --- | --- | ---: | ---: | --- |
| `MiniT2I-B-16` | 512×512 | 100 | 2.5 | `bfloat16` |
| `MiniT2I-L-16` | 512×512 | 100 | 6.0 | `bfloat16` |

For GenEval / DPG-Bench evaluation, upstream configs use `guidance_scale=5.0` for both B/16 and L/16.

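
The settings above can be kept in one place so that scripts switch between the per-variant defaults and the benchmark value consistently. The helper below is a minimal sketch: `call_kwargs` is a hypothetical name, and keeping 100 steps for benchmark runs is an assumption, since the text only states the benchmark guidance scale.

```python
# Values copied from the recommended-settings table.
RECOMMENDED_CFG = {"MiniT2I-B-16": 2.5, "MiniT2I-L-16": 6.0}
RECOMMENDED_STEPS = 100

# Guidance scale the upstream configs use for GenEval / DPG-Bench.
BENCHMARK_CFG = 5.0


def call_kwargs(variant: str, benchmark: bool = False) -> dict:
    """Build keyword arguments for a pipeline call for the given variant."""
    if variant not in RECOMMENDED_CFG:
        raise KeyError(f"unknown variant: {variant!r}")
    scale = BENCHMARK_CFG if benchmark else RECOMMENDED_CFG[variant]
    # Assumption: benchmark runs keep the same step count as the defaults.
    return {"num_inference_steps": RECOMMENDED_STEPS, "guidance_scale": scale}


print(call_kwargs("MiniT2I-L-16"))
print(call_kwargs("MiniT2I-L-16", benchmark=True))
```

With a loaded pipeline, the result can be splatted into the call, for example `pipe(prompt, **call_kwargs("MiniT2I-B-16"))`.
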
## Interface notes

- Text conditioning uses bundled `google/flan-t5-large` (`T5EncoderModel` + `T5Tokenizer`).
- Scheduler is `FlowMatchEulerDiscreteScheduler` with 1000 training timesteps and `shift=1.0`.
- `guidance_scale > 1.0` enables classifier-free guidance with an empty-string null prompt.
- Output resolution is fixed at 512×512 for these exports.

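
To make two of the notes above concrete, the sketch below shows the textbook classifier-free guidance combination and the static timestep shift commonly used with flow-matching schedulers. Both functions are illustrative and use hypothetical names; the bundled `pipeline.py` and scheduler may implement these steps differently, and the shift formula is an assumption about how `shift` is applied.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Textbook CFG: start at the null-prompt prediction and move toward
    the prompt-conditioned prediction by a factor of guidance_scale."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def apply_shift(sigma: float, shift: float) -> float:
    """Static shift (assumed form): shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Toy per-element predictions for the prompt and for the empty-string prompt.
cond = [1.0, 2.0]
uncond = [0.0, 1.0]

print(cfg_combine(cond, uncond, 1.0))  # scale 1.0 reproduces the conditional prediction
print(cfg_combine(cond, uncond, 2.5))  # larger scale extrapolates past it
print(apply_shift(0.5, 1.0))           # shift=1.0 leaves the noise level unchanged
```

Under this formulation, `shift=1.0` is the identity, so the sampling schedule is not warped, and `guidance_scale=1.0` yields the conditional prediction alone.
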
## Regenerate bundles

From the repository root:

```bash
conda activate rsgen
python scripts/convert_minit2i_to_bilisakura.py
```

## Links

- Blog: [MiniT2I: A Minimalist Baseline for Text-to-Image Generation](https://peppaking8.github.io/#/post/minit2i)
- Upstream checkpoints: [MiniT2I/MiniT2I](https://huggingface.co/MiniT2I/MiniT2I)
- PyTorch/Diffusers source: [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers)

## Citation

```bibtex
@misc{minit2i2026,
  title  = {MiniT2I: A Minimalist Baseline for Text-to-Image Generation},
  author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
  year   = {2026},
  url    = {https://peppaking8.github.io/#/post/minit2i}
}
```