---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- minit2i
- image-generation
- text-to-image
- flow-matching
- pixel-space
inference: true
widget:
- text: A lonely astronaut standing on a quiet beach under two moons.
  output:
    url: MiniT2I-B-16/demo.png
language:
- en
---

# BiliSakura/MiniT2I-diffusers

Self-contained MiniT2I text-to-image checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, bundled FLAN-T5-Large text encoder, and transformer weights.

Converted from [`MiniT2I/MiniT2I`](https://huggingface.co/MiniT2I/MiniT2I) using [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).

## Available checkpoints

| Subfolder | Model | Params (denoiser + text encoder) | Patch | Recommended CFG |
| --- | --- | --- | ---: | ---: |
| [`MiniT2I-B-16/`](MiniT2I-B-16/) | MiniT2I-B/16 | 258M + 341M | 16 | 2.5 |
| [`MiniT2I-L-16/`](MiniT2I-L-16/) | MiniT2I-L/16 | 912M + 341M | 16 | 6.0 |

## Repo layout

```text
BiliSakura/MiniT2I-diffusers/
├── README.md
├── MiniT2I-B-16/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── demo.png
│   ├── scheduler/
│   │   └── scheduler_config.json
│   ├── text_encoder/
│   ├── tokenizer/
│   └── transformer/
│       ├── config.json
│       ├── diffusion_pytorch_model.safetensors
│       └── transformer_minit2i.py
└── MiniT2I-L-16/
    └── ...
```

Each variant is self-contained: load with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. MiniT2I denoises directly in RGB pixel space (no VAE).

## Demo

![MiniT2I-B-16 demo](MiniT2I-B-16/demo.png)

Prompt: *"A lonely astronaut standing on a quiet beach under two moons."* Generated with MiniT2I-B/16 at 512×512, 100 steps, `guidance_scale=2.5`, seed 42.
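Because every variant lives in its own subfolder of one Hub repository, it can help to fetch a single variant and check its files before loading. The following is a minimal sketch, not part of the repository: `variant_patterns`, `missing_entries`, and `fetch_variant` are hypothetical helper names, the expected-file list is copied from the layout tree above, and the download step assumes `huggingface_hub.snapshot_download` with its `allow_patterns` argument.

```python
from pathlib import Path

REPO_ID = "BiliSakura/MiniT2I-diffusers"

# Files the layout tree lists for a variant folder. The contents of
# text_encoder/ and tokenizer/ are not enumerated there, so only the
# directories themselves are checked.
REQUIRED_FILES = (
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "transformer/config.json",
    "transformer/diffusion_pytorch_model.safetensors",
    "transformer/transformer_minit2i.py",
)
REQUIRED_DIRS = ("text_encoder", "tokenizer")


def variant_patterns(variant: str) -> list[str]:
    """Glob patterns selecting one variant subfolder (hypothetical helper)."""
    return [f"{variant}/*"]


def missing_entries(variant_dir: Path) -> list[str]:
    """Return the expected files and directories absent from variant_dir."""
    missing = [f for f in REQUIRED_FILES if not (variant_dir / f).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (variant_dir / d).is_dir()]
    return missing


def fetch_variant(variant: str = "MiniT2I-B-16") -> Path:
    """Download one variant folder from the Hub and return its local path."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot = snapshot_download(REPO_ID, allow_patterns=variant_patterns(variant))
    variant_dir = Path(snapshot) / variant
    missing = missing_entries(variant_dir)
    if missing:
        raise FileNotFoundError(f"{variant_dir} is missing: {missing}")
    return variant_dir
```

The path returned by `fetch_variant()` points at a variant subfolder rather than the repo root, so it can serve as the model directory for a local load.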
## Load from Hugging Face

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/MiniT2I-diffusers/MiniT2I-B-16",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

For MiniT2I-L/16, use `MiniT2I-L-16` and `guidance_scale=6.0`.

## Load from a local clone

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path("./MiniT2I-B-16").resolve()

pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

Load a **variant subfolder** (e.g. `./MiniT2I-B-16`), not the repo root.

## Recommended inference settings

| Variant | Resolution | Steps | CFG scale | `torch_dtype` |
| --- | --- | ---: | ---: | --- |
| `MiniT2I-B-16` | 512×512 | 100 | 2.5 | `bfloat16` |
| `MiniT2I-L-16` | 512×512 | 100 | 6.0 | `bfloat16` |

For GenEval / DPG-Bench evaluation, upstream configs use `guidance_scale=5.0` for both B/16 and L/16.

## Interface notes

- Text conditioning uses bundled `google/flan-t5-large` (`T5EncoderModel` + `T5Tokenizer`).
- Scheduler is `FlowMatchEulerDiscreteScheduler` with 1000 training timesteps and `shift=1.0`.
- `guidance_scale > 1.0` enables classifier-free guidance with an empty-string null prompt.
- Output resolution is fixed at 512×512 for these exports.
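To make the scheduler and guidance notes concrete, here is a minimal pure-Python sketch. It assumes the standard classifier-free guidance combination and the sigma shift and Euler update used by diffusers' `FlowMatchEulerDiscreteScheduler`; the bundled `pipeline.py` may differ in detail, and the function names are illustrative only.

```python
def shift_sigma(sigma: float, shift: float = 1.0) -> float:
    """Time-shift a noise level in [0, 1]; shift=1.0 leaves it unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def cfg_combine(cond: list[float], uncond: list[float], guidance_scale: float) -> list[float]:
    """Blend the prompt-conditioned and null-prompt predictions element-wise."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def euler_step(sample: list[float], velocity: list[float],
               sigma: float, sigma_next: float) -> list[float]:
    """One Euler update of the flow ODE from sigma to sigma_next."""
    dt = sigma_next - sigma
    return [x + dt * v for x, v in zip(sample, velocity)]
```

Under these assumptions, `shift=1.0` makes `shift_sigma` the identity, so the sigma schedule is used unshifted, and `guidance_scale=1.0` makes `cfg_combine` return the conditional prediction, which is consistent with guidance only taking effect above 1.0.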
## Regenerate bundles

From the repository root:

```bash
conda activate rsgen
python scripts/convert_minit2i_to_bilisakura.py
```

## Links

- Blog: [MiniT2I: A Minimalist Baseline for Text-to-Image Generation](https://peppaking8.github.io/#/post/minit2i)
- Upstream checkpoints: [MiniT2I/MiniT2I](https://huggingface.co/MiniT2I/MiniT2I)
- PyTorch/Diffusers source: [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers)

## Citation

```bibtex
@misc{minit2i2026,
  title  = {MiniT2I: A Minimalist Baseline for Text-to-Image Generation},
  author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
  year   = {2026},
  url    = {https://peppaking8.github.io/#/post/minit2i}
}
```