---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- minit2i
- image-generation
- text-to-image
- flow-matching
- pixel-space
inference: true
widget:
- text: A lonely astronaut standing on a quiet beach under two moons.
  output:
    url: MiniT2I-B-16/demo.png
language:
- en
---

	# BiliSakura/MiniT2I-diffusers

	Self-contained MiniT2I text-to-image checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, bundled FLAN-T5-Large text encoder, and transformer weights.

	Converted from [`MiniT2I/MiniT2I`](https://huggingface.co/MiniT2I/MiniT2I) using [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).

	## Available checkpoints

| Subfolder | Model | Params (denoiser + text encoder) | Patch | Recommended CFG |
| --- | --- | --- | ---: | ---: |
| [`MiniT2I-B-16/`](MiniT2I-B-16/) | MiniT2I-B/16 | 258M + 341M | 16 | 2.5 |
| [`MiniT2I-L-16/`](MiniT2I-L-16/) | MiniT2I-L/16 | 912M + 341M | 16 | 6.0 |
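
As a rough sanity check on the table, the sketch below adds up the listed parameter counts and computes the token sequence length implied by the patch size. It assumes non-overlapping square patches tiling a 512×512 RGB image (the resolution stated elsewhere in this card). The `Variant` class and its field names are illustrative and not part of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Illustrative record of one table row (not part of the released code)."""

    name: str
    denoiser_params_m: int      # denoiser parameters, in millions
    text_encoder_params_m: int  # text encoder parameters, in millions
    patch: int                  # patch side length, in pixels
    recommended_cfg: float

    @property
    def total_params_m(self) -> int:
        return self.denoiser_params_m + self.text_encoder_params_m

    def tokens(self, resolution: int = 512) -> int:
        # Assumption: non-overlapping square patches tile the image exactly.
        if resolution % self.patch != 0:
            raise ValueError("resolution must be divisible by the patch size")
        return (resolution // self.patch) ** 2

    def values_per_patch(self, channels: int = 3) -> int:
        # Pixel-space model: each patch holds raw RGB values, not VAE latents.
        return self.patch * self.patch * channels


VARIANTS = {
    "MiniT2I-B-16": Variant("MiniT2I-B/16", 258, 341, 16, 2.5),
    "MiniT2I-L-16": Variant("MiniT2I-L/16", 912, 341, 16, 6.0),
}

for key, v in VARIANTS.items():
    print(key, v.total_params_m, v.tokens(), v.values_per_patch())
```

Under these assumptions, both variants operate on 1,024 patch tokens of 768 raw values each, and the combined parameter counts come to 599M and 1,253M.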

	## Repo layout

```text
BiliSakura/MiniT2I-diffusers/
├── README.md
├── MiniT2I-B-16/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── demo.png
│   ├── scheduler/
│   │   └── scheduler_config.json
│   ├── text_encoder/
│   ├── tokenizer/
│   └── transformer/
│       ├── config.json
│       ├── diffusion_pytorch_model.safetensors
│       └── transformer_minit2i.py
└── MiniT2I-L-16/
    └── ...
```

	Each variant is self-contained: load with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. MiniT2I denoises directly in RGB pixel space (no VAE).

	## Demo

	![MiniT2I-B-16 demo](MiniT2I-B-16/demo.png)

	Prompt: "A lonely astronaut standing on a quiet beach under two moons." — MiniT2I-B/16 at 512×512, 100 steps, `guidance_scale=2.5`, seed 42.

	## Load from Hugging Face

```python
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True allows the pipeline code shipped with the variant to run.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/MiniT2I-diffusers/MiniT2I-B-16",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Seeded generator for reproducible sampling.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

	For MiniT2I-L/16, use `MiniT2I-L-16` and `guidance_scale=6.0`.
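
Hub repo ids normally take the form `namespace/name`, so depending on the installed `diffusers` version, the three-segment path above may only resolve when a local directory of that name exists. One hedged alternative is to download a single variant folder with `huggingface_hub.snapshot_download` and load it from disk. The helper names `variant_patterns` and `load_variant` are hypothetical, and the sketch assumes the repo layout shown above.

```python
from pathlib import Path

REPO_ID = "BiliSakura/MiniT2I-diffusers"


def variant_patterns(variant: str) -> list[str]:
    """Glob patterns that select only one variant subfolder of the repo."""
    return [f"{variant}/*"]


def load_variant(variant: str = "MiniT2I-B-16", device: str = "cuda"):
    """Download one variant folder, then load it as a local pipeline."""
    # Imported lazily so variant_patterns() works without these packages.
    import torch
    from diffusers import DiffusionPipeline
    from huggingface_hub import snapshot_download

    # Fetch only the files under the chosen variant folder.
    root = snapshot_download(REPO_ID, allow_patterns=variant_patterns(variant))
    model_dir = Path(root) / variant
    pipe = DiffusionPipeline.from_pretrained(
        str(model_dir),
        custom_pipeline=str(model_dir / "pipeline.py"),
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )
    return pipe.to(device)
```

Calling `load_variant("MiniT2I-L-16")` would fetch only the larger variant, which avoids downloading both checkpoints.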

	## Load from a local clone

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Point at one variant folder inside the clone, not at the repo root.
model_dir = Path("./MiniT2I-B-16").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,  # do not contact the Hub
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Seeded generator for reproducible sampling.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

	Load a variant subfolder (e.g. `./MiniT2I-B-16`), not the repo root.
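
A small pre-flight check can catch the repo-root mistake before `from_pretrained` fails with a less obvious error. The sketch below only tests for files named in the repo layout. The `REQUIRED_FILES` list and both function names are illustrative choices, not part of the released code.

```python
from pathlib import Path

# A subset of the files listed in the repo layout for each variant folder.
REQUIRED_FILES = (
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "transformer/config.json",
    "transformer/diffusion_pytorch_model.safetensors",
)


def missing_variant_files(model_dir) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def assert_variant_dir(model_dir) -> Path:
    """Raise early if model_dir is the repo root or an incomplete clone."""
    missing = missing_variant_files(model_dir)
    if missing:
        raise FileNotFoundError(
            f"{model_dir} does not look like a variant folder; missing: {missing}"
        )
    return Path(model_dir).resolve()
```

Passing the result of `assert_variant_dir("./MiniT2I-B-16")` as `model_dir` keeps the loading code unchanged while failing fast on a wrong path.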

	## Recommended inference settings

| Variant | Resolution | Steps | CFG scale | `torch_dtype` |
| --- | --- | ---: | ---: | --- |
| `MiniT2I-B-16` | 512×512 | 100 | 2.5 | `bfloat16` |
| `MiniT2I-L-16` | 512×512 | 100 | 6.0 | `bfloat16` |

	For GenEval / DPG-Bench evaluation, upstream configs use `guidance_scale=5.0` for both B/16 and L/16.
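
The settings above can be kept in one place so that sampling and evaluation runs do not drift apart. The helper below is a sketch: `call_kwargs` is a hypothetical name, and reusing 100 steps for evaluation is an assumption, since only the evaluation `guidance_scale` is stated here.

```python
# Values copied from the table above; the helper itself is illustrative.
RECOMMENDED_CFG = {"MiniT2I-B-16": 2.5, "MiniT2I-L-16": 6.0}
EVAL_CFG = 5.0   # upstream GenEval / DPG-Bench value for both variants
NUM_STEPS = 100  # assumption: the same step count is reused for evaluation


def call_kwargs(variant: str, for_eval: bool = False) -> dict:
    """Build keyword arguments for the pipeline call of a given variant."""
    if variant not in RECOMMENDED_CFG:
        raise KeyError(f"unknown variant: {variant!r}")
    scale = EVAL_CFG if for_eval else RECOMMENDED_CFG[variant]
    return {"num_inference_steps": NUM_STEPS, "guidance_scale": scale}


print(call_kwargs("MiniT2I-L-16"))
print(call_kwargs("MiniT2I-L-16", for_eval=True))
```

The result can be unpacked into the pipeline call, for example `pipe(prompt, generator=generator, **call_kwargs("MiniT2I-B-16"))`.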

	## Interface notes

	- Text conditioning uses bundled `google/flan-t5-large` (`T5EncoderModel` + `T5Tokenizer`).
	- Scheduler is `FlowMatchEulerDiscreteScheduler` with 1000 training timesteps and `shift=1.0`.
	- `guidance_scale > 1.0` enables classifier-free guidance with an empty-string null prompt.
	- Output resolution is fixed at 512×512 for these exports.
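
To make the scheduler and guidance notes concrete, here is a minimal pure-Python sketch. It uses the common classifier-free guidance combination `uncond + scale * (cond - uncond)`, the static timestep-shift formula `shift * sigma / (1 + (shift - 1) * sigma)`, and a plain Euler update. Whether the bundled `pipeline.py` implements guidance in exactly this form is an assumption, and the function names are illustrative.

```python
def shift_sigma(sigma: float, shift: float = 1.0) -> float:
    """Static timestep shift; with shift=1.0 the schedule is unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def apply_cfg(cond, uncond, guidance_scale: float):
    """Combine conditional and unconditional predictions elementwise."""
    if guidance_scale <= 1.0:
        # Guidance disabled: only the text-conditioned prediction is used.
        return list(cond)
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def euler_step(x, velocity, sigma: float, sigma_next: float):
    """One flow-matching Euler update: x + (sigma_next - sigma) * velocity."""
    dt = sigma_next - sigma
    return [xi + dt * vi for xi, vi in zip(x, velocity)]


# Toy two-element "predictions" standing in for image tensors.
guided = apply_cfg([1.0, 2.0], [0.0, 1.0], guidance_scale=2.5)
print(guided, shift_sigma(0.25), euler_step([1.0], [2.0], 1.0, 0.5))
```

With `shift=1.0`, `shift_sigma` returns its input unchanged, which matches the note that these exports do not shift the timestep schedule.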

	## Regenerate bundles

	From the repository root:

	```bash
	conda activate rsgen
	python scripts/convert_minit2i_to_bilisakura.py
	```

	## Links

	- Blog: [MiniT2I: A Minimalist Baseline for Text-to-Image Generation](https://peppaking8.github.io/#/post/minit2i)
	- Upstream checkpoints: [MiniT2I/MiniT2I](https://huggingface.co/MiniT2I/MiniT2I)
	- PyTorch/Diffusers source: [MiniT2I-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/MiniT2I-diffusers)

	## Citation

```bibtex
@misc{minit2i2026,
  title  = {MiniT2I: A Minimalist Baseline for Text-to-Image Generation},
  author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
  year   = {2026},
  url    = {https://peppaking8.github.io/#/post/minit2i}
}
```