ZYI-0.2 β€” Lightweight Text-to-Image Diffusion Model

ZYI-0.2 is a lightweight text-to-image diffusion model (~67M parameters) designed to generate small images from natural language prompts.

The model is optimized for educational use, experimentation, and running on low-VRAM GPUs.

It uses:

  • PyTorch
  • CLIP text encoder
  • UNet diffusion backbone
  • DDIM fast sampling

Image resolution:

128 Γ— 128
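For reference, the deterministic DDIM update (η = 0) behind the fast sampling can be sketched as below. This is an illustrative scalar form of the standard DDIM step, not code from this repository; `ddim_step` and its argument names are made up for the sketch:

```python
import math

def ddim_step(x_t, eps_pred, a_t, a_prev):
    """One deterministic DDIM update (eta = 0), shown per element.

    x_t      : current noisy sample value
    eps_pred : noise predicted by the UNet at timestep t
    a_t      : cumulative alpha (alpha-bar) at timestep t
    a_prev   : cumulative alpha at the earlier target timestep
    """
    # Predict the clean sample x_0 from the noise estimate...
    x0_pred = (x_t - math.sqrt(1.0 - a_t) * eps_pred) / math.sqrt(a_t)
    # ...then re-noise it to the target timestep's noise level.
    return math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps_pred
```

Because the step uses the model's noise prediction to jump directly between timesteps, only a small subset of the training timesteps needs to be visited.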

Example Generations

*(Five example generations are shown on the model page; the images are not reproduced here.)*

Model Details

  • Model name: ZYI-0.2
  • Parameters: ~67M
  • Architecture: Text-conditioned UNet diffusion
  • Text encoder: CLIP (ViT-B/32)
  • Training dataset: COCO-style captions dataset
  • Framework: PyTorch


How to Use

Download the repository:

```python
from huggingface_hub import snapshot_download
import sys

# Download the model repository and make its modules importable.
path = snapshot_download("caikybaldo999/ZYI-0.2")
sys.path.append(path)
```

Then generate an image:

```python
from inference import generate
from IPython.display import display

img = generate("a group of people at the beach")
display(img)
```
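Outside a notebook, `display` is unavailable; assuming `generate` returns a `PIL.Image` (Pillow is in the dependency list), the result can be saved to disk instead. The blank image below is only a stand-in for a real generation:

```python
from PIL import Image

# Stand-in for the output of generate(): a blank 128x128 RGB image.
img = Image.new("RGB", (128, 128))

# Save the result instead of displaying it inline.
img.save("output.png")
```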

Installation

Install dependencies:

```bash
pip install torch transformers numpy Pillow tqdm
```

Inference Speed

Using DDIM sampling:

| Steps       | Time |
|-------------|------|
| 1000 (DDPM) | slow |
| 30 (DDIM)   | fast |

Typical generation time:

~1–3 seconds on GPU
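The speedup comes from denoising over an evenly spaced subset of the 1000 training timesteps rather than all of them. A minimal sketch of such a schedule (the function name is illustrative, not taken from `ddim_sampler.py`):

```python
def ddim_timesteps(train_steps=1000, sample_steps=30):
    # Pick sample_steps evenly spaced timesteps out of train_steps,
    # returned in descending order (denoising runs from noisy to clean).
    ts = [int(i * train_steps / sample_steps) for i in range(sample_steps)]
    return ts[::-1]

schedule = ddim_timesteps()  # 30 timesteps instead of 1000
```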

Repository Structure

```
ZYI-0.2/
│
├── model.pt
├── model.py
├── ddim_sampler.py
├── inference.py
├── requirements.txt
└── README.md
```

Limitations

  • Low resolution (128 × 128)
  • Not trained on large-scale datasets
  • May produce artifacts

This model is mainly intended for learning and experimentation.


License

This project is released for research and educational use.


Author

Created by Caiky Baldo

Hugging Face: https://huggingface.co/caikybaldo999
