---
license: apache-2.0
tags:
  - diffusion
  - unconditional-image-generation
  - ddpm
  - diffusers
  - yi-script
library_name: diffusers
pipeline_tag: unconditional-image-generation
---

# Yi Syllable Diffusion

An unconditional **DDPM** that generates images of **Yi script syllables**
(the Yi Syllables Unicode block, assigned code points `U+A000`–`U+A48C`). It was
trained on 1,165 glyphs rendered from the `NotoSansYi-Regular` font.

<p align="center">
  <img src="diffusion_process.gif" width="200" alt="denoising animation">
  <img src="diffusion_steps.gif" width="200" alt="quality vs inference steps">
</p>

Left: reverse diffusion (noise → glyph). Right: the same glyph sharpening as the
number of inference steps increases.

## Sample output

![real vs generated](real_vs_generated.png)

Top: real glyphs (font). Bottom: generated by this model.

## Usage

```python
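import torch
from diffusers import DDPMPipeline

# Use the GPU when available; sampling also runs on CPU, only slower.
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DDPMPipeline.from_pretrained("pratik220704/yi-syllable-diffusion").to(device)
image = pipe(num_inference_steps=50).images[0]  # a PIL image of one random glyph
image.save("yi.png")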
```

## Training data
1,165 grayscale 64×64 PNGs, one per Yi syllable, rendered with PIL from
`NotoSansYi-Regular.ttf`.
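
A minimal sketch of how such a dataset could be rendered, assuming the font file sits in the working directory. The function names, the `font_size=48` value, and the centring logic are illustrative assumptions, not the script used for this model.

```python
def yi_syllables():
    """Return the assigned Yi syllables, U+A000 through U+A48C."""
    return [chr(cp) for cp in range(0xA000, 0xA48C + 1)]


def render_glyph(char, font_path="NotoSansYi-Regular.ttf", size=64, font_size=48):
    """Render one character as a black-on-white grayscale image.

    font_size=48 is a guess; the card does not state the value used.
    """
    # Imported here so yi_syllables() works without Pillow installed.
    from PIL import Image, ImageDraw, ImageFont

    font = ImageFont.truetype(font_path, font_size)
    image = Image.new("L", (size, size), color=255)  # "L" = 8-bit grayscale
    draw = ImageDraw.Draw(image)

    # Centre the glyph using its ink bounding box.
    left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
    x = (size - (right - left)) / 2 - left
    y = (size - (bottom - top)) / 2 - top
    draw.text((x, y), char, fill=0, font=font)
    return image
```

Calling `render_glyph(ch).save(...)` for every item of `yi_syllables()` would give one PNG per syllable, matching the shape of the dataset described above.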

## Training procedure
- Architecture: `UNet2DModel` (diffusers), 1-channel in/out, ~17 M params.
- Noise schedule: cosine-beta DDPM (1000 steps) with **zero terminal SNR**.
- Objective: **v-prediction**.
- Sampler: `DDIMScheduler`, `timestep_spacing="trailing"`, `clip_sample=True`, 50 steps.
- Optimizer: AdamW, lr 1e-4, cosine LR schedule with warmup.
- Epochs: 10.
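
The schedule and sampler settings listed above can be sketched in plain Python. This is an illustration under assumptions, not the training code: the `0.008` offset and the `0.999` beta cap follow the common cosine-schedule formulation, the rescaling follows the usual zero-terminal-SNR fix of shifting and scaling the square root of the cumulative alpha, and all function names are hypothetical. In diffusers, the corresponding scheduler options are likely `beta_schedule="squaredcos_cap_v2"`, `rescale_betas_zero_snr=True`, and `timestep_spacing="trailing"`.

```python
import math


def cosine_alpha_bar(num_steps=1000, max_beta=0.999):
    """Cumulative signal level alpha_bar_t for a cosine-beta schedule."""
    def f(u):
        return math.cos((u + 0.008) / 1.008 * math.pi / 2) ** 2

    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        # Per-step beta, capped to avoid a singular final step.
        beta = min(1 - f((i + 1) / num_steps) / f(i / num_steps), max_beta)
        running *= 1 - beta
        alpha_bar.append(running)
    return alpha_bar


def rescale_zero_terminal_snr(alpha_bar):
    """Shift and scale sqrt(alpha_bar) so the last step has exactly zero SNR."""
    roots = [math.sqrt(a) for a in alpha_bar]
    first, last = roots[0], roots[-1]
    # Subtract the terminal value, then rescale so the first value is preserved.
    return [((r - last) * first / (first - last)) ** 2 for r in roots]


def trailing_timesteps(num_inference_steps=50, num_train_steps=1000):
    """Evenly spaced timesteps counted back from the final training step."""
    stride = num_train_steps / num_inference_steps
    return [round(num_train_steps - i * stride) - 1 for i in range(num_inference_steps)]


alpha_bar = rescale_zero_terminal_snr(cosine_alpha_bar())
timesteps = trailing_timesteps()
```

With 50 steps over 1000 training timesteps, trailing spacing starts sampling at timestep 999, the one step where the rescaled schedule carries no signal at all.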

The zero-SNR + v-prediction recipe is what produces crisp black-on-white glyphs
(plain epsilon-prediction yields a grey haze). FID (full dataset) ≈ 108.6.
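
One way to see why the zero-SNR schedule is paired with v-prediction is to look at a single pixel at the terminal step. The sketch below uses the standard v-prediction definitions; the function names, the `[-1, 1]` pixel scaling, and the sample values are assumptions for illustration only.

```python
import math


def add_noise(x0, eps, alpha_bar_t):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1 - alpha_bar_t) * eps


def v_target(x0, eps, alpha_bar_t):
    """v-prediction target: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    return math.sqrt(alpha_bar_t) * eps - math.sqrt(1 - alpha_bar_t) * x0


def x0_from_v(x_t, v, alpha_bar_t):
    """Invert the two equations above to recover the clean pixel value."""
    return math.sqrt(alpha_bar_t) * x_t - math.sqrt(1 - alpha_bar_t) * v


# One pixel at the terminal step of a zero-SNR schedule (alpha_bar = 0).
x0, eps = 1.0, -0.3             # white pixel (assumed [-1, 1] scaling), arbitrary noise
x_t = add_noise(x0, eps, 0.0)   # equals eps: the network input is pure noise
v = v_target(x0, eps, 0.0)      # equals -x0: the target still encodes the image
```

At that step an epsilon target would simply equal the input, so it would carry no information about the glyph, whereas the v target reduces to the negated clean image.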

## Limitations
The model is unconditional: you cannot request a specific syllable. Quality is
bounded by the 64 px resolution and the short (10-epoch) training budget.

## License
Model weights: Apache-2.0. The Noto fonts are licensed under the SIL Open Font License.