metadata
license: apache-2.0
tags:
  - diffusion
  - unconditional-image-generation
  - ddpm
  - diffusers
  - yi-script
library_name: diffusers
pipeline_tag: unconditional-image-generation

Yi Syllable Diffusion

An unconditional DDPM that generates images of Yi script syllables (Unicode range U+A000–U+A48C). It was trained on 1,165 glyphs rendered from the NotoSansYi-Regular font.

[Figure: denoising animation] [Figure: quality vs inference steps]

Left: reverse diffusion (noise → glyph). Right: the same glyph sharpening as the number of inference steps increases.

Sample output

[Figure: real vs generated]

Top: real glyphs (font). Bottom: generated by this model.

Usage

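import torch
from diffusers import DDPMPipeline

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DDPMPipeline.from_pretrained("pratik220704/yi-syllable-diffusion").to(device)
image = pipe(num_inference_steps=50).images[0]
image.save("yi.png")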

Training data

1,165 grayscale 64×64 PNGs, one per Yi syllable, rendered with PIL from NotoSansYi-Regular.ttf.
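A dataset like the one described above could be rebuilt roughly as follows. This is a minimal sketch, not the author's actual script: the font size (48), the centering logic, the output file naming, and the helper names `yi_syllables`, `render_glyph`, and `render_all` are all assumptions made for illustration. Only the code-point range, the 64×64 grayscale format, and the use of PIL with NotoSansYi-Regular.ttf come from the description.

```python
import os

def yi_syllables():
    """Return the 1,165 Yi syllables, U+A000 through U+A48C inclusive."""
    return [chr(cp) for cp in range(0xA000, 0xA48C + 1)]

def render_glyph(char, font, size=64):
    """Render one character in black on a white square grayscale canvas."""
    # Pillow is imported lazily so yi_syllables() works without it installed.
    from PIL import Image, ImageDraw

    img = Image.new("L", (size, size), color=255)  # "L" = 8-bit grayscale
    draw = ImageDraw.Draw(img)
    left, top, right, bottom = draw.textbbox((0, 0), char, font=font)
    x = (size - (right - left)) / 2 - left  # center the ink box horizontally
    y = (size - (bottom - top)) / 2 - top   # and vertically
    draw.text((x, y), char, fill=0, font=font)
    return img

def render_all(font_path, out_dir, font_size=48):
    """Write one PNG per syllable, named by code point (assumed naming)."""
    from PIL import ImageFont

    font = ImageFont.truetype(font_path, font_size)  # font_size is an assumption
    os.makedirs(out_dir, exist_ok=True)
    for ch in yi_syllables():
        path = os.path.join(out_dir, f"{ord(ch):04X}.png")
        render_glyph(ch, font).save(path)
```

Calling `render_all("NotoSansYi-Regular.ttf", "yi_glyphs")` would then write files such as `A000.png`, provided the font file is available locally.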

Training procedure

  • Architecture: UNet2DModel (diffusers), 1-channel in/out, ~17 M params.
  • Noise schedule: cosine-beta DDPM (1000 steps) with zero terminal SNR.
  • Objective: v-prediction.
  • Sampler: DDIMScheduler, timestep_spacing="trailing", clip_sample=True, 50 steps.
  • Optimizer: AdamW, lr 1e-4, cosine LR schedule with warmup. Epochs: 10.
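The noise schedule named in the list above can be illustrated in plain Python. The sketch below builds a cosine beta schedule and then rescales it so the final step has zero signal-to-noise ratio. The offset `s=0.008` and the cap `max_beta=0.999` are the values commonly used for cosine schedules and are assumptions here, as are the function names. In diffusers, the equivalent settings are probably `beta_schedule="squaredcos_cap_v2"` with `rescale_betas_zero_snr=True`, though the exact training configuration is not stated in this card.

```python
import math

def cosine_betas(num_steps=1000, max_beta=0.999, s=0.008):
    """Cosine beta schedule, with each beta capped at max_beta."""
    def alpha_bar(u):  # u in [0, 1] is the normalized timestep
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return [
        min(1 - alpha_bar((i + 1) / num_steps) / alpha_bar(i / num_steps), max_beta)
        for i in range(num_steps)
    ]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_i) for i <= t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last step carries no signal."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value is 0, then scale so the first value is unchanged.
    sqrt_ab = [(v - last) * first / (first - last) for v in sqrt_ab]
    ab = [v * v for v in sqrt_ab]
    # Convert the rescaled alpha_bar sequence back into per-step betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]

plain = cosine_betas()
rescaled = rescale_zero_terminal_snr(plain)
```

Without the rescaling, the capped schedule leaves a small but nonzero amount of signal at the last step. After it, the final beta is 1, so the last training input is pure noise.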

The zero-SNR + v-prediction recipe is what produces crisp black-on-white glyphs (plain epsilon-prediction yields a grey haze). FID (full dataset) ≈ 108.6.
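One way to see why the two choices go together is to look at the training targets at the final, zero-SNR timestep. The toy example below uses the standard definitions x_t = sqrt(ᾱ)·x0 + sqrt(1−ᾱ)·ε and v = sqrt(ᾱ)·ε − sqrt(1−ᾱ)·x0. The three-value "image" and noise vector are made up for illustration, and the helper names are hypothetical.

```python
import math

def noisy_sample(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]

def v_target(x0, eps, alpha_bar):
    """v-prediction target: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]

x0 = [1.0, -1.0, 1.0]   # toy "image" in [-1, 1]: white, black, white
eps = [0.3, -0.7, 1.2]  # toy noise, fixed so the example is deterministic

# At the zero-SNR step alpha_bar is 0.
x_T = noisy_sample(x0, eps, alpha_bar=0.0)  # equals eps: no image signal left
v_T = v_target(x0, eps, alpha_bar=0.0)      # equals -x0: still depends on the image
```

At that step the epsilon target is identical to the network input, so it carries no information about the image, whereas the v target is the negated clean image. This is a common explanation for why zero terminal SNR is paired with v-prediction; it is offered here as a plausible reading rather than a measured result for this model.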

Limitations

Unconditional: you cannot request a specific syllable. Quality is bounded by the 64 px resolution and the short (10-epoch) training budget.

License

Model weights: Apache-2.0. The Noto fonts are licensed under the SIL Open Font License.