metadata
license: cc-by-nc-4.0
tags:
  - diffusion
  - ddpm
  - pixel-art
  - pokemon
  - text-to-image
library_name: pytorch
pipeline_tag: unconditional-image-generation

Pokémon Pixel-Art Diffusion — checkpoints

Trained checkpoints for a from-scratch DDPM that generates 96×96 Pokémon pixel art (by name, by free-text description, or by fusing two Pokémon).

Code, model definitions & demo notebook: https://github.com/weiee666/Pokemon_Combination

Checkpoints

| file (experiments/.../ckpt_ep300.pt) | conditioning | attention | classes | params |
|---|---|---|---|---|
| exp01_pokemon1070front_condUNet-CFG | name embedding | | 1070 | 4.2M |
| exp02_pokemon20aug_condUNet-CFG-TB | name embedding | | 20 | 3.9M |
| exp03_pokemon20aug_condUNet-CFG-attn-TB | name embedding | self-attn | 20 | 5.8M |
| exp04_pokemon20clean_bigUNet-CFG-attn-EMA | name embedding | self-attn + EMA | 20 | 30.3M |
| exp05_pokemon20aug_bigUNet-CFG-attn-EMA | name embedding | self-attn + EMA | 20 | 30.3M |
| exp06_stage1_pokemonALL_clipText | CLIP pooled text | | 988 | 30.5M |
| exp07_stage1_pokemonALL_xattn | CLIP per-token | self + cross-attn @24²/12² | 988 | 33.5M |
| exp08_stage1_pokemonALL_xattn48 | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
| exp09_stage1_pokemonALLcentered_xattn48 | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |

exp06_.../clip_table.pt holds the pre-computed pooled CLIP vectors, one per class. The exp06 model needs this file.

For the experiments that use EMA, the checkpoints store the EMA weights. The CLIP-conditioned models (exp06–09) expect the CLIP ViT-B-32 (laion2b_s34b_b79k) text encoder.
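The encoder name and tag above match the naming used by the open_clip package, so one way to obtain the text encoder is sketched below. This is an assumption, not something the card states: the GitHub code may load CLIP differently, and the helper name `encode_prompts` is hypothetical. The sketch returns pooled vectors only and does not normalize them.

```python
CLIP_ARCH = "ViT-B-32"
CLIP_PRETRAINED = "laion2b_s34b_b79k"


def encode_prompts(prompts, device="cpu"):
    """Return one pooled CLIP text vector per prompt (shape [N, 512] for ViT-B-32)."""
    # Imported lazily so this module can be imported without open_clip installed.
    import open_clip
    import torch

    # Downloads the pretrained weights on first use.
    model, _, _ = open_clip.create_model_and_transforms(
        CLIP_ARCH, pretrained=CLIP_PRETRAINED
    )
    model = model.to(device).eval()
    tokenizer = open_clip.get_tokenizer(CLIP_ARCH)

    tokens = tokenizer(list(prompts)).to(device)
    with torch.no_grad():
        return model.encode_text(tokens)
```

Pooled vectors correspond to the conditioning listed for exp06. The per-token models (exp07–09) need the token sequence before pooling, which `encode_text` does not return; the repository code is the place to check how that is extracted.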

Load

from huggingface_hub import hf_hub_download
import torch

ckpt = hf_hub_download(
    "WEIEE/pokemon-diffusion",
    "experiments/exp09_stage1_pokemonALLcentered_xattn48/checkpoints/ckpt_ep300.pt",
)
state = torch.load(ckpt, map_location="cpu")
# the matching UNet / Diffusion class definitions live in the GitHub notebooks
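
This card does not document the internal layout of the loaded object, so it can help to inspect it before wiring it into a model. The sketch below assumes only that the checkpoint is a (possibly nested) mapping whose leaves include tensors; `count_elements` and `summarize` are hypothetical helper names, not part of the repository.

```python
from collections.abc import Mapping


def count_elements(obj):
    """Sum numel() over every tensor-like leaf in a nested mapping."""
    if hasattr(obj, "numel"):
        return int(obj.numel())
    if isinstance(obj, Mapping):
        return sum(count_elements(v) for v in obj.values())
    return 0  # ints, strings, lists and other non-tensor entries are ignored


def summarize(obj):
    """Map each top-level key to the number of tensor elements stored under it."""
    if not isinstance(obj, Mapping):
        return {}
    return {str(k): count_elements(v) for k, v in obj.items()}
```

Calling `summarize(state)` on the object loaded above shows which top-level keys hold weights. The totals can be compared against the params column, keeping in mind that a state dict also contains non-trainable buffers, so the numbers may not match exactly.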

Educational / research project. Pokémon and all sprites are © Nintendo / Game Freak.