Pokémon Pixel-Art Diffusion — checkpoints
Trained checkpoints for a from-scratch DDPM that generates 96×96 Pokémon pixel art (by name, by free-text description, or by fusing two Pokémon).
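The model is a DDPM, and most checkpoint names in the table below carry a `CFG` suffix, which presumably stands for classifier-free guidance. Below is a minimal sketch of guided ancestral sampling. It assumes an ε-predicting network called as `eps_model(x, t, cond)`, a linear β schedule with `T=1000`, and a `guidance_scale` of 3.0. All of these, along with the helper names, are hypothetical and may not match the notebooks. `null_cond` stands for whatever "empty" conditioning the model was trained to fall back to.

```python
import torch

def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> torch.Tensor:
    # ASSUMPTION: the classic linear DDPM schedule; the repo may use a different one
    return torch.linspace(beta_start, beta_end, T)

@torch.no_grad()
def sample_cfg(eps_model, cond, null_cond, shape, T=1000, guidance_scale=3.0, generator=None):
    """Ancestral DDPM sampling with classifier-free guidance (sketch).

    eps_model(x, t, c) -> predicted noise with the same shape as x (hypothetical signature).
    """
    betas = linear_betas(T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, generator=generator)  # start from pure noise x_T
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps_c = eps_model(x, t_batch, cond)       # conditional noise prediction
        eps_u = eps_model(x, t_batch, null_cond)  # unconditional noise prediction
        eps = eps_u + guidance_scale * (eps_c - eps_u)  # CFG: push away from unconditional
        # mean of p(x_{t-1} | x_t) under the epsilon parameterization
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # add noise with variance sigma_t^2 = beta_t
            x = mean + torch.sqrt(betas[t]) * torch.randn(shape, generator=generator)
        else:
            x = mean
    return x
```

For 96×96 RGB sprites, `shape` would be `(n, 3, 96, 96)`. The result is in whatever normalization the model was trained with (commonly [-1, 1]) and would need rescaling to pixel values.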
Code, model definitions & demo notebook: https://github.com/weiee666/Pokemon_Combination
Checkpoints
| file (`experiments/.../ckpt_ep300.pt`) | conditioning | attention | classes | params |
|---|---|---|---|---|
| exp01_pokemon1070front_condUNet-CFG | name embedding | – | 1070 | 4.2M |
| exp02_pokemon20aug_condUNet-CFG-TB | name embedding | – | 20 | 3.9M |
| exp03_pokemon20aug_condUNet-CFG-attn-TB | name embedding | self-attn | 20 | 5.8M |
| exp04_pokemon20clean_bigUNet-CFG-attn-EMA | name embedding | self-attn + EMA | 20 | 30.3M |
| exp05_pokemon20aug_bigUNet-CFG-attn-EMA | name embedding | self-attn + EMA | 20 | 30.3M |
| exp06_stage1_pokemonALL_clipText | CLIP pooled text | – | 988 | 30.5M |
| exp07_stage1_pokemonALL_xattn | CLIP per-token | self + cross-attn @24²/12² | 988 | 33.5M |
| exp08_stage1_pokemonALL_xattn48 | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
| exp09_stage1_pokemonALLcentered_xattn48 | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
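This card gives the full Hub path only for exp09. Assuming every experiment uses the same `experiments/<name>/checkpoints/ckpt_ep300.pt` layout (an assumption, not confirmed here), a small hypothetical helper can turn the table's names, or short ids such as `exp07`, into download paths:

```python
# Experiment names copied from the table above.
EXPERIMENTS = [
    "exp01_pokemon1070front_condUNet-CFG",
    "exp02_pokemon20aug_condUNet-CFG-TB",
    "exp03_pokemon20aug_condUNet-CFG-attn-TB",
    "exp04_pokemon20clean_bigUNet-CFG-attn-EMA",
    "exp05_pokemon20aug_bigUNet-CFG-attn-EMA",
    "exp06_stage1_pokemonALL_clipText",
    "exp07_stage1_pokemonALL_xattn",
    "exp08_stage1_pokemonALL_xattn48",
    "exp09_stage1_pokemonALLcentered_xattn48",
]

def resolve(prefix: str) -> str:
    """Map a short id such as 'exp07' to its full experiment name (hypothetical helper)."""
    matches = [name for name in EXPERIMENTS if name.startswith(prefix + "_")]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one experiment for {prefix!r}, got {matches}")
    return matches[0]

def ckpt_path(name_or_prefix: str) -> str:
    """Hub path of the epoch-300 checkpoint.

    ASSUMPTION: the layout is inferred from the exp09 path; other experiments may differ.
    """
    name = name_or_prefix if name_or_prefix in EXPERIMENTS else resolve(name_or_prefix)
    return f"experiments/{name}/checkpoints/ckpt_ep300.pt"
```

The result would be passed as the `filename` argument of `hf_hub_download`, e.g. `hf_hub_download("WEIEE/pokemon-diffusion", ckpt_path("exp07"))`. If that download fails, the assumed layout does not hold for that experiment.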
`exp06_.../clip_table.pt` holds the pre-computed pooled CLIP text vectors, one per class. The exp06 model needs this file.
For the experiments trained with EMA, the checkpoints store the EMA weights. The CLIP-conditioned models (exp06–09) expect the CLIP ViT-B-32 text encoder with the `laion2b_s34b_b79k` pretrained weights.
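EMA weights are an exponential moving average of the model parameters, maintained alongside training. Sampling from them is common practice because they tend to give more stable results than the raw weights. Below is the standard update rule as a sketch. The helper name `ema_update` and the decay of 0.999 are assumptions, not values taken from this repo's training code.

```python
import copy

import torch
from torch import nn

@torch.no_grad()
def ema_update(ema_model: nn.Module, model: nn.Module, decay: float = 0.999) -> None:
    """In-place EMA step: ema <- decay * ema + (1 - decay) * current (decay is an assumption)."""
    for e, p in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1.0 - decay)
    # buffers (e.g. normalization statistics) are copied rather than averaged
    for e, b in zip(ema_model.buffers(), model.buffers()):
        e.copy_(b)

# Usage: keep a frozen copy and update it after every optimizer step.
model = nn.Linear(4, 4)
ema_model = copy.deepcopy(model).requires_grad_(False)
ema_update(ema_model, model)  # in training, call this right after optimizer.step()
```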
Load
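```python
from huggingface_hub import hf_hub_download
import torch

# Download one checkpoint from the Hub (exp09 shown; swap in another experiment's path)
ckpt = hf_hub_download(
    repo_id="WEIEE/pokemon-diffusion",
    filename="experiments/exp09_stage1_pokemonALLcentered_xattn48/checkpoints/ckpt_ep300.pt",
)

# Load onto the CPU first; move to a GPU after building the model.
# PyTorch >= 2.6 defaults to weights_only=True; if loading fails on non-tensor entries,
# pass weights_only=False, but only for a file you trust.
state = torch.load(ckpt, map_location="cpu")
# the matching UNet / Diffusion class definitions live in the GitHub notebooks
```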
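The card does not document what `torch.load` returns. A sketch like the following can help locate the weights and roughly check them against the params column of the table. The key names it probes (`ema`, `ema_model`, `model`, `state_dict`) are guesses, not the repo's documented format.

```python
import torch

# ASSUMPTION: candidate keys are guesses; print(list(state)) to see the real layout
CANDIDATE_KEYS = ("ema", "ema_model", "model", "state_dict")

def pick_state_dict(ckpt: dict) -> dict:
    """Return the tensor dict inside a loaded checkpoint (hypothetical helper)."""
    if all(isinstance(v, torch.Tensor) for v in ckpt.values()):
        return ckpt  # already a flat state_dict
    for key in CANDIDATE_KEYS:
        if isinstance(ckpt.get(key), dict):
            return ckpt[key]
    raise KeyError(f"no state_dict found; top-level keys: {list(ckpt)}")

def count_params(state_dict: dict) -> int:
    """Total number of tensor elements (includes buffers, not only trainable weights)."""
    return sum(v.numel() for v in state_dict.values() if isinstance(v, torch.Tensor))

# Usage with the `state` loaded above:
# sd = pick_state_dict(state)
# print(f"{count_params(sd) / 1e6:.1f}M")  # compare with the params column
```

Because buffers are counted too, the total can differ slightly from the figure in the table.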
Educational / research project. Pokémon and all sprites are © Nintendo / Game Freak.