Pokémon Pixel-Art Diffusion — checkpoints

Trained checkpoints for a from-scratch DDPM that generates 96×96 Pokémon pixel art (by name, by free-text description, or by fusing two Pokémon).

Code, model definitions & demo notebook: https://github.com/weiee666/Pokemon_Combination

Checkpoints

| file (experiments/.../ckpt_ep300.pt) | conditioning | attention | classes | params |
|---|---|---|---|---|
| exp01_pokemon1070front_condUNet-CFG | name embedding | | 1070 | 4.2M |
| exp02_pokemon20aug_condUNet-CFG-TB | name embedding | | 20 | 3.9M |
| exp03_pokemon20aug_condUNet-CFG-attn-TB | name embedding | self-attn | 20 | 5.8M |
| exp04_pokemon20clean_bigUNet-CFG-attn-EMA | name embedding | self-attn + EMA | 20 | 30.3M |
| exp05_pokemon20aug_bigUNet-CFG-attn-EMA | name embedding | self-attn + EMA | 20 | 30.3M |
| exp06_stage1_pokemonALL_clipText | CLIP pooled text | | 988 | 30.5M |
| exp07_stage1_pokemonALL_xattn | CLIP per-token | self + cross-attn @24²/12² | 988 | 33.5M |
| exp08_stage1_pokemonALL_xattn48 | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
| exp09_stage1_pokemonALLcentered_xattn48 | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
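
Each row names a directory under experiments/ in the Hub repo. A minimal sketch for turning a row name into a repo-relative path, assuming every experiment follows the experiments/<name>/checkpoints/ckpt_ep300.pt layout that the loading example uses for exp09. The helper name checkpoint_path is hypothetical, and only the exp09 path is confirmed by this card:

```python
def checkpoint_path(experiment: str, filename: str = "ckpt_ep300.pt") -> str:
    """Build the repo-relative path of a checkpoint from its table row name.

    Assumption: files are laid out as
    experiments/<experiment>/checkpoints/<filename>.
    """
    if not experiment or "/" in experiment:
        raise ValueError(f"expected a bare experiment name, got {experiment!r}")
    return f"experiments/{experiment}/checkpoints/{filename}"


print(checkpoint_path("exp09_stage1_pokemonALLcentered_xattn48"))
```

The returned string can be passed as the second argument to hf_hub_download.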

exp06_.../clip_table.pt holds the pre-computed pooled CLIP vectors, one per class. The exp06 model needs this file.

For the experiments that use EMA, the checkpoints store the EMA weights. The CLIP-conditioned models (exp06–09) expect the CLIP ViT-B-32 (laion2b_s34b_b79k) text encoder.
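
The encoder name above matches a pretrained tag in the open_clip package. A hedged sketch of producing pooled text vectors with it, assuming open_clip (the open_clip_torch package) is what the project uses. The function name encode_pooled is hypothetical, and whether the project normalizes these vectors or how it extracts per-token features for exp07–09 is not stated here:

```python
CLIP_ARCH = "ViT-B-32"
CLIP_PRETRAINED = "laion2b_s34b_b79k"


def encode_pooled(prompts):
    """Return one pooled CLIP text vector per prompt (512-d for ViT-B-32)."""
    # Imported lazily so the constants above are usable without open_clip installed.
    import open_clip
    import torch

    model, _, _ = open_clip.create_model_and_transforms(
        CLIP_ARCH, pretrained=CLIP_PRETRAINED
    )
    tokenizer = open_clip.get_tokenizer(CLIP_ARCH)
    model.eval()
    with torch.no_grad():
        # encode_text returns the pooled (not per-token) embedding
        return model.encode_text(tokenizer(prompts))
```

Calling encode_pooled(["a small blue turtle"]) downloads the encoder weights on first use.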

Load

from huggingface_hub import hf_hub_download
import torch

ckpt = hf_hub_download(
    "WEIEE/pokemon-diffusion",
    "experiments/exp09_stage1_pokemonALLcentered_xattn48/checkpoints/ckpt_ep300.pt",
)
state = torch.load(ckpt, map_location="cpu")
# the matching UNet / Diffusion class definitions live in the GitHub notebooks
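
The layout of the loaded object is not documented here. A small sketch for checking what it contains, assuming it is a (possibly nested) dict whose leaves are tensors. The helpers count_params and summarize are hypothetical and rely only on each leaf exposing numel():

```python
def count_params(obj) -> int:
    """Sum numel() over every tensor-like leaf in a nested dict/list structure."""
    if hasattr(obj, "numel"):
        return int(obj.numel())
    if isinstance(obj, dict):
        return sum(count_params(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_params(v) for v in obj)
    return 0  # ints, strings, None, etc. carry no parameters


def summarize(state) -> dict:
    """Map each top-level key of a checkpoint dict to its element count."""
    if not isinstance(state, dict):
        return {"<root>": count_params(state)}
    return {key: count_params(value) for key, value in state.items()}
```

Running summarize(state) on the loaded checkpoint gives a rough figure to compare with the params column. Buffers and any stored optimizer state are counted too, so the totals may not match exactly.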

Educational / research project. Pokémon and all sprites are © Nintendo / Game Freak.
