license: cc-by-nc-4.0
tags:
  - diffusion
  - ddpm
  - pixel-art
  - pokemon
  - text-to-image
library_name: pytorch
pipeline_tag: unconditional-image-generation

Pokémon Pixel-Art Diffusion — checkpoints

Trained checkpoints for a DDPM, trained from scratch, that generates 96×96 Pokémon pixel art. Depending on the checkpoint, generation is conditioned on a Pokémon name, on a free-text description, or on a fusion of two Pokémon.

Code, model definitions & demo notebook: https://github.com/weiee666/Pokemon_Combination

Checkpoints

| file (`experiments/.../ckpt_ep300.pt`) | conditioning | attention | classes | params |
| --- | --- | --- | --- | --- |
| `exp01_pokemon1070front_condUNet-CFG` | name embedding | – | 1070 | 4.2M |
| `exp02_pokemon20aug_condUNet-CFG-TB` | name embedding | – | 20 | 3.9M |
| `exp03_pokemon20aug_condUNet-CFG-attn-TB` | name embedding | self-attn | 20 | 5.8M |
| `exp04_pokemon20clean_bigUNet-CFG-attn-EMA` | name embedding | self-attn + EMA | 20 | 30.3M |
| `exp05_pokemon20aug_bigUNet-CFG-attn-EMA` | name embedding | self-attn + EMA | 20 | 30.3M |
| `exp06_stage1_pokemonALL_clipText` | CLIP pooled text | – | 988 | 30.5M |
| `exp07_stage1_pokemonALL_xattn` | CLIP per-token | self + cross-attn @24²/12² | 988 | 33.5M |
| `exp08_stage1_pokemonALL_xattn48` | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
| `exp09_stage1_pokemonALLcentered_xattn48` | CLIP per-token | self + cross-attn @48²/24²/12² | 988 | 34.0M |
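
To fetch a checkpoint other than exp09, the in-repo file path has to be built from the experiment name. The sketch below assumes every experiment uses the same layout as the exp09 path in the Load example (`experiments/<name>/checkpoints/ckpt_ep300.pt`). That layout is only confirmed for exp09, and `checkpoint_path` is a hypothetical helper, not part of the repository.

```python
from pathlib import PurePosixPath


def checkpoint_path(experiment: str, epoch: int = 300) -> str:
    """Build the in-repo path of a checkpoint file.

    Assumption: all experiments mirror the exp09 layout
    experiments/<experiment>/checkpoints/ckpt_ep<epoch>.pt
    """
    path = PurePosixPath("experiments") / experiment / "checkpoints" / f"ckpt_ep{epoch}.pt"
    return str(path)


if __name__ == "__main__":
    # Reproduces the exp09 path used in the Load example.
    print(checkpoint_path("exp09_stage1_pokemonALLcentered_xattn48"))
```

The returned string can be passed as the second argument of `hf_hub_download`. If a file is missing, check the repository file listing rather than relying on this pattern.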

`exp06_.../clip_table.pt` holds the pre-computed pooled CLIP text vectors, one per class. The exp06 model needs this file.

For the experiments trained with EMA, the checkpoints store the EMA weights. The CLIP-conditioned models (exp06–09) expect text embeddings from the CLIP ViT-B-32 text encoder with the laion2b_s34b_b79k weights.
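
The tag `laion2b_s34b_b79k` matches the naming of pretrained weights in the `open_clip` library, so one plausible way to obtain the text encoder is sketched below. This assumes `open_clip_torch` is installed and that the project used `open_clip`, which should be checked against the GitHub code. `encode_prompts` is a hypothetical helper that returns pooled text embeddings only; the per-token features used by exp07–09 are not covered here.

```python
CLIP_ARCH = "ViT-B-32"
CLIP_PRETRAINED = "laion2b_s34b_b79k"


def encode_prompts(prompts, device="cpu"):
    """Return pooled CLIP text embeddings for a list of strings.

    Assumption: the text encoder is loaded through open_clip; the
    pretrained weights are downloaded on first use.
    """
    import torch
    import open_clip  # third-party: pip install open_clip_torch

    model, _, _ = open_clip.create_model_and_transforms(
        CLIP_ARCH, pretrained=CLIP_PRETRAINED
    )
    model = model.to(device).eval()
    tokenizer = open_clip.get_tokenizer(CLIP_ARCH)

    tokens = tokenizer(prompts).to(device)
    with torch.no_grad():
        return model.encode_text(tokens)
```

For example, `encode_prompts(["a small blue turtle"])` would return one embedding row per prompt. How that embedding is fed to the UNet is defined in the repository code.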

Load

from huggingface_hub import hf_hub_download
import torch

ckpt = hf_hub_download(
    "WEIEE/pokemon-diffusion",
    "experiments/exp09_stage1_pokemonALLcentered_xattn48/checkpoints/ckpt_ep300.pt",
)
state = torch.load(ckpt, map_location="cpu")
# `state` is the raw checkpoint object; the matching UNet / Diffusion class
# definitions live in the notebooks of the GitHub repository linked above
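
The internal layout of the checkpoint is not documented here, so it can help to inspect `state` before loading it into a model. The sketch below is a hypothetical helper that lists the top-level keys and counts the tensor elements under each one. It assumes the checkpoint is a nested dict of tensors. The counts include buffers and any optimizer state, so they may differ from the params column of the table.

```python
from collections.abc import Mapping


def count_elements(tree) -> int:
    """Sum numel() over all tensor-like leaves in nested mappings."""
    if hasattr(tree, "numel"):
        return int(tree.numel())
    if isinstance(tree, Mapping):
        return sum(count_elements(value) for value in tree.values())
    return 0  # ints, strings, and other metadata carry no parameters


def summarize_checkpoint(state) -> dict:
    """Map each top-level key to (type name, tensor elements below it)."""
    if not isinstance(state, Mapping):
        return {"<root>": (type(state).__name__, count_elements(state))}
    return {
        str(key): (type(value).__name__, count_elements(value))
        for key, value in state.items()
    }
```

Calling `summarize_checkpoint(state)` on the loaded object shows which entry holds the model (or EMA) weights, which is the one to pass to `load_state_dict` once the model class has been instantiated.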