title: Autoregressive Image Token Playground
emoji: 🧩
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.22.0
python_version: 3.11
app_file: app.py
pinned: false
license: mit

Autoregressive Image Token Playground

A CPU-friendly Gradio app for teaching image tokenization and image generation as sequence problems.

The first tab takes a real image and tokenizes it two ways:

- a pretrained learned VQ tokenizer from CompVis/ldm-celebahq-256/vqvae
- a transparent k-means patch tokenizer for comparison

The learned tokenizer shows the actual codebook IDs, the reconstruction from the VQ decoder, token usage, and representative image regions for the most-used codes. The k-means option can learn a tiny codebook from one image or from all loaded MoMA images.
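
As a rough illustration of the k-means option, here is a minimal sketch of learning a tiny patch codebook with plain Lloyd's k-means in numpy. The function names (`extract_patches`, `assign`, `kmeans_codebook`), the patch size, the codebook size, and the random stand-in image are all assumptions made for this example; the app's own code may be organized differently.

```python
import numpy as np

def extract_patches(image, patch=4):
    """Split an (H, W, C) array into flattened non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    image = image[:gh * patch, :gw * patch]  # drop any ragged border
    patches = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch * patch * c), (gh, gw)

def assign(patches, codebook):
    """Return the index of the nearest codebook vector for each patch."""
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans_codebook(patches, k=8, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(patches), size=k, replace=False)
    codebook = patches[start].astype(float)
    for _ in range(iters):
        ids = assign(patches, codebook)
        for j in range(k):
            members = patches[ids == j]
            if len(members):  # an empty cluster keeps its previous center
                codebook[j] = members.mean(axis=0)
    return codebook

# Stand-in for a loaded image (assumption: float RGB values in [0, 1]).
image = np.random.default_rng(0).random((32, 32, 3))
patches, grid = extract_patches(image, patch=4)
codebook = kmeans_codebook(patches, k=8)
token_grid = assign(patches, codebook).reshape(grid)   # one token ID per patch
usage = np.bincount(token_grid.ravel(), minlength=len(codebook))
```

Learning from several images instead of one only changes the input: stack the patches from every image into one array before calling `kmeans_codebook`.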

The app also includes a deliberately small, transparent image-token sampler. It does not call a proprietary image model. Instead, it shows the mechanics that matter for a workshop:

- an image is represented as a grid of discrete codebook tokens
- generation follows a fixed order, one token at a time
- each next token is sampled from visible logits
- logits are split into prompt, position, and previous-token context terms
- students can inspect every step, each token probability table, and the final token inventory
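
The mechanics listed above can be sketched in a few lines of numpy. This is only an illustration under stated assumptions: the raster order, the three score arrays (`prompt_bias`, `position_bias`, `transition`), the temperature parameter, and the random weights are hypothetical and are not taken from the app's source.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()

def sample_grid(prompt_bias, position_bias, transition, temperature=1.0, seed=0):
    """Sample a token grid in raster order, one token at a time.

    prompt_bias:   (K,)       score per token from the prompt
    position_bias: (H, W, K)  score per token at each grid cell
    transition:    (K, K)     score for token j following token i
    """
    rng = np.random.default_rng(seed)
    h, w, k = position_bias.shape
    grid = np.zeros((h, w), dtype=int)
    steps = []
    prev = None
    for r in range(h):
        for c in range(w):
            # The first cell has no previous token, so its context term is zero.
            context = transition[prev] if prev is not None else np.zeros(k)
            logits = prompt_bias + position_bias[r, c] + context
            probs = softmax(logits / temperature)
            token = int(rng.choice(k, p=probs))
            grid[r, c] = token
            steps.append({"cell": (r, c), "logits": logits,
                          "probs": probs, "token": token})
            prev = token
    return grid, steps

# Hypothetical random scores for a 4x4 grid over a 6-token codebook.
K, H, W = 6, 4, 4
rng = np.random.default_rng(1)
prompt_bias = rng.normal(size=K)
position_bias = rng.normal(size=(H, W, K))
transition = rng.normal(size=(K, K))

grid, steps = sample_grid(prompt_bias, position_bias, transition)
inventory = np.bincount(grid.ravel(), minlength=K)  # final token inventory
```

Each entry in `steps` holds the logits and the probability table for one cell, which is the kind of per-step record students can inspect.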

The visible token tiles are abstract swatches, not source images being pasted into the output. They stand in for learned image-token codes in real autoregressive image systems.

The Codebook tab gives students a compact sketch of how image patches become token IDs, how the autoregressive model predicts those IDs, and how IDs decode back into visible patches.
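
The round trip from patches to token IDs and back can be sketched as a nearest-neighbor lookup followed by a table lookup. The `encode` and `decode` helpers and the 4-entry flat-color codebook below are hypothetical, chosen so the result is easy to check by hand; a learned VQ decoder is a neural network rather than a direct lookup like this.

```python
import numpy as np

def encode(image, codebook, patch):
    """Map each non-overlapping patch to the ID of its nearest codebook entry."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    patches = (image[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(gh * gw, -1))
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).reshape(gh, gw)

def decode(token_grid, codebook, patch, channels):
    """Look up each ID in the codebook and tile the patches back into an image."""
    gh, gw = token_grid.shape
    patches = codebook[token_grid].reshape(gh, gw, patch, patch, channels)
    return patches.transpose(0, 2, 1, 3, 4).reshape(gh * patch, gw * patch, channels)

# Hypothetical codebook: four flat-color 2x2 RGB patches (black, red, green, blue).
colors = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
codebook = np.repeat(colors[:, None, :], 4, axis=1).reshape(4, -1)  # shape (4, 12)

ids = np.array([[0, 1],
                [2, 3]])
image = decode(ids, codebook, patch=2, channels=3)   # 4x4 RGB image
roundtrip = encode(image, codebook, patch=2)         # recovers the same IDs
```

Because every patch of the decoded image is an exact codebook entry, encoding it again returns the original IDs. With a real photo the nearest entry is only an approximation, which is why the reconstruction looks blocky.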

This pairs well with a diffusion demo because students can compare two different views of generation:

- diffusion gradually denoises a whole latent image
- autoregressive generation fills in discrete image tokens one by one

Running

Install the requirements and run:

python app.py

The app is intentionally lightweight: it uses gradio, numpy, and pillow.