Genie World Model - Pong
This repository contains checkpoints for a Genie world model trained on Pong.
Models
| Model | Description | Size |
|---|---|---|
| Video Tokenizer | ST-ViViT encoder/decoder with 512-code VQ codebook | ~454 MB |
| Latent Action Model (LAM) | Learns a discrete 3-action space (up/down/nothing) | ~6.2 GB |
| Dynamics Model | MaskGIT-style next-frame predictor | ~647 MB |
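Only the tokenizer checkpoint path appears in this README, so the filenames for the LAM and dynamics checkpoints have to be looked up in the repository itself. The sketch below is one way to do that with `list_repo_files` from `huggingface_hub`. The helper names `group_checkpoints` and `list_remote_checkpoints` are hypothetical, and the grouping assumes the other models follow the same `checkpoints/<component>/<file>.pt` layout as the tokenizer, which is not confirmed here.

```python
from collections import defaultdict


def group_checkpoints(files):
    """Group .pt files by the directory directly under 'checkpoints/'.

    Assumes a 'checkpoints/<component>/<file>.pt' layout (an assumption
    based on the tokenizer path; other components may differ).
    """
    groups = defaultdict(list)
    for path in files:
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] == "checkpoints" and path.endswith(".pt"):
            groups[parts[1]].append(path)
    return {component: sorted(paths) for component, paths in groups.items()}


def list_remote_checkpoints(repo_id="sangramrout/genie-pong"):
    """Fetch the repo file listing (network call) and group it."""
    # Imported lazily so group_checkpoints can be used without the Hub client.
    from huggingface_hub import list_repo_files

    return group_checkpoints(list_repo_files(repo_id))
```

Calling `list_remote_checkpoints()` returns a dictionary keyed by component directory. Any path it reports can be passed as `filename` to `hf_hub_download`.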
Usage
import torch
from huggingface_hub import hf_hub_download
# Download tokenizer
tokenizer_path = hf_hub_download(
    repo_id="sangramrout/genie-pong",
    filename="checkpoints/tokenizer/checkpoint_step_2288.pt",
)
# Load checkpoint
checkpoint = torch.load(tokenizer_path, map_location="cpu")
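The README does not document what the loaded checkpoint contains, so it helps to inspect it before building a model around it. The following is a minimal sketch, not part of the repository's code: `summarize_checkpoint` is a hypothetical helper that walks nested dictionaries and reports the shape of anything tensor-like. It makes no assumption about specific key names.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, prefix="", max_depth=3):
    """Return (path, description) pairs for a nested checkpoint object.

    Mappings are descended into up to max_depth levels; objects with a
    .shape attribute (e.g. torch tensors) are reported with their shape;
    everything else is reported by type name.
    """
    rows = []
    if isinstance(obj, Mapping) and max_depth > 0:
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            rows.extend(summarize_checkpoint(value, path, max_depth - 1))
    elif hasattr(obj, "shape"):
        rows.append((prefix, f"tensor{tuple(obj.shape)}"))
    else:
        rows.append((prefix, type(obj).__name__))
    return rows
```

With the `checkpoint` object from the snippet above, `for path, desc in summarize_checkpoint(checkpoint): print(path, desc)` lists the entries. If `torch.load` rejects the file under the `weights_only=True` default introduced in PyTorch 2.6, passing `weights_only=False` is an option, but only for files you trust.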
Training
See the GitHub repository for training code and details.
Architecture
- Video Tokenizer: 6-layer encoder, 8-layer decoder, d_model 384
- LAM: 20-layer encoder/decoder (paper hyperparameters), d_model 1024, 3-action codebook
- Dynamics: 8-layer transformer, d_model 640, MaskGIT training
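The hyperparameters above can be collected in one place for use in scripts. The sketch below is illustrative only: `ComponentSpec`, `SPECS`, and `approx_block_params` are hypothetical names, not the repository's configuration classes. The estimate uses the common rule of thumb of roughly 12 · layers · d_model² parameters for standard transformer blocks. It ignores embeddings, codebooks, and any spatiotemporal-attention specifics, so treat the result as an order-of-magnitude figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSpec:
    name: str
    d_model: int
    layers: dict  # layer counts keyed by sub-module, as listed above


SPECS = (
    ComponentSpec("video_tokenizer", 384, {"encoder": 6, "decoder": 8}),
    # The list gives "20-layer encoder/decoder"; whether that is per side
    # is not stated, so it is kept as a single entry here.
    ComponentSpec("lam", 1024, {"encoder_decoder": 20}),
    ComponentSpec("dynamics", 640, {"transformer": 8}),
)


def approx_block_params(num_layers, d_model):
    """Rule-of-thumb parameter count for standard transformer blocks."""
    return 12 * num_layers * d_model**2


if __name__ == "__main__":
    for spec in SPECS:
        total = sum(approx_block_params(n, spec.d_model) for n in spec.layers.values())
        print(f"{spec.name}: ~{total / 1e6:.0f}M parameters in transformer blocks")
```

Checkpoint file sizes can be considerably larger than these estimates suggest if the files also store optimizer state, which this README does not specify.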