Genie World Model - Pong

This repository contains checkpoints for the Genie world model, trained on the game Pong.

Models

Model	Description	Size
Video Tokenizer	ST-ViViT encoder/decoder with 512-code VQ codebook	~454 MB
Latent Action Model (LAM)	Learns a discrete 3-action space (up/down/nothing)	~6.2 GB
Dynamics Model	MaskGIT-style next-frame predictor	~647 MB

Usage

import torch
from huggingface_hub import hf_hub_download

# Download tokenizer
tokenizer_path = hf_hub_download(
    repo_id="sangramrout/genie-pong",
    filename="checkpoints/tokenizer/checkpoint_step_2288.pt"
)

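# Load the checkpoint onto the CPU. Recent PyTorch releases default to
# weights_only=True; if loading fails because the file stores non-tensor
# objects, pass weights_only=False only if you trust the source.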
checkpoint = torch.load(tokenizer_path, map_location="cpu")
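
The layout of the loaded checkpoint is not documented here, so it can help to inspect it before use. Below is a minimal sketch of a helper for that; `summarize_checkpoint` is a hypothetical name, not part of this repository, and it only assumes that the checkpoint is a (possibly nested) dictionary whose tensor values expose a `.shape` attribute.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, prefix="", max_depth=2):
    """Return (name, description) pairs describing a loaded checkpoint.

    Nested mappings are walked up to max_depth levels. Any value with a
    .shape attribute (for example a tensor) is reported by its shape;
    every other value is reported by its type name.
    """
    if not isinstance(obj, Mapping):
        return [(prefix or "<root>", type(obj).__name__)]

    rows = []
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping) and max_depth > 0:
            # Recurse into nested dictionaries such as a state dict.
            rows.extend(summarize_checkpoint(value, name, max_depth - 1))
        elif hasattr(value, "shape"):
            rows.append((name, f"shape={tuple(value.shape)}"))
        else:
            rows.append((name, type(value).__name__))
    return rows
```

With the `checkpoint` object loaded above, `for name, desc in summarize_checkpoint(checkpoint): print(name, desc)` lists the top-level entries and the shapes of any tensors one or two levels down.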

Training

See the GitHub repository for training code and details.

Architecture

Video Tokenizer: 6-layer encoder, 8-layer decoder, d_model = 384
LAM: 20-layer encoder/decoder (paper hyperparameters), d_model = 1024, 3-action codebook
Dynamics: 8-layer transformer, d_model = 640, trained MaskGIT-style
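
To get a feel for the relative scale of the three models, the layer counts and widths above can be turned into a back-of-the-envelope parameter estimate. The sketch below assumes a standard transformer block (four d_model x d_model attention projections plus a two-layer MLP with 4x expansion) and reads "20-layer encoder/decoder" as 20 layers each; neither assumption is confirmed by this model card, and `TransformerSpec` is a hypothetical helper. Embeddings, codebooks, norms, biases, and any spatiotemporal attention variants are ignored, so treat the result as an order-of-magnitude figure rather than a measurement of the released checkpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerSpec:
    name: str
    num_layers: int  # total transformer layers counted in the estimate
    d_model: int

    def approx_params(self, ffn_mult=4):
        # Assumed block: 4*d^2 for the Q, K, V, and output projections,
        # plus 2*ffn_mult*d^2 for a two-layer MLP.
        per_layer = (4 + 2 * ffn_mult) * self.d_model ** 2
        return self.num_layers * per_layer


specs = [
    TransformerSpec("tokenizer", 6 + 8, 384),  # 6-layer encoder + 8-layer decoder
    TransformerSpec("lam", 20 + 20, 1024),     # assumption: 20 layers each side
    TransformerSpec("dynamics", 8, 640),
]

for spec in specs:
    print(f"{spec.name}: ~{spec.approx_params() / 1e6:.1f}M parameters")
```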

