Genie World Model - Pong

This repository contains checkpoints for a Genie world model trained on the game Pong.

Models

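| Model                     | Description                                        | Size    |
|---------------------------|----------------------------------------------------|---------|
| Video Tokenizer           | ST-ViViT encoder/decoder with 512-code VQ codebook | ~454 MB |
| Latent Action Model (LAM) | Learns 3-action discrete space (up/down/nothing)   | ~6.2 GB |
| Dynamics Model            | MaskGIT-style next-frame predictor                 | ~647 MB |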
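
As background on the "MaskGIT-style" dynamics model: MaskGIT decoding typically starts with every token masked and reveals tokens over a few iterations, with the fraction left masked following a cosine schedule. The sketch below illustrates only that schedule. The function name and the step count are hypothetical, and the training code for these checkpoints may use a different schedule or rounding.

```python
import math


def tokens_left_masked(step, total_steps, num_tokens):
    """Number of tokens still masked after decoding iteration `step` (0-indexed).

    Assumes the cosine schedule gamma(r) = cos(pi/2 * r) commonly used with
    MaskGIT; this is an illustration, not the repository's implementation.
    """
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    # floor() so the tiny positive float at the final step rounds down to 0
    return math.floor(ratio * num_tokens)


# Example: 100 frame tokens decoded in 4 iterations (both values hypothetical)
schedule = [tokens_left_masked(t, 4, 100) for t in range(4)]
```

Most tokens stay masked in the early iterations and the remainder are filled in quickly at the end, which is the characteristic shape of the cosine schedule.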

Usage

import torch
from huggingface_hub import hf_hub_download

# Download tokenizer
tokenizer_path = hf_hub_download(
    repo_id="sangramrout/genie-pong",
    filename="checkpoints/tokenizer/checkpoint_step_2288.pt"
)

# Load checkpoint
checkpoint = torch.load(tokenizer_path, map_location="cpu")
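
The layout of the checkpoint file is not documented here, so a reasonable first step is to inspect what was loaded. The helper below is a minimal sketch: `summarize_checkpoint` is a hypothetical name, and the keys in the stand-in dictionary are illustrative assumptions, not the actual contents of these checkpoints.

```python
def summarize_checkpoint(obj, prefix="", max_depth=2):
    """Return (name, description) pairs for a loaded checkpoint object.

    Nested dicts are walked up to `max_depth` levels; anything with a
    `.shape` attribute (e.g. a torch.Tensor) is reported with its shape.
    """
    rows = []
    if isinstance(obj, dict) and max_depth > 0:
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            rows.extend(summarize_checkpoint(value, name, max_depth - 1))
    elif hasattr(obj, "shape"):
        rows.append((prefix, f"tensor{tuple(obj.shape)}"))
    else:
        rows.append((prefix, type(obj).__name__))
    return rows


# Stand-in for a loaded checkpoint; the real keys may differ (assumption).
example = {"step": 2288, "config": {"d_model": 384}}
for name, description in summarize_checkpoint(example):
    print(f"{name}: {description}")
```

With a real file, pass the `checkpoint` object returned by `torch.load` in place of `example`.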

Training

See the GitHub repository for training code and details.

Architecture

  • Video Tokenizer: 6-layer encoder, 8-layer decoder, 384 d_model
  • LAM: 20-layer encoder/decoder (paper hyperparameters), 1024 d_model, 3-action codebook
  • Dynamics: 8-layer transformer, 640 d_model, MaskGIT training
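
The hyperparameters listed above can be collected into small config objects, for example when rebuilding the models before loading a state dict. This is a sketch only: the class and field names are hypothetical and need not match the training code, and "20-layer encoder/decoder" is read here as 20 layers each, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenizerConfig:
    encoder_layers: int = 6
    decoder_layers: int = 8
    d_model: int = 384
    codebook_size: int = 512  # VQ codebook size from the Models table


@dataclass(frozen=True)
class LatentActionConfig:
    encoder_layers: int = 20  # assumption: 20 layers in each of encoder and decoder
    decoder_layers: int = 20
    d_model: int = 1024
    num_actions: int = 3  # up / down / nothing


@dataclass(frozen=True)
class DynamicsConfig:
    num_layers: int = 8
    d_model: int = 640


tokenizer_cfg = TokenizerConfig()
lam_cfg = LatentActionConfig()
dynamics_cfg = DynamicsConfig()
```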