Spatio-temporal Transformer: 1X World Model Compression Challenge
Code for our spatio-temporal Transformer, which won the 1X World Model Compression Challenge.
The model has 136M parameters.
Setup
Install environment with
uv sync --all-extras --group gpu
Configure accelerate for your hardware with
uv run accelerate config
Download the data
Download the tokenized data with
uv run huggingface-cli download 1x-technologies/worldmodel --repo-type dataset --local-dir data/tokenized
Download the raw data with
uv run huggingface-cli download 1x-technologies/worldmodel_raw_data --repo-type dataset --local-dir data/raw
Download the Cosmos tokenizer
Download the Cosmos tokenizers (Cosmos-0.1-Tokenizer-DV8x8x8 and Cosmos-0.1-Tokenizer-DV8x16x16) with
uv run python download_cosmos_tokenizer.py
Running experiments
The run with a test loss of 6.9334 was run with
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 uv run accelerate launch --config_file accelerate/default_config.yaml src/train.py ++use_wandb=True ++per_device_batch_size=20 ++lr=8e-4 ++grad_accum_steps=1
Note that we used an effective batch size of 20 x 8 = 160 (per-device batch size times number of GPUs) without any gradient accumulation. If you run out of GPU memory, decrease per_device_batch_size and increase grad_accum_steps by the same factor to keep the same effective batch size.
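The trade-off between per_device_batch_size and grad_accum_steps can be sketched in a few lines of Python. The helper names below are hypothetical and are not part of this repository. The sketch assumes the effective batch size is the product of per-device batch size, number of GPUs, and accumulation steps, which matches the 20 x 8 = 160 figure above.

```python
def effective_batch_size(per_device_batch_size: int, num_devices: int,
                         grad_accum_steps: int = 1) -> int:
    """Number of samples that contribute to each optimizer step."""
    return per_device_batch_size * num_devices * grad_accum_steps


def equivalent_settings(target: int, num_devices: int) -> list[tuple[int, int]]:
    """All (per_device_batch_size, grad_accum_steps) pairs that reach `target` exactly."""
    per_step, remainder = divmod(target, num_devices)
    if remainder:
        raise ValueError("target must be divisible by num_devices")
    # Every divisor of the per-device total gives one valid split.
    return [(b, per_step // b) for b in range(per_step, 0, -1) if per_step % b == 0]


if __name__ == "__main__":
    print(effective_batch_size(20, 8, 1))  # the setting used in the run above
    for batch, accum in equivalent_settings(160, 8):
        print(f"++per_device_batch_size={batch} ++grad_accum_steps={accum}")
```

For example, on 8 GPUs, ++per_device_batch_size=10 ++grad_accum_steps=2 gives the same effective batch size of 160 at roughly half the activation memory per step.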
You can also train the extra-small Transformer on a single GPU with
CUDA_VISIBLE_DEVICES=0 uv run python src/train.py ++use_wandb=True +model=xsmall
Inference from a checkpoint
Load a checkpoint and run inference with
CUDA_VISIBLE_DEVICES=0,1,2,3 uv run accelerate launch --num_processes 4 src/inference.py ++ckpt_dir="output/hydra/train_accelerate/2025-09-10_11-33-19"
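The ckpt_dir above is named after the time the training run started. If you have several runs, a small helper like the following can find the most recent one. This is a sketch only: latest_run_dir is a hypothetical function that is not part of this repository, and it assumes every run directory under output/hydra/train_accelerate follows the same timestamp naming as the example path.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed naming scheme, matching e.g. 2025-09-10_11-33-19
RUN_NAME_FORMAT = "%Y-%m-%d_%H-%M-%S"


def latest_run_dir(root: Path) -> Optional[Path]:
    """Return the subdirectory of `root` with the newest timestamp name, or None."""
    runs = []
    for child in root.iterdir():
        if not child.is_dir():
            continue
        try:
            stamp = datetime.strptime(child.name, RUN_NAME_FORMAT)
        except ValueError:
            continue  # skip directories that are not timestamped runs
        runs.append((stamp, child))
    return max(runs, key=lambda run: run[0])[1] if runs else None


if __name__ == "__main__":
    root = Path("output/hydra/train_accelerate")
    if root.is_dir():
        print(latest_run_dir(root))
```

The printed path can then be passed as the ++ckpt_dir value.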
Generate submission
Generate a submission with
CUDA_VISIBLE_DEVICES=0 uv run python src/generate_submission_compression.py ++ckpt_dir="output/hydra/train_accelerate/2025-09-10_11-33-19"
This takes a while to run because it saves a separate NumPy file for each data point.