Finding the Time to Think in Real-Time RL: Checkpoints

Pretrained base planners and gating policies for the paper Finding the Time to Think in Real-Time RL.

A lightweight gating policy on top of a frozen AlphaZero-style MCTS planner selects a state-dependent planning budget at each decision point, across five real-time games (Pac-Man, real-time Tetris, Snake, Speed Hex, Speed Go).
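The gating idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the budget values, the linear-softmax form of the gate, and the names `BUDGETS`, `gate`, and `act` are hypothetical and are not taken from the paper or the code repo. The sketch only shows the control flow, in which a small policy picks a simulation budget from state features and a frozen planner is then called with that budget.

```python
import math

# Hypothetical simulation budgets; the real choices are not stated in this card.
BUDGETS = [0, 8, 32, 128]

def gate(features, weights, rng=None):
    """Pick a planning budget with a linear-softmax policy (illustrative only)."""
    # One logit per budget: dot product of that budget's weight row with the features.
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    if rng is None:
        # Greedy choice when no random generator is supplied.
        idx = max(range(len(probs)), key=probs.__getitem__)
    else:
        # Sample a budget index in proportion to its probability.
        idx = rng.choices(range(len(probs)), weights=probs)[0]
    return BUDGETS[idx], probs

def act(features, weights, planner):
    """Query the gate, then run the frozen planner with the chosen budget."""
    budget, _ = gate(features, weights)
    return planner(features, budget)
```

Here `planner` stands in for any callable that takes a state and a budget; in the released checkpoints that role is played by the AlphaZero-style MCTS base planner.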

Layout

checkpoints/
├── clock/{go,hex}/{base,gating}          # Speed Go / Speed Hex (pgx)
└── committed_action/{pacman,snake,tetris_rt}/{base,gating}   # Jumanji

Each environment has one AlphaZero base planner and one PPO gating policy. See the code repo's README for the launcher scripts that consume these checkpoints.
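As a convenience, the directory layout can be enumerated programmatically. The following is a small sketch that assumes the checkpoints were downloaded to a local `checkpoints/` folder; the helper name `checkpoint_dirs` is hypothetical, and the file format inside each directory is not specified here.

```python
from pathlib import Path

# Directory names copied from the layout in this card.
LAYOUT = {
    "clock": ["go", "hex"],                                # Speed Go / Speed Hex (pgx)
    "committed_action": ["pacman", "snake", "tetris_rt"],  # Jumanji
}
ROLES = ["base", "gating"]

def checkpoint_dirs(root="checkpoints"):
    """Map (env, role) to its expected directory under `root`."""
    root = Path(root)
    return {
        (env, role): root / family / env / role
        for family, envs in LAYOUT.items()
        for env in envs
        for role in ROLES
    }

if __name__ == "__main__":
    # Report which expected directories are present locally.
    for (env, role), path in sorted(checkpoint_dirs().items()):
        status = "found" if path.is_dir() else "missing"
        print(f"{env:10s} {role:7s} {path.as_posix()}  [{status}]")
```

The resulting paths can then be passed to whichever launcher script the code repo provides.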
