Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.14.0
Game
Pygame-based curriculum car racer used as the simulation backend for RL training.
Running
# From project root
uv run python main.py # start at track 1
uv run python main.py 5 # start at track 5
# As a module
uv run python -m game.curriculum_game 5
Controls
| Key | Action |
|---|---|
| Arrow keys | Drive (up=throttle, down=brake, left/right=steer) |
| N / P | Next / previous track |
| 1 β 9 | Jump to track number |
| R | Restart (counts as an attempt) |
| ESC | Quit |
Tracks
16 tracks across 4 difficulty tiers. Each level narrows the road, tightens turns, or increases speed cap.
| Tier | Levels | Shape | Description |
|---|---|---|---|
| A β Easy | 1β4 | Full ellipses | Wide to narrow ovals, superspeedway |
| B β Medium-Easy | 5β8 | Rounded rectangles | Stadium oval, tight rectangle |
| C β Medium-Hard | 9β12 | Two-arc layouts | Hairpin, chicane, double-hairpin, asymmetric |
| D β Hard | 13β16 | Polygon circuits | L-shape, T-notch, complex circuit, master challenge |
Game Rules
- Complete one full lap without touching the white fence border.
- Touching the fence or pressing R = restart from start, attempt counter +1.
- Cross the start/finish line cleanly to finish. A summary screen shows stats.
HUD
A single top bar shows: track name Β· speed Β· attempt count Β· lap time Β· total time Β· distance Β· max speed. Timer starts on first key press, not on load.
File Structure
game/
oval_racer.py Original single-oval game. Exports draw_headlights, draw_car,
SCREEN_W, SCREEN_H used by curriculum_game and env/.
tracks.py 16 TrackDef objects. Each knows its waypoints, road width,
start position/angle, speed cap, and on-track mask.
curriculum_game.py Main playable game. RaceState drives the lap logic,
reset-on-crash, finish detection, and HUD rendering.
rl_splits.py CarEnv (gym-style wrapper), CurriculumSampler, Evaluator,
and TRAIN / VAL / TEST splits for RL training.
test_tracks.py Headless test: builds all 16 tracks and simulates 150 steps
each. Run with: uv run python -m game.test_tracks
Physics Constants
| Constant | Value | Effect |
|---|---|---|
ACCEL |
0.13 | Throttle acceleration per frame |
BRAKE_DECEL |
0.22 | Braking deceleration per frame |
FRICTION |
0.038 | Passive speed decay per frame |
STEER_DEG |
2.7 | Degrees rotated per steer step |
max_speed |
3.0β4.5 | Per-track speed cap (px/frame) |
Speed is in px/frame. Multiply by FPS (60) to get px/s shown in HUD.
Finish Line Detection
Two-phase gate crossing to handle fast cars reliably:
- Arm β wait until
gate_side > 50 pxahead (car is clearly past the gate going forward). - Trigger β detect
prev_side < 0andcurr_side >= 0withspeed > 0.3.
This prevents the car from triggering on spawn or when reversing back over the line.
Track Metadata (used by reward)
Each TrackDef computes three values at construction time:
| Field | Formula | Purpose |
|---|---|---|
optimal_dist |
Waypoint polygon perimeter (px) | Theoretical shortest lap path |
par_time_steps |
optimal_dist / (max_speed Γ 0.7) |
Expected lap frames at 70% speed |
complexity |
(115 / width) Γ (max_speed / 3.0) |
Difficulty multiplier (1.0 β 3.45) |
RL Interface (rl_splits.py)
CarEnv exposes a gym-style API:
from game.rl_splits import make_env, TRAIN
env = make_env(TRAIN[0])
obs = env.reset() # [x/W, y/H, sin, cos, speed/max, on_track, gate_side]
obs, reward, done, info = env.step([accel, steer])
# info keys: lap, on_track, step, crashes, lap_dist, out_of_bounds
Reward Function
Rewards are not scaled by complexity β all values are fixed and comparable
across every track. Complexity only scales the curriculum threshold.
| Term | Trigger | Value | Purpose |
|---|---|---|---|
| Forward pulse | Every step | +speed/max_speed Γ 0.01 |
Prevent stalling |
| Off-track | Every step off road | β0.5 |
Stay on road |
| Crash event | onβoff transition | β5.0 |
Penalise each boundary hit |
| Lap completion | Gate crossed cleanly | +50 Γ time_ratio Γ dist_ratio |
Fast + efficient path |
| Out of bounds | Terminal | β100 |
Don't leave screen |
Lap completion breakdown:
time_ratio = clamp(par_time_steps / actual_lap_steps, 0.5, 2.0)
dist_ratio = clamp(optimal_dist / actual_lap_dist, 0.5, 1.0)
dist_ratiocapped at 1.0 β no bonus for paths shorter than the centreline (any such path involves off-track corner cutting).lap_distis only accumulated whileon_track=True, closing the corner-cutting exploit.- Best lap:
50 Γ 2.0 Γ 1.0 = 100 - Worst completed lap:
50 Γ 0.5 Γ 0.5 = 12.5
Curriculum threshold scales with complexity, rewards do not:
effective_threshold = base_threshold Γ track.complexity
| Track | C | Effective threshold (base=30) |
|---|---|---|
| 1 β Wide Oval | 1.00 | 30 |
| 8 β Small Oval | 2.03 | 61 |
| 14 β T-Notch | 2.66 | 80 |
| 16 β Master Challenge | 3.45 | 104 |