Environment
OpenEnv-compatible RL environment wrapping the car racing game. Provides a typed observation (egocentric headlight image + scalar features), typed action, curriculum builder, and an Impala CNN encoder ready to plug into a PPO actor-critic.
Quick Start
from env import CurriculumBuilder, DriveAction
builder = CurriculumBuilder()
# Training loop
env = builder.next_env()
obs = env.reset()
total_reward = 0.0
while not obs.done:
    # obs.image   : (64, 64, 3) uint8 numpy array
    # obs.speed   : float 0..1
    # obs.on_track: float 1.0 / 0.0
    action = DriveAction(accel=1.0, steer=0.0)
    obs = env.step(action)
    total_reward += obs.reward
advanced = builder.record(total_reward) # auto-advances curriculum when ready
print(builder.status)
File Structure
env/
models.py DriveAction and RaceObservation (Pydantic, OpenEnv-compatible)
environment.py RaceEnvironment: server-side wrapper around game.rl_splits.CarEnv
client.py RaceEnvClient: OpenEnv WebSocket client
encoder.py ImpalaCNN + RaceEncoder (PyTorch) for PPO actor-critic
curriculum.py CurriculumBuilder: wraps rl_splits TRAIN/VAL/TEST splits
Observation Space
RaceObservation has two parts that feed different network branches:
Image: obs.image → CNN encoder
- Shape: (64, 64, 3) uint8
- Egocentric: car always faces up, so track geometry is heading-invariant
- Rendering pipeline per step:
  - Blit track surface to offscreen canvas
  - Draw headlight cone (60° spread, 60 px ahead)
  - Crop 120×120 px square centred on car (grass-padded at borders)
  - Rotate so car heading maps to UP
  - Re-crop centre after rotation padding
  - Scale to 64×64
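The per-step pipeline above can be approximated without pygame. The sketch below is not the environment's renderer: it swaps the crop, rotate, re-crop, scale sequence for a single nearest-neighbour inverse mapping over a plain 2-D grid. The function name `egocentric_view`, the `GRASS` fill value, and the heading convention (radians, clockwise from screen-up) are assumptions made for illustration.

```python
import math

GRASS = 0  # hypothetical fill value for pixels outside the track surface


def egocentric_view(surface, car_x, car_y, heading_rad, crop=120, out=64):
    """Return an out x out view centred on the car, with its heading pointing up.

    surface is indexed as surface[y][x]. heading_rad is assumed to be measured
    clockwise from screen-up, so 0 means the car already faces up.
    """
    height, width = len(surface), len(surface[0])
    scale = crop / out  # source pixels covered by one output pixel
    centre = (out - 1) / 2
    sin_h, cos_h = math.sin(heading_rad), math.cos(heading_rad)

    view = []
    for v in range(out):
        row = []
        for u in range(out):
            # Offset of this output pixel in the car frame (right, ahead).
            right = (u - centre) * scale
            ahead = (centre - v) * scale
            # Rotate the offset into screen coordinates (x right, y down).
            src_x = math.floor(car_x + right * cos_h + ahead * sin_h + 0.5)
            src_y = math.floor(car_y + right * sin_h - ahead * cos_h + 0.5)
            inside = 0 <= src_x < width and 0 <= src_y < height
            # Anything sampled outside the surface is padded with grass.
            row.append(surface[src_y][src_x] if inside else GRASS)
        view.append(row)
    return view
```

With the default crop=120 and out=64, each output pixel covers just under two source pixels. Because the sampling grid is rotated by the car's heading, whatever lies ahead of the car always lands in the upper half of the view.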
Scalars: obs.speed / on_track / sin_angle / cos_angle → MLP encoder
| Field | Range | Purpose |
|---|---|---|
| speed | 0..1 | Speed / max_speed. Controls braking decisions. |
| on_track | 0 or 1 | Reactive penalty signal. |
| sin_angle | -1..1 | Absolute heading orientation. |
| cos_angle | -1..1 | Absolute heading orientation. |
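As a rough illustration of how the four scalars could be derived from raw car state, here is a minimal sketch. The helper name `scalar_features` and its arguments are hypothetical; the real values come from RaceObservation.

```python
import math


def scalar_features(speed, max_speed, on_track, heading_rad):
    """Return [speed, on_track, sin_angle, cos_angle] as plain floats."""
    # Normalise speed into 0..1 and guard against small overshoots.
    speed_norm = min(max(speed / max_speed, 0.0), 1.0)
    return [
        speed_norm,
        1.0 if on_track else 0.0,
        math.sin(heading_rad),
        math.cos(heading_rad),
    ]


features = scalar_features(speed=3.0, max_speed=6.0, on_track=True,
                           heading_rad=math.pi / 2)
```

Encoding the heading as a sin/cos pair avoids the jump a raw angle makes when it wraps around a full turn.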
Dropped from original CarEnv obs (would hurt generalisation to unseen tracks):
| Dropped field | Why |
|---|---|
| x, y | Absolute screen position: track-specific, causes overfitting |
| gate_side | Distance to start/finish gate: meaningless on unseen layouts |
Action Space
DriveAction(accel, steer): continuous, both clamped to [-1, 1] inside CarEnv.step.
| Field | Value | Effect |
|---|---|---|
| accel | +1 | Full throttle |
| accel | -1 | Brake |
| steer | +1 | Steer right |
| steer | -1 | Steer left |
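The clamping described above amounts to the following. This is a sketch only; the helper names `clamp` and `clamp_action` are hypothetical and not part of the env package.

```python
def clamp(value, low=-1.0, high=1.0):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_action(accel, steer):
    """Return (accel, steer) with both components limited to [-1, 1]."""
    return clamp(accel), clamp(steer)


safe_accel, safe_steer = clamp_action(accel=1.7, steer=-3.0)
```

Because the clamp is applied inside the environment, a policy head can emit unbounded values without breaking the simulation, although out-of-range outputs carry no extra effect.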
Reward Function
Defined in game/rl_splits.py:CarEnv.step. Rewards are not scaled by
complexity: all values are fixed, keeping episode returns comparable across
tracks in the same rollout buffer. Complexity only scales the curriculum threshold.
| Term | Trigger | Value | Goal |
|---|---|---|---|
| Forward pulse | Every step | +(speed / max_speed) × 0.01 | Prevent stalling |
| Off-track | Every step off road | -0.5 | Stay on road |
| Crash event | on→off transition | -5.0 | Penalise each boundary hit |
| Lap completion | Gate crossed cleanly | +50 × time_ratio × dist_ratio | Fast + efficient path |
| Out of bounds | Terminal | -100 | Don't leave screen |
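A minimal sketch of how the per-step terms in the table might combine, assuming they are simply summed on each step. The actual logic lives in CarEnv.step and may differ; the function name `step_reward` and its arguments are hypothetical, and the lap bonus is passed in rather than computed here.

```python
def step_reward(speed, max_speed, on_track, was_on_track,
                out_of_bounds=False, lap_bonus=0.0):
    """Sum the reward terms for one step (assumption: terms are additive)."""
    reward = 0.01 * speed / max_speed  # forward pulse, paid every step
    if not on_track:
        reward -= 0.5                  # off-track penalty, paid every step off road
        if was_on_track:
            reward -= 5.0              # crash event on the on->off transition
    if out_of_bounds:
        reward -= 100.0                # terminal penalty for leaving the screen
    return reward + lap_bonus


cruising = step_reward(speed=5.0, max_speed=10.0, on_track=True, was_on_track=True)
crash = step_reward(speed=5.0, max_speed=10.0, on_track=False, was_on_track=True)
```

Under this reading a single boundary hit costs roughly as much as 1,000 steps of half-speed forward pulse earn, so the forward term mainly breaks ties against standing still.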
Lap completion bonus:
time_ratio = clamp(par_time_steps / actual_lap_steps, 0.5, 2.0)
dist_ratio = clamp(optimal_dist / actual_lap_dist, 0.5, 1.0)
dist_ratio is capped at 1.0: there is no bonus for going shorter than the track
centreline (that implies off-track corner cutting). lap_dist only accumulates
while on_track=True, closing the exploit where brief grass-cutting reduced
path length and inflated dist_ratio.
| Performance | time_ratio | dist_ratio | Lap bonus |
|---|---|---|---|
| Faster than par, tight line | 2.0 | 1.0 | +100 |
| On-par, centreline path | 1.0 | 1.0 | +50 |
| Slow, meandering | 0.5 | 0.5 | +12.5 |
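The rows of the table above can be reproduced directly from the two clamp formulas. The sketch below assumes only what the formulas state; the function name `lap_bonus` is hypothetical and the step and distance figures are made-up inputs chosen to hit each row.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def lap_bonus(par_time_steps, actual_lap_steps, optimal_dist, actual_lap_dist):
    """Lap completion bonus: 50 x time_ratio x dist_ratio."""
    time_ratio = clamp(par_time_steps / actual_lap_steps, 0.5, 2.0)
    dist_ratio = clamp(optimal_dist / actual_lap_dist, 0.5, 1.0)
    return 50.0 * time_ratio * dist_ratio


fast_tight = lap_bonus(1000, 500, 800.0, 800.0)     # time_ratio 2.0, dist_ratio 1.0
on_par = lap_bonus(1000, 1000, 800.0, 800.0)        # time_ratio 1.0, dist_ratio 1.0
slow_wander = lap_bonus(1000, 2000, 800.0, 1600.0)  # time_ratio 0.5, dist_ratio 0.5
```

A path shorter than optimal_dist still yields dist_ratio = 1.0 because of the upper clamp, which is what removes the incentive to cut corners.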
Complexity scales the curriculum threshold, not the reward:
effective_threshold = base_threshold × track.complexity
Track 16 (C=3.45) requires a window mean of 30 × 3.45 = 103.5 to advance,
which means consistently good laps, while Track 1 (C=1.0) only needs 30.
Because rewards themselves are unscaled, value-function targets stay in the
same range regardless of which track the agent is currently on.
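To make the threshold scaling concrete, here is a small sketch of an advance check. It assumes the rule is "the window is full and its mean return reaches the scaled threshold"; the class name `AdvanceGate` is hypothetical and is not the CurriculumBuilder implementation.

```python
from collections import deque


def effective_threshold(base_threshold, complexity):
    """Scale the base curriculum threshold by track complexity."""
    return base_threshold * complexity


class AdvanceGate:
    """Hypothetical helper: tracks recent returns for one frontier track."""

    def __init__(self, base_threshold, complexity, window):
        self.threshold = effective_threshold(base_threshold, complexity)
        self.returns = deque(maxlen=window)  # oldest returns fall out automatically

    def record(self, episode_return):
        """Store one episode return; report whether the track counts as mastered."""
        self.returns.append(episode_return)
        window_full = len(self.returns) == self.returns.maxlen
        mean_return = sum(self.returns) / len(self.returns)
        return window_full and mean_return >= self.threshold


gate = AdvanceGate(base_threshold=30.0, complexity=1.0, window=3)
results = [gate.record(r) for r in (40.0, 40.0, 40.0, -100.0)]
```

Only the comparison value changes with complexity; the returns stored in the window are the raw, unscaled episode returns.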
Encoder
RaceEncoder fuses both observation branches into a single feature vector for PPO:
image (64×64×3) → ImpalaCNN → 256-d
scalars (4,) → MLP 4→32→32 → 32-d
cat(256-d, 32-d) → 288-d → Actor / Critic heads
import torch
from env import RaceEncoder
encoder = RaceEncoder() # out_features = 288
img = torch.zeros(4, 3, 64, 64) # batch of 4, normalised 0..1
scalars = torch.zeros(4, 4)
features = encoder(img, scalars) # (4, 288)
ImpalaCNN vs Nature CNN
| | Nature CNN (DQN) | ImpalaCNN (IMPALA) |
|---|---|---|
| Architecture | 3 plain conv layers | 3 blocks × (Conv + MaxPool + 2 ResBlocks) |
| Skip connections | None | Yes: x = x + residual(x) in each block |
| Gradient flow | Vanishes in early layers | Direct path back through shortcuts |
| Sample efficiency | Baseline | ~3-5× better on visual RL tasks |
| Inference cost | Fast | Same (equivalent depth) |
Curriculum Builder
Based on the 16-track split in game/rl_splits.py:
| Split | Tracks | Purpose |
|---|---|---|
| TRAIN | 1, 2, 5, 6, 9, 10, 13, 14 | 2 per tier, curriculum ordered easy→hard |
| VAL | 3, 7, 11, 15 | 1 per tier, used for performance gating, never trained on |
| TEST | 4, 8, 12, 16 | 1 per tier, held out for final evaluation only |
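The split in the table follows a regular pattern, which the sketch below reproduces. It assumes each tier is a consecutive group of four track numbers (an inference from the table, not something read from game/rl_splits.py); the function name `build_splits` is hypothetical.

```python
TRACKS_PER_TIER = 4  # assumption: tracks 1-4 form tier 1, 5-8 tier 2, and so on


def build_splits(num_tracks=16):
    """Assign tracks to TRAIN/VAL/TEST by their position inside each tier."""
    splits = {"TRAIN": [], "VAL": [], "TEST": []}
    for track in range(1, num_tracks + 1):
        slot = (track - 1) % TRACKS_PER_TIER  # 0..3 within the tier
        if slot < 2:
            splits["TRAIN"].append(track)     # first two tracks of the tier
        elif slot == 2:
            splits["VAL"].append(track)       # third track gates progress
        else:
            splits["TEST"].append(track)      # fourth track is held out
    return splits


splits = build_splits()
```

Every tier contributes to all three splits, so validation and test difficulty span the same range as training.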
from env import CurriculumBuilder
builder = CurriculumBuilder(
    threshold=30.0,   # base mean reward needed to advance (one value works for all tracks thanks to complexity scaling)
    window=50,        # rolling window size: advance once the mean reward over the last 50 episodes reaches the threshold
                      # too small (e.g. 5) → advances on lucky streaks, policy not stable yet
                      # too large (e.g. 500) → stays on mastered track too long, slows curriculum
    replay_frac=0.3,  # 30% of episodes replay mastered tracks (prevents forgetting)
    use_image=True,   # set False to skip image rendering (fast unit tests / ablations)
)
env = builder.next_env() # samples frontier (or replay) track
builder.record(episode_reward) # auto-advances when threshold met
for env in builder.val_envs(): # evaluate on held-out VAL tracks
    ...
print(builder.status) # "Frontier: track 2 'Standard Oval' [2/8] ..."
print(builder.is_complete) # True when all TRAIN tracks mastered
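The frontier-or-replay choice made by next_env() can be pictured as below. This is a sketch under the assumption that replay_frac is the per-episode probability of revisiting a mastered track, picked uniformly; the function name `pick_track` is hypothetical and the real sampling in CurriculumBuilder may differ.

```python
import random


def pick_track(frontier, mastered, replay_frac, rng):
    """Return a mastered track with probability replay_frac, else the frontier."""
    # With nothing mastered yet there is nothing to replay.
    if mastered and rng.random() < replay_frac:
        return rng.choice(mastered)
    return frontier


rng = random.Random(0)  # seeded so the example is repeatable
picks = [pick_track(frontier=5, mastered=[1, 2], replay_frac=0.3, rng=rng)
         for _ in range(10_000)]
replay_share = sum(p != 5 for p in picks) / len(picks)
```

Over many episodes replay_share settles near replay_frac, so about 70% of training time still goes to the frontier track.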
OpenEnv Client (Remote Server)
To run the environment as a server and connect from a remote training process:
# server: start with `openenv serve env.environment:RaceEnvironment`
# client
import asyncio
from env import RaceEnvClient, DriveAction

async def main():
    async with RaceEnvClient(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(DriveAction(accel=1.0, steer=0.0))

asyncio.run(main())

# or synchronously
with RaceEnvClient(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(DriveAction(accel=1.0, steer=0.0))
Headless Mode (parallel training)
Set these env vars before importing pygame to run without a display:
import os
os.environ["SDL_VIDEODRIVER"] = "dummy"
os.environ["SDL_AUDIODRIVER"] = "dummy"
RaceEnvironment renders entirely to offscreen pygame.Surface objects, so no
display is needed at any point.