# Environment
OpenEnv-compatible RL environment wrapping the car racing game. Provides a typed
observation (egocentric headlight image + scalar features), typed action, curriculum
builder, and an Impala CNN encoder ready to plug into a PPO actor-critic.
## Quick Start
```python
from env import CurriculumBuilder, DriveAction

builder = CurriculumBuilder()

# Training loop (one episode)
env = builder.next_env()
obs = env.reset()
total_reward = 0.0
while not obs.done:
    # obs.image    : (64, 64, 3) uint8 numpy array
    # obs.speed    : float in 0..1
    # obs.on_track : 1.0 on road, 0.0 off road
    action = DriveAction(accel=1.0, steer=0.0)
    obs = env.step(action)
    total_reward += obs.reward

advanced = builder.record(total_reward)  # auto-advances the curriculum when ready
print(builder.status)
```
## File Structure
```
env/
  models.py        DriveAction and RaceObservation (Pydantic, OpenEnv-compatible)
  environment.py   RaceEnvironment: server-side wrapper around game.rl_splits.CarEnv
  client.py        RaceEnvClient: OpenEnv WebSocket client
  encoder.py       ImpalaCNN + RaceEncoder (PyTorch) for PPO actor-critic
  curriculum.py    CurriculumBuilder: wraps rl_splits TRAIN/VAL/TEST splits
```
## Observation Space
`RaceObservation` has two parts that feed different network branches:
### Image: `obs.image` → CNN encoder
- Shape: `(64, 64, 3)` uint8
- **Egocentric**: the car always faces up, so the view of the surrounding track does not depend on the car's absolute heading
- Rendering pipeline per step:
  1. Blit track surface to offscreen canvas
  2. Draw headlight cone (60° spread, 60 px ahead)
  3. Crop 120×120 px square centred on car (grass-padded at borders)
  4. Rotate so car heading maps to UP
  5. Re-crop centre after rotation padding
  6. Scale to 64×64
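The crop, rotate and scale steps above can be sketched with NumPy alone. This is a hypothetical nearest-neighbour version, not the pygame code in `environment.py`; it skips the headlight cone and assumes the heading is in radians, with 0 pointing along +x and the screen y axis pointing down. `GRASS` is a placeholder colour. Because it samples the rotated crop directly by inverse mapping, rotation, re-cropping and scaling collapse into one pass, and there is no rotation padding to trim.

```python
import numpy as np

GRASS = np.array([40, 120, 40], dtype=np.uint8)  # hypothetical grass colour used for padding


def egocentric_view(world, car_xy, heading, crop=120, out=64):
    """Sample a crop x crop window around the car, rotated so the heading points up,
    directly at out x out resolution (nearest neighbour).

    world:   (H, W, 3) uint8 track image
    car_xy:  (x, y) car position in pixels
    heading: radians; 0 = +x, screen y axis points down (assumed convention)
    """
    half = crop / 2.0
    # Centres of the output pixels, expressed as offsets inside the crop (crop pixels)
    idx = (np.arange(out) + 0.5) * (crop / out) - half
    dx, dy = np.meshgrid(idx, idx)  # dx varies along columns (right), dy along rows (down)
    fwd = -dy                       # image "up" is the car's forward direction
    c, s = np.cos(heading), np.sin(heading)
    # forward = (c, s); car's right-hand side in screen coords = (-s, c)
    wx = car_xy[0] + fwd * c - dx * s
    wy = car_xy[1] + fwd * s + dx * c
    xi = np.floor(wx).astype(int)
    yi = np.floor(wy).astype(int)
    h, w = world.shape[:2]
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    view = np.empty((out, out, 3), dtype=np.uint8)
    view[:] = GRASS                               # out-of-bounds samples become grass
    view[inside] = world[yi[inside], xi[inside]]
    return view
```

Whatever the car's heading, a feature directly ahead of it lands in the upper half of the 64×64 image.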
### Scalars: `obs.speed / on_track / sin_angle / cos_angle` → MLP encoder
| Field | Range | Purpose |
|-------|-------|---------|
| `speed` | 0..1 | Speed / max_speed. Informs braking decisions. |
| `on_track` | 0 or 1 | 1 on the road, 0 off it; lets the policy react to the off-track penalty. |
| `sin_angle` | −1..1 | Absolute heading (sine component). |
| `cos_angle` | −1..1 | Absolute heading (cosine component). |
**Dropped from original CarEnv obs** (would hurt generalisation to unseen tracks):
| Dropped field | Why |
|--------------|-----|
| `x`, `y` | Absolute screen position: track-specific, causes overfitting |
| `gate_side` | Distance to start/finish gate: meaningless on unseen layouts |
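A minimal sketch of turning a raw car state into the four scalars. `RawCarState`, its field names and `scalar_features` are hypothetical, and the heading is assumed to be in radians. It shows the design choice in the two tables: only normalised speed, the on-track flag and the heading's sine and cosine reach the MLP, while `x`, `y` and `gate_side` are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class RawCarState:
    # Hypothetical raw state; field names are illustrative, not CarEnv's attributes
    x: float
    y: float
    speed: float
    max_speed: float
    heading: float  # radians (assumed)
    on_track: bool
    gate_side: float


def scalar_features(state: RawCarState) -> list:
    """Return [speed, on_track, sin_angle, cos_angle]; x, y and gate_side are deliberately unused."""
    speed = min(max(state.speed / state.max_speed, 0.0), 1.0)  # normalise and clamp to 0..1
    return [
        speed,
        1.0 if state.on_track else 0.0,
        math.sin(state.heading),
        math.cos(state.heading),
    ]
```

Encoding the heading as a (sin, cos) pair rather than a raw angle avoids the discontinuity where the angle wraps from π to −π.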
## Action Space
`DriveAction(accel, steer)`: continuous, both fields clamped to `[−1, 1]` inside `CarEnv.step`.
| Field | Range | Effect |
|-------|-------|--------|
| `accel` | +1 | Full throttle |
| `accel` | −1 | Brake |
| `steer` | +1 | Steer right |
| `steer` | −1 | Steer left |
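The clamp can be illustrated with a stand-in dataclass. `DriveActionSketch` and `clamp` are hypothetical names: the real `DriveAction` is a Pydantic model, and the real clamp happens inside `CarEnv.step`.

```python
from dataclasses import dataclass


def clamp(value: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, value))


@dataclass(frozen=True)
class DriveActionSketch:
    """Stand-in for DriveAction: accel > 0 throttles, accel < 0 brakes; steer > 0 turns right."""
    accel: float
    steer: float

    def clamped(self) -> "DriveActionSketch":
        # Mirrors the [-1, 1] clamp that CarEnv.step applies to both fields
        return DriveActionSketch(clamp(self.accel), clamp(self.steer))
```

Because the environment clamps, a policy that occasionally samples values outside `[−1, 1]` still produces valid controls.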
## Reward Function
Defined in `game/rl_splits.py:CarEnv.step`. Rewards are **not** scaled by
complexity: all values are fixed, which keeps episode returns comparable across
tracks in the same rollout buffer. Complexity only scales the curriculum threshold.
| Term | Trigger | Value | Goal |
|------|---------|-------|------|
| Forward pulse | Every step | `+speed/max_speed × 0.01` | Prevent stalling |
| Off-track | Every step off road | `−0.5` | Stay on road |
| Crash event | on → off transition | `−5.0` | Penalise each boundary hit |
| Lap completion | Gate crossed cleanly | `+50 × time_ratio × dist_ratio` | Fast + efficient path |
| Out of bounds | Terminal | `−100` | Don't leave screen |
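A per-step version of the table as a hypothetical helper (`step_reward` is not the `CarEnv.step` source). Two assumptions are baked in and flagged in the comments: the −0.5 off-track and −5.0 crash penalties both apply on the step where the car leaves the road, and the −100 out-of-bounds penalty replaces the other terms on the terminal step. The lap bonus is added separately when the gate is crossed.

```python
# Hypothetical per-step reward helper mirroring the reward table
FORWARD_SCALE = 0.01
OFF_TRACK_PENALTY = -0.5
CRASH_PENALTY = -5.0
OUT_OF_BOUNDS_PENALTY = -100.0


def step_reward(speed: float, max_speed: float, on_track: bool,
                was_on_track: bool, out_of_bounds: bool) -> float:
    """Per-step reward excluding the lap bonus."""
    if out_of_bounds:
        # Assumption: the terminal penalty replaces the per-step terms
        return OUT_OF_BOUNDS_PENALTY
    reward = speed / max_speed * FORWARD_SCALE  # forward pulse, every step
    if not on_track:
        reward += OFF_TRACK_PENALTY  # every step spent off the road
        if was_on_track:
            # Assumption: the crash penalty stacks with the off-track penalty on the on -> off step
            reward += CRASH_PENALTY
    return reward
```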
**Lap completion bonus:**
```
time_ratio = clamp(par_time_steps / actual_lap_steps, 0.5, 2.0)
dist_ratio = clamp(optimal_dist / actual_lap_dist, 0.5, 1.0)
```
`dist_ratio` is capped at **1.0**: there is no bonus for a path shorter than the track
centreline, since that implies off-track corner cutting. `lap_dist` only accumulates
while `on_track=True`, closing the exploit where brief grass-cutting reduced
path length and inflated `dist_ratio`.
| Performance | time_ratio | dist_ratio | Lap bonus |
|------------|-----------|-----------|-----------|
| Faster than par, tight line | 2.0 | 1.0 | +100 |
| On-par, centreline path | 1.0 | 1.0 | +50 |
| Slow, meandering | 0.5 | 0.5 | +12.5 |
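The lap bonus formula and the on-track-only distance rule can be sketched together. `LapTracker` is a hypothetical helper, and `step_dist` is assumed to be the distance travelled in one step. Feeding it the three scenarios from the table gives +100, +50 and +12.5.

```python
def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


class LapTracker:
    """Hypothetical helper: counts steps and on-track distance for one lap, then scores it."""

    def __init__(self, par_time_steps: int, optimal_dist: float):
        self.par_time_steps = par_time_steps
        self.optimal_dist = optimal_dist
        self.steps = 0
        self.lap_dist = 0.0

    def update(self, step_dist: float, on_track: bool) -> None:
        self.steps += 1
        if on_track:
            # Distance covered on grass is never added to lap_dist
            self.lap_dist += step_dist

    def bonus(self) -> float:
        # max() guards against division by zero on a degenerate lap
        time_ratio = clamp(self.par_time_steps / max(self.steps, 1), 0.5, 2.0)
        dist_ratio = clamp(self.optimal_dist / max(self.lap_dist, 1e-9), 0.5, 1.0)
        return 50.0 * time_ratio * dist_ratio


# On-par lap along the centreline: 100 steps of 10 distance units, par = 100 steps / 1000 units
lap = LapTracker(par_time_steps=100, optimal_dist=1000.0)
for _ in range(100):
    lap.update(10.0, on_track=True)
print(lap.bonus())  # 50.0
```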
**Complexity scales the curriculum threshold, not the reward:**
```
effective_threshold = base_threshold × track.complexity
```
Track 16 (C=3.45) requires a window mean of `30 × 3.45 = 103.5` to advance,
which means consistently good laps, while Track 1 (C=1.0) only needs 30.
Because rewards themselves are unscaled, value-function targets stay in the
same range regardless of which track the agent is currently on.
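A minimal sketch of the advancement check. `AdvanceGate` is a hypothetical name, not the `CurriculumBuilder` internals, and it assumes two details the text leaves open: the rolling mean is only trusted once the window is full, and reaching the threshold exactly counts as passing. Returns go into the window unscaled; only the threshold is multiplied by complexity.

```python
from collections import deque


class AdvanceGate:
    """Hypothetical rolling-window check: advance when mean return >= base_threshold * complexity."""

    def __init__(self, base_threshold: float = 30.0, window: int = 50):
        self.base_threshold = base_threshold
        self.returns = deque(maxlen=window)  # oldest returns fall out automatically

    def effective_threshold(self, complexity: float) -> float:
        return self.base_threshold * complexity

    def record(self, episode_return: float, complexity: float) -> bool:
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return False  # assumption: never advance on a partially filled window
        mean = sum(self.returns) / len(self.returns)
        return mean >= self.effective_threshold(complexity)
```

On Track 16, fifty episodes returning 110 each clear the 103.5 bar, while fifty episodes returning 100 do not.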
## Encoder
`RaceEncoder` fuses both observation branches into a single feature vector for PPO:
```
image (64×64×3)
  └──► ImpalaCNN ──► 256-d ──┐
                             ├──► cat ──► 288-d ──► Actor / Critic heads
scalars (4,)                 │
  └──► MLP 4→32→32 ──► 32-d ─┘
```
```python
import torch
from env import RaceEncoder
encoder = RaceEncoder() # out_features = 288
img = torch.zeros(4, 3, 64, 64) # batch of 4, normalised 0..1
scalars = torch.zeros(4, 4)
features = encoder(img, scalars) # (4, 288)
```
### ImpalaCNN vs Nature CNN
| | Nature CNN (DQN) | ImpalaCNN (IMPALA) |
|---|---|---|
| Architecture | 3 plain conv layers | 3 blocks × (Conv + MaxPool + 2 ResBlocks) |
| Skip connections | None | Yes: `x = x + residual(x)` in each ResBlock |
| Gradient flow | Only through the stacked convs | Identity shortcuts give a direct path back to early layers |
| Sample efficiency | Baseline | Typically better on visual RL tasks (~3–5×, task-dependent) |
| Inference cost | Lowest (3 convs) | Higher (15 convs), still cheap at 64×64 input |
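A compact PyTorch sketch of the architecture in the table and the fusion diagram. `ImpalaCNNSketch` and `RaceEncoderSketch` are hypothetical stand-ins for `ImpalaCNN` and `RaceEncoder`, not their source; the channel widths (16, 32, 32) follow the original IMPALA network and are an assumption about this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """relu -> conv -> relu -> conv, added back onto the input (x + residual(x))."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv0(F.relu(x))
        out = self.conv1(F.relu(out))
        return x + out  # identity shortcut


class ConvSequence(nn.Module):
    """One IMPALA block: conv -> 3x3 max-pool (stride 2) -> two residual blocks."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.res0 = ResidualBlock(out_ch)
        self.res1 = ResidualBlock(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.res1(self.res0(self.pool(self.conv(x))))


class ImpalaCNNSketch(nn.Module):
    def __init__(self, in_ch: int = 3, channels=(16, 32, 32), out_features: int = 256):
        super().__init__()
        blocks, ch = [], in_ch
        for c in channels:
            blocks.append(ConvSequence(ch, c))
            ch = c
        self.blocks = nn.Sequential(*blocks)
        # Each pool halves the spatial size: 64 -> 32 -> 16 -> 8
        self.fc = nn.Linear(ch * 8 * 8, out_features)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = torch.flatten(F.relu(self.blocks(img)), start_dim=1)
        return F.relu(self.fc(x))


class RaceEncoderSketch(nn.Module):
    """Image branch (256-d) concatenated with scalar MLP branch (32-d) -> 288-d."""

    def __init__(self):
        super().__init__()
        self.cnn = ImpalaCNNSketch()
        self.mlp = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
        self.out_features = 256 + 32

    def forward(self, img: torch.Tensor, scalars: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.cnn(img), self.mlp(scalars)], dim=1)
```

Counting layers here makes the cost trade-off concrete: each block holds one plain conv plus two residual blocks of two convs, so 15 conv layers in total versus 3 for the Nature CNN.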
## Curriculum Builder
Based on the 16-track split in `game/rl_splits.py`:
| Split | Tracks | Purpose |
|-------|--------|---------|
| TRAIN | 1, 2, 5, 6, 9, 10, 13, 14 | 2 per tier, curriculum ordered easy → hard |
| VAL | 3, 7, 11, 15 | 1 per tier: performance gating, never trained on |
| TEST | 4, 8, 12, 16 | 1 per tier: held-out, final evaluation only |
```python
from env import CurriculumBuilder

builder = CurriculumBuilder(
    threshold=30.0,    # base mean reward to advance; multiplied by each track's complexity
    window=50,         # rolling window: advance once the mean of the last 50 episode returns meets the threshold
                       # too small (e.g. 5): advances on lucky streaks before the policy is stable
                       # too large (e.g. 500): lingers on a mastered track and slows the curriculum
    replay_frac=0.3,   # 30% of episodes replay mastered tracks (prevents forgetting)
    use_image=True,    # set False to skip image rendering (fast unit tests / ablations)
)

env = builder.next_env()        # samples the frontier (or a replay) track
builder.record(episode_reward)  # episode_reward: total return of the finished episode; auto-advances when the threshold is met

for env in builder.val_envs():  # evaluate on held-out VAL tracks
    ...

print(builder.status)       # "Frontier: track 2 'Standard Oval' [2/8] ..."
print(builder.is_complete)  # True when all TRAIN tracks are mastered
```
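How `replay_frac` splits episodes between the frontier and mastered tracks can be sketched as follows. `FrontierSampler` is a hypothetical stand-in, not the `CurriculumBuilder` source; it assumes replay tracks are drawn uniformly from the mastered set and that, once every TRAIN track is mastered, sampling becomes pure replay.

```python
import random


class FrontierSampler:
    """Hypothetical frontier-vs-replay track picker."""

    def __init__(self, train_tracks, replay_frac=0.3, seed=0):
        self.train_tracks = list(train_tracks)  # ordered easy -> hard
        self.replay_frac = replay_frac
        self.frontier = 0  # index of the first track not yet mastered
        self.rng = random.Random(seed)

    @property
    def is_complete(self) -> bool:
        return self.frontier >= len(self.train_tracks)

    def advance(self) -> None:
        self.frontier = min(self.frontier + 1, len(self.train_tracks))

    def next_track(self):
        mastered = self.train_tracks[: self.frontier]
        if self.is_complete:
            return self.rng.choice(mastered)  # everything mastered: pure replay
        if mastered and self.rng.random() < self.replay_frac:
            return self.rng.choice(mastered)  # replay a mastered track to prevent forgetting
        return self.train_tracks[self.frontier]
```

With the TRAIN split `[1, 2, 5, 6, 9, 10, 13, 14]` and two tracks mastered, roughly 70% of episodes land on track 5 and the rest are split between tracks 1 and 2.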
## OpenEnv Client (Remote Server)
To run the environment as a server and connect from a remote training process:
```python
# Server: start with `openenv serve env.environment:RaceEnvironment`

# Client (async)
import asyncio
from env import RaceEnvClient, DriveAction

async def main():
    async with RaceEnvClient(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(DriveAction(accel=1.0, steer=0.0))

asyncio.run(main())

# Or synchronously
with RaceEnvClient(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(DriveAction(accel=1.0, steer=0.0))
```
## Headless Mode (parallel training)
Set these env vars before importing pygame to run without a display:
```python
import os
os.environ["SDL_VIDEODRIVER"] = "dummy"
os.environ["SDL_AUDIODRIVER"] = "dummy"
```
`RaceEnvironment` renders entirely to offscreen `pygame.Surface` objects, so no
display is needed at any point.