Environment

OpenEnv-compatible RL environment wrapping the car racing game. Provides a typed observation (egocentric headlight image + scalar features), typed action, curriculum builder, and an Impala CNN encoder ready to plug into a PPO actor-critic.

Quick Start

from env import CurriculumBuilder, DriveAction

builder = CurriculumBuilder()

# Training loop
env = builder.next_env()
obs = env.reset()

total_reward = 0.0
while not obs.done:
    # obs.image   : (64, 64, 3) uint8 numpy array
    # obs.speed   : float  0..1
    # obs.on_track: float  1.0 / 0.0
    action = DriveAction(accel=1.0, steer=0.0)
    obs = env.step(action)
    total_reward += obs.reward

advanced = builder.record(total_reward)   # auto-advances curriculum when ready
print(builder.status)

File Structure

env/
  models.py       DriveAction and RaceObservation (Pydantic, OpenEnv-compatible)
  environment.py  RaceEnvironment — server-side wrapper around game.rl_splits.CarEnv
  client.py       RaceEnvClient  — OpenEnv WebSocket client
  encoder.py      ImpalaCNN + RaceEncoder (PyTorch) for PPO actor-critic
  curriculum.py   CurriculumBuilder — wraps rl_splits TRAIN/VAL/TEST splits

Observation Space

RaceObservation has two parts that feed different network branches:

Image — `obs.image` → CNN encoder

Shape: (64, 64, 3) uint8
Egocentric: the car always faces up, so the track geometry seen by the network is heading-invariant
Rendering pipeline per step:
1. Blit track surface to offscreen canvas
2. Draw headlight cone (60° spread, 60 px ahead)
3. Crop 120×120 px square centred on car (grass-padded at borders)
4. Rotate so car heading maps to UP
5. Re-crop centre after rotation padding
6. Scale to 64×64
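The six steps above can be approximated in a single resampling pass. The sketch below is not the code in environment.py: it swaps crop, rotate, re-crop and scale for one inverse mapping from each output pixel back to the track, uses nearest-neighbour sampling, and omits the headlight cone. The function name `egocentric_view`, the heading convention (0 = facing +x, angles increasing clockwise on screen) and the `fill` value standing in for grass are all assumptions, and it needs NumPy.

```python
import numpy as np


def egocentric_view(track, car_xy, heading_rad, crop=120, out=64, fill=0):
    """Resample a crop x crop window around the car into out x out pixels, car facing UP.

    track       : (H, W) or (H, W, C) array in screen coordinates (x right, y down)
    car_xy      : (x, y) position of the car in track pixels
    heading_rad : assumed convention, 0 = facing +x, increasing clockwise on screen
    fill        : value used where the window leaves the track (stand-in for grass)
    """
    h, w = track.shape[:2]
    scale = crop / out                      # track pixels per output pixel
    centre = (out - 1) / 2.0
    rows, cols = np.mgrid[0:out, 0:out]
    forward = (centre - rows) * scale       # up in the output = ahead of the car
    right = (cols - centre) * scale         # right in the output = car's right side

    cos_h, sin_h = np.cos(heading_rad), np.sin(heading_rad)
    # With y pointing down, the forward axis is (cos, sin) and the right axis is (-sin, cos).
    x = np.rint(car_xy[0] + forward * cos_h - right * sin_h).astype(int)
    y = np.rint(car_xy[1] + forward * sin_h + right * cos_h).astype(int)

    inside = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    view = np.full((out, out) + track.shape[2:], fill, dtype=track.dtype)
    view[inside] = track[y[inside], x[inside]]
    return view


# Toy track: a bright marker 30 px east of a car that is facing east.
track = np.zeros((200, 200), dtype=np.uint8)
track[98:103, 128:133] = 255
view = egocentric_view(track, car_xy=(100, 100), heading_rad=0.0)
print(view.shape)          # (64, 64)
print(view[16, 32])        # marker shows up above the centre of the view
```

Because every output pixel is looked up directly, no padding is introduced by rotation and no second crop is needed. The trade-off is nearest-neighbour aliasing, which a real renderer may smooth.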

Scalars — `obs.speed / on_track / sin_angle / cos_angle` → MLP encoder

Field	Range	Purpose
`speed`	0..1	Speed / max_speed. Informs braking decisions.
`on_track`	0 or 1	Reactive penalty signal.
`sin_angle`	−1..1	Sine of the absolute heading.
`cos_angle`	−1..1	Cosine of the absolute heading.
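One likely reason for passing the heading as a sine/cosine pair rather than a raw angle is that the pair has no wrap-around jump at 0°/360°. The short sketch below illustrates this; `heading_features` is a hypothetical helper, not a function from this package.

```python
import math


def heading_features(heading_rad):
    """Encode a heading as (sin, cos) so nearby headings give nearby features."""
    return math.sin(heading_rad), math.cos(heading_rad)


a_deg, b_deg = 359.0, 1.0                      # two headings only 2 degrees apart
raw_gap = abs(a_deg - b_deg)                   # 358.0 when compared as raw angles
feat_gap = math.dist(heading_features(math.radians(a_deg)),
                     heading_features(math.radians(b_deg)))
print(raw_gap)                                 # 358.0
print(round(feat_gap, 3))                      # about 0.035
```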

Dropped from original CarEnv obs (would hurt generalisation to unseen tracks):

Dropped field	Why
`x`, `y`	Absolute screen position — track-specific, causes overfitting
`gate_side`	Distance to start/finish gate — meaningless on unseen layouts

Action Space

DriveAction(accel, steer) — continuous, both clamped to [−1, 1] inside CarEnv.step.

Field	Value	Effect
`accel`	+1	Full throttle
`accel`	−1	Brake
`steer`	+1	Steer right
`steer`	−1	Steer left
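A policy with an unbounded output (for example a Gaussian PPO head) can produce values outside the table above. The sketch below shows the clamping behaviour described for CarEnv.step; `clamp_action` is a hypothetical stand-in, and the real check may differ in detail.

```python
def clamp_action(accel, steer):
    """Clamp both controls to [-1, 1] (stand-in for the check inside CarEnv.step)."""
    def clamp(value):
        return max(-1.0, min(1.0, float(value)))

    return clamp(accel), clamp(steer)


print(clamp_action(1.7, -0.2))    # (1.0, -0.2)
print(clamp_action(-3.0, 2.5))    # (-1.0, 1.0)
```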

Reward Function

Defined in game/rl_splits.py:CarEnv.step. Rewards are not scaled by complexity — all values are fixed, keeping episode returns comparable across tracks in the same rollout buffer. Complexity only scales the curriculum threshold.

Term	Trigger	Value	Goal
Forward pulse	Every step	`+speed/max_speed × 0.01`	Prevent stalling
Off-track	Every step off road	`−0.5`	Stay on road
Crash event	on→off transition	`−5.0`	Penalise each boundary hit
Lap completion	Gate crossed cleanly	`+50 × time_ratio × dist_ratio`	Fast + efficient path
Out of bounds	Terminal	`−100`	Don't leave screen
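The per-step terms in the table can be sketched as a single function. This is an illustration, not the code in game/rl_splits.py: it assumes the terms are additive, that the crash penalty is applied on top of the off-track penalty on the transition step, and that leaving the screen replaces every other term. The name `step_reward` and its arguments are hypothetical, and the lap bonus is left out.

```python
def step_reward(speed, max_speed, on_track, was_on_track, out_of_bounds):
    """Return (reward, done) for one step, using the fixed values from the table."""
    if out_of_bounds:
        return -100.0, True                 # terminal: car left the screen

    reward = 0.01 * speed / max_speed       # forward pulse, every step
    if not on_track:
        reward -= 0.5                       # per-step off-track penalty
        if was_on_track:
            reward -= 5.0                   # crash event on the on -> off transition
    return reward, False


print(step_reward(10.0, 10.0, True, True, False))     # full speed on the road
print(step_reward(10.0, 10.0, False, True, False))    # first step onto the grass
print(step_reward(10.0, 10.0, False, False, True))    # left the screen
```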

Lap completion bonus:

time_ratio = clamp(par_time_steps / actual_lap_steps,  0.5, 2.0)
dist_ratio = clamp(optimal_dist   / actual_lap_dist,   0.5, 1.0)

dist_ratio is capped at 1.0 — no bonus for going shorter than the track centreline (that implies off-track corner cutting). lap_dist only accumulates while on_track=True, closing the exploit where brief grass-cutting reduced path length and inflated dist_ratio.

Performance	time_ratio	dist_ratio	Lap bonus
At least 2× faster than par, tight line	2.0	1.0	+100
On par, centreline path	1.0	1.0	+50
At least 2× slower than par, path at least 2× the optimal length	0.5	0.5	+12.5
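The table rows follow directly from the two clamped ratios. A minimal sketch that reproduces them is below; `lap_bonus` and `clamp` are hypothetical helper names, and the step counts and distances in the calls are made-up inputs chosen to hit each row.

```python
def clamp(value, lo, hi):
    """Limit value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def lap_bonus(par_time_steps, actual_lap_steps, optimal_dist, actual_lap_dist):
    """Lap completion bonus: 50 x time_ratio x dist_ratio, with both ratios clamped."""
    time_ratio = clamp(par_time_steps / actual_lap_steps, 0.5, 2.0)
    dist_ratio = clamp(optimal_dist / actual_lap_dist, 0.5, 1.0)
    return 50.0 * time_ratio * dist_ratio


print(lap_bonus(1000, 500, 800.0, 800.0))     # 100.0  (twice as fast as par)
print(lap_bonus(1000, 1000, 800.0, 800.0))    # 50.0   (on par, centreline)
print(lap_bonus(1000, 2000, 800.0, 1600.0))   # 12.5   (slow and meandering)
print(lap_bonus(1000, 400, 800.0, 600.0))     # 100.0  (both ratios hit their caps)
```

The last call shows the cap at work: a path shorter than the centreline earns nothing extra.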

Complexity scales the curriculum threshold, not the reward:

effective_threshold = base_threshold × track.complexity

Track 16 (C=3.45) requires a window mean of 30 × 3.45 = 103.5 to advance, which takes consistently good laps, while Track 1 (C=1.0) needs only 30. Because rewards themselves are unscaled, value-function targets stay in the same range regardless of which track the agent is currently on.
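The advance rule can be sketched as a rolling mean compared against the scaled threshold. This is an illustration under stated assumptions, not the logic in curriculum.py: the class name `AdvanceGate` is hypothetical, the window is assumed to need filling before any advance, and the comparison is assumed to be "greater than or equal".

```python
from collections import deque


class AdvanceGate:
    """Rolling-mean gate: advance once the window mean reaches base x complexity."""

    def __init__(self, base_threshold=30.0, complexity=1.0, window=50):
        self.threshold = base_threshold * complexity
        self.returns = deque(maxlen=window)   # oldest returns fall out automatically

    def record(self, episode_return):
        """Store one episode return; report whether the track counts as mastered."""
        self.returns.append(episode_return)
        if len(self.returns) < self.returns.maxlen:
            return False                      # assumption: wait for a full window
        mean = sum(self.returns) / len(self.returns)
        return mean >= self.threshold


gate = AdvanceGate(base_threshold=30.0, complexity=3.45, window=3)
print(round(gate.threshold, 2))               # 103.5
print([gate.record(r) for r in (110.0, 110.0, 110.0, 0.0)])
```

The small window of 3 is only there to keep the demo short; one poor episode at the end drops the mean back under the threshold.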

Encoder

RaceEncoder fuses both observation branches into a single feature vector for PPO:

image (64×64×3)
  └─► ImpalaCNN  →  256-d
                         ├─► cat  →  288-d  →  Actor / Critic heads
scalars (4,)              │
  └─► MLP 4→32→32  →  32-d

import torch
from env import RaceEncoder

encoder = RaceEncoder()           # out_features = 288
img     = torch.zeros(4, 3, 64, 64)   # batch of 4, normalised 0..1
scalars = torch.zeros(4, 4)
features = encoder(img, scalars)  # (4, 288)
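The observation image is (64, 64, 3) uint8, while the example above feeds the encoder a channels-first float batch in 0..1, so a conversion step is needed in between. A minimal NumPy sketch follows; `to_encoder_inputs` is a hypothetical helper, and the scalar order (speed, on_track, sin_angle, cos_angle) is an assumption that should be checked against encoder.py.

```python
import numpy as np


def to_encoder_inputs(image, speed, on_track, sin_angle, cos_angle):
    """Convert one observation to a (3, 64, 64) float32 image and a (4,) scalar vector."""
    # HWC uint8 -> CHW float32 in 0..1
    img = np.ascontiguousarray(image.transpose(2, 0, 1), dtype=np.float32) / 255.0
    # Assumed order of the four scalar features
    scalars = np.array([speed, on_track, sin_angle, cos_angle], dtype=np.float32)
    return img, scalars


image = np.full((64, 64, 3), 255, dtype=np.uint8)     # dummy all-white frame
img, scalars = to_encoder_inputs(image, 0.5, 1.0, 0.0, 1.0)
print(img.shape, img.dtype, float(img.max()))         # (3, 64, 64) float32 1.0
print(scalars.shape)                                  # (4,)
```

Each array can then be wrapped with `torch.from_numpy(...)` and stacked along a new batch dimension before calling the encoder.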

ImpalaCNN vs Nature CNN

	Nature CNN (DQN)	ImpalaCNN (IMPALA)
Architecture	3 plain conv layers	3 blocks × (Conv + MaxPool + 2 ResBlocks)
Skip connections	None	Yes — `x = x + residual(x)` in each block
Gradient flow	Vanishes in early layers	Direct path back through shortcuts
Sample efficiency	Baseline	~3–5× better on visual RL tasks
Inference cost	Fast	Higher (about 15 conv layers vs 3 in the standard designs)

Curriculum Builder

Based on the 16-track split in game/rl_splits.py:

Split	Tracks	Purpose
TRAIN	1,2, 5,6, 9,10, 13,14	2 per tier, curriculum ordered easy→hard
VAL	3, 7, 11, 15	1 per tier — performance gating, never trained on
TEST	4, 8, 12, 16	1 per tier — held-out, final evaluation only
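The split table follows a regular pattern if the tiers are taken to be consecutive groups of four tracks (1 to 4, 5 to 8, and so on). That grouping is inferred from the table, not stated in it, and `split_for` is a hypothetical helper, not part of game/rl_splits.py.

```python
def split_for(track_id):
    """Map a track number (1..16) to its split, assuming tiers of four consecutive tracks."""
    slot = (track_id - 1) % 4          # position of the track inside its tier
    if slot < 2:
        return "TRAIN"                 # first two tracks of each tier
    return "VAL" if slot == 2 else "TEST"


splits = {"TRAIN": [], "VAL": [], "TEST": []}
for track_id in range(1, 17):
    splits[split_for(track_id)].append(track_id)

print(splits["TRAIN"])   # [1, 2, 5, 6, 9, 10, 13, 14]
print(splits["VAL"])     # [3, 7, 11, 15]
print(splits["TEST"])    # [4, 8, 12, 16]
```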

from env import CurriculumBuilder

builder = CurriculumBuilder(
    threshold=30.0,  # base mean reward needed to advance (one value works for all tracks due to complexity scaling)
    window=50,       # rolling window size: advance only once the mean over the last 50 episodes reaches the threshold
                     # too small (e.g. 5)  → advances on lucky streaks, policy not stable yet
                     # too large (e.g. 500) → stays on mastered track too long, slows curriculum
    replay_frac=0.3, # 30% of episodes replay mastered tracks (prevents forgetting)
    use_image=True,  # set False to skip image rendering (fast unit tests / ablations)
)

env = builder.next_env()          # samples frontier (or replay) track
builder.record(episode_reward)    # auto-advances when threshold met

for env in builder.val_envs():    # evaluate on held-out VAL tracks
    ...

print(builder.status)             # "Frontier: track 2 'Standard Oval' [2/8] ..."
print(builder.is_complete)        # True when all TRAIN tracks mastered
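The effect of replay_frac can be pictured as a biased coin flip per episode. The sketch below is one plausible reading, not the sampling code in curriculum.py: `pick_track` is a hypothetical function, and it assumes replay tracks are drawn uniformly from the mastered set and that the frontier is always used while nothing has been mastered yet.

```python
import random


def pick_track(frontier, mastered, replay_frac=0.3, rng=random):
    """Return a mastered track with probability replay_frac, else the frontier track."""
    if mastered and rng.random() < replay_frac:
        return rng.choice(mastered)    # assumption: uniform over mastered tracks
    return frontier


rng = random.Random(0)                 # seeded so the demo is repeatable
picks = [pick_track(5, [1, 2], 0.3, rng) for _ in range(10_000)]
replay_share = sum(p != 5 for p in picks) / len(picks)
print(round(replay_share, 2))          # close to 0.3
```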

OpenEnv Client (Remote Server)

To run the environment as a server and connect from a remote training process:

# server (start with: openenv serve env.environment:RaceEnvironment)
# client
import asyncio

from env import RaceEnvClient, DriveAction

async def main():
    # `async with` and `await` are only valid inside a coroutine
    async with RaceEnvClient(base_url="http://localhost:8000") as client:
        result = await client.reset()
        result = await client.step(DriveAction(accel=1.0, steer=0.0))

asyncio.run(main())

# or synchronously
with RaceEnvClient(base_url="http://localhost:8000").sync() as client:
    result = client.reset()
    result = client.step(DriveAction(accel=1.0, steer=0.0))

Headless Mode (parallel training)

Set these env vars before importing pygame to run without a display:

import os
os.environ["SDL_VIDEODRIVER"] = "dummy"
os.environ["SDL_AUDIODRIVER"] = "dummy"

RaceEnvironment renders entirely to offscreen pygame.Surface objects, so no display is needed at any point.
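For parallel training, each worker process needs these variables set before it touches pygame. One way to make the ordering explicit is a small helper called at the top of every worker entry point. `configure_headless` is a hypothetical name, and the return value is only a convenience for spotting a late call.

```python
import os
import sys


def configure_headless():
    """Select SDL's dummy drivers; return False if pygame was already imported."""
    imported_first = "pygame" in sys.modules   # the call came later than recommended
    os.environ["SDL_VIDEODRIVER"] = "dummy"
    os.environ["SDL_AUDIODRIVER"] = "dummy"
    return not imported_first


in_time = configure_headless()
print(os.environ["SDL_VIDEODRIVER"], os.environ["SDL_AUDIODRIVER"])   # dummy dummy
```

Child processes started afterwards inherit the parent's environment, so calling the helper once in the launcher before spawning workers is usually enough.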