Spaces:

rishabh16196
/

traffic_light_env

Sleeping

App Files Files Community

rishabh16196 commited on Apr 7

Commit

baedb36

verified ·

1 Parent(s): 3e960de

Upload folder using huggingface_hub

Browse files

Files changed (2) hide show

README.md +68 -232
inference.py +243 -78

README.md CHANGED Viewed

@@ -11,9 +11,32 @@ tags:
   - openenv
 ---
-# Traffic Light Env Environment
-An RL environment simulating a 4-way traffic intersection where an AI agent controls the traffic light to minimize vehicle waiting time. The intersection has 4 traffic-flow directions (NS, SN, EW, WE), each with 2 lanes (8 lanes total), and 6 selectable green phases. Features 7 task scenarios with increasing difficulty and automated grading rubrics.
 ## Quick Start
@@ -25,122 +48,52 @@ async with TrafficLightEnv(base_url="http://localhost:8000") as env:
     obs = result.observation
     for step in range(200):
-        # Choose phase based on which axis has more waiting vehicles
         ns_sn = obs.ns_100m + obs.sn_100m
         ew_we = obs.ew_100m + obs.we_100m
-        phase = 0 if ns_sn >= ew_we else 1  # corridor phases
         result = await env.step(TrafficLightAction(phase=phase))
         obs = result.observation
         if obs.done:
             print(f"Grade: {obs.grade_score}")
-            print(f"Passed: {obs.grade_details['passed']}")
             break
 ```
-### From Docker
-```python
-env = await TrafficLightEnv.from_docker_image("traffic_light_env-env:latest")
-result = await env.reset(task="gridlock")
-# ... run episode ...
-await env.close()
-```
-## Building the Docker Image
-```bash
-docker build -t traffic_light_env-env:latest -f server/Dockerfile .
-```
-## How It Works
-### Intersection Model
-A single 4-way intersection with **4 traffic-flow directions**, each with **2 lanes**:
-| Direction | Code | Description |
 |---|---|---|
-| NS | `DIR_NS = 0` | North → South (lanes 0, 1) |
-| SN | `DIR_SN = 1` | South → North (lanes 2, 3) |
-| EW | `DIR_EW = 2` | East → West (lanes 4, 5) |
-| WE | `DIR_WE = 3` | West → East (lanes 6, 7) |
-Each of the 8 lanes has two observation zones:
-- **100 m zone**: Vehicles near the stop line, ready to depart on green (max 30 per lane)
-- **500 m zone**: Vehicles approaching, migrating toward 100 m each step (max 40 per lane)
-Each direction has its own traffic light (red/yellow/green).
-### Agent Actions — 6 Phases
-The agent selects one of 6 phases each step:
-| Phase | Green Directions | Lanes Green | Description |
-|---|---|---|---|
-| **0** | NS + SN | 4 | Full north-south corridor |
-| **1** | EW + WE | 4 | Full east-west corridor |
-| **2** | NS only | 2 | North-to-south only |
-| **3** | SN only | 2 | South-to-north only |
-| **4** | EW only | 2 | East-to-west only |
-| **5** | WE only | 2 | West-to-east only |
-Corridor phases (0, 1) green 4 lanes for maximum throughput. Single-direction phases (2-5) are useful when one direction is much busier than its opposite. Switching phases triggers a mandatory **2-step yellow transition**.
-### Vehicle Types and Stopping Physics
-Each vehicle is randomly assigned a type with real-world physics properties:
-| Type | Speed | Reaction Time | Deceleration | Stopping Distance | Dilemma Fraction | Spawn Weight |
-|---|---|---|---|---|---|---|
-| Car | 50 km/h | 1.0 s | 6.8 m/s² | 28.1 m | 28.1% | 40% |
-| SUV | 50 km/h | 1.2 s | 6.0 m/s�� | 32.7 m | 32.7% | 25% |
-| Bus | 40 km/h | 1.5 s | 4.5 m/s² | 30.4 m | 30.4% | 10% |
-| Truck | 45 km/h | 1.4 s | 4.0 m/s² | 37.0 m | 37.0% | 15% |
-| Motorcycle | 55 km/h | 0.8 s | 7.5 m/s² | 27.8 m | 27.8% | 10% |
-**Stopping distance** = reaction distance + braking distance:
-- `d_reaction = speed × reaction_time`
-- `d_braking = speed² / (2 × deceleration)`
-Assumes dry urban road conditions (friction coefficient ~0.7).
-### Dilemma Zone
-When the agent switches phases (green → yellow), vehicles in the 100 m zone whose stopping distance exceeds their position from the intersection are in the **dilemma zone** — they cannot safely stop in time. The risk is computed assuming vehicles are uniformly distributed within the zone:
-`dilemma_vehicles = Σ (vehicle_count × stopping_distance / 100)`
-This creates a key strategic tension: switching phases clears different queues but puts vehicles at risk. The agent must balance **throughput** (clearing queues) against **safety** (avoiding dilemma-zone incidents), especially when heavy vehicles (trucks, buses) are in the green lanes.
-### Step Mechanics
-Each timestep:
-1. Process phase/yellow transition based on agent's action
-2. **If switching**: compute dilemma-zone risk for previously-green directions
-3. Depart up to 3 vehicles per green lane from the 100 m zone
-4. Migrate 40% of 500 m vehicles to 100 m zone
-5. Arrive new vehicles at 500 m zone (Poisson-distributed per lane, random type)
-6. Update emergency vehicle state (if applicable)
-7. Compute reward
-### Reward Signal
-Per-step reward (used for RL training):
-- `-1.0` per vehicle in any 100 m zone
-- `-0.3` per vehicle in any 500 m zone
-- `-2.0` penalty when switching phases
-- `-1.5` per dilemma-zone vehicle (on phase switch)
-- `-5.0` per step while an emergency vehicle is waiting
 ## Tasks
-Seven scenarios with increasing difficulty, selected at reset:
-```python
-result = await env.reset(task="balanced")    # or "random" for a random task
-```
 | Task | Difficulty | Arrival Rates [NS, SN, EW, WE] | Notes |
 |---|---|---|---|
@@ -152,161 +105,44 @@ result = await env.reset(task="balanced")    # or "random" for a random task
 | `gridlock` | Very Hard | [2.0, 2.0, 2.0, 2.0] | All directions heavy |
 | `emergency_vehicle` | Hard | [1.0, 1.0, 1.0, 1.0] + emergency | Emergency vehicle spawns at step 10 |
-Arrival rates are per direction; each lane receives half the direction rate.
 ## Grading
-Each episode is automatically graded on a **0.0-1.0 scale** at step 200. The grade is returned in the terminal observation:
-```python
-obs.grade_score    # 0.0 - 1.0
-obs.grade_details  # Full breakdown dict
-```
-### Grading Components
-**Standard tasks** (40/40/20 weighting):
-| Component | Weight | Metric |
-|---|---|---|
-| Waiting score | 40% | Average vehicles waiting per step (lower is better) |
-| Throughput score | 40% | Total vehicles cleared over episode (higher is better) |
-| Safety score | 20% | Cumulative dilemma-zone vehicles (lower is better, 0=perfect, 50+=fail) |
-**Emergency vehicle task** (25/20/15/40 weighting):
-| Component | Weight | Metric |
-|---|---|---|
-| Waiting score | 25% | Average vehicles waiting per step |
-| Throughput score | 20% | Total vehicles cleared |
-| Safety score | 15% | Cumulative dilemma-zone vehicles |
-| Emergency score | 40% | How quickly the emergency vehicle was cleared |
-### Per-Task Thresholds
-| Task | Perfect avg_waiting | Fail avg_waiting | Perfect throughput | Fail throughput |
-|---|---|---|---|---|
-| `balanced` | <= 15 | >= 70 | >= 700 | <= 250 |
-| `rush_hour_ns` | <= 15 | >= 60 | >= 800 | <= 300 |
-| `rush_hour_ew` | <= 15 | >= 60 | >= 800 | <= 300 |
-| `alternating_surge` | <= 30 | >= 120 | >= 800 | <= 300 |
-| `random_spikes` | <= 15 | >= 60 | >= 600 | <= 200 |
-| `gridlock` | <= 100 | >= 500 | >= 800 | <= 350 |
-| `emergency_vehicle` | <= 15 | >= 70 | >= 700 | <= 250 |
-A score >= **0.5** is considered **passed**.
-### Emergency Clearance Scoring
-| Cleared within | Score |
-|---|---|
-| 3 steps | 1.0 |
-| 8 steps | 0.7 |
-| 15 steps | 0.4 |
-| 30 steps | 0.1 |
-| Not cleared | 0.0 |
-## Baseline Scores
-Two baselines compared — fixed-timer (switches every 10 steps) and smart heuristic (adaptive). Seed=42, 200 steps per episode.
-### Fixed 10-step timer (best overall baseline)
-| Task | Score | Waiting | Throughput | Safety | Emergency | Dilemma | Result |
-|---|---|---|---|---|---|---|---|
-| `balanced` | 0.8314 | 0.671 | 1.000 | 0.815 | — | 9.25 | PASS |
-| `rush_hour_ns` | 0.6906 | 0.409 | 1.000 | 0.636 | — | 18.22 | PASS |
-| `rush_hour_ew` | 0.7710 | 0.534 | 1.000 | 0.786 | — | 10.68 | PASS |
-| `alternating_surge` | 0.8103 | 0.791 | 1.000 | 0.469 | — | 26.54 | PASS |
-| `random_spikes` | 0.8132 | 0.630 | 1.000 | 0.806 | — | 9.72 | PASS |
-| `gridlock` | 0.8482 | 1.000 | 1.000 | 0.241 | — | 37.96 | PASS |
-| `emergency_vehicle` | 0.8845 | 0.701 | 1.000 | 0.729 | 1.000 | 13.54 | PASS |
-**Average: 0.807**
-### Smart heuristic (adaptive, switches on demand)
-| Task | Score | Waiting | Throughput | Safety | Emergency | Dilemma | Result |
-|---|---|---|---|---|---|---|---|
-| `balanced` | 0.6032 | 0.508 | 1.000 | 0.000 | — | 208.16 | PASS |
-| `rush_hour_ns` | 0.7545 | 0.727 | 1.000 | 0.319 | — | 34.05 | PASS |
-| `rush_hour_ew` | 0.7772 | 0.754 | 1.000 | 0.378 | — | 31.09 | PASS |
-| `alternating_surge` | 0.5906 | 0.477 | 1.000 | 0.000 | — | 409.59 | PASS |
-| `random_spikes` | 0.6448 | 0.612 | 1.000 | 0.000 | — | 125.39 | PASS |
-| `gridlock` | 0.5334 | 0.334 | 1.000 | 0.000 | — | 1989.23 | PASS |
-| `emergency_vehicle` | 0.6932 | 0.373 | 1.000 | 0.000 | 1.000 | 297.64 | PASS |
-**Average: 0.657**
-The smart heuristic switches too often, causing massive dilemma-zone incidents (up to 1989 vehicles on gridlock). The fixed timer is safer but can't adapt to asymmetric traffic. An LLM agent that reasons about both traffic patterns and vehicle composition should outperform both.
-## Observation
-The `TrafficLightObservation` provides:
-| Field | Description |
-|---|---|
-| `ns_100m`, `sn_100m`, `ew_100m`, `we_100m` | Per-direction 100 m queue totals (sum of 2 lanes) |
-| `ns_500m`, `sn_500m`, `ew_500m`, `we_500m` | Per-direction 500 m queue totals |
-| `light_ns`, `light_sn`, `light_ew`, `light_we` | Per-direction light state (0=red, 1=yellow, 2=green) |
-| `active_phase` | Current phase 0-5 (-1 during yellow) |
-| `yellow_remaining` | Steps left in yellow transition |
-| `time_in_phase` | Steps since last phase change |
-| `emergency_direction` | Direction with emergency vehicle (0-3, -1=none) |
-| `emergency_lane` | Specific lane (0-7, -1=none) |
-| `emergency_wait` | Steps the emergency vehicle has waited |
-| `total_waiting` | Total vehicles across all zones |
-| `total_throughput` | Cumulative vehicles cleared |
-| `arrivals` | Vehicles arrived this step per direction [NS, SN, EW, WE] |
-| `departures` | Vehicles departed this step per direction |
-| `lanes_100m` | Per-lane 100 m queues (8 values) |
-| `lanes_500m` | Per-lane 500 m queues (8 values) |
-| `vehicles_100m` | Per-type, per-direction counts at 100 m (`{"car": [ns,sn,ew,we], ...}`) |
-| `vehicles_500m` | Per-type, per-direction counts at 500 m |
-| `dilemma_risk` | Dilemma-zone vehicles this step (0.0 if no switch) |
-| `total_dilemma_vehicles` | Cumulative dilemma-zone vehicles this episode |
-| `step_number` | Current step (0-200) |
-| `done` | Whether the episode is over |
-| `reward` | Per-step reward signal |
-| `grade_score` | Final grade 0.0-1.0 (only on terminal step) |
-| `grade_details` | Grading breakdown dict (only on terminal step) |
-## Deploying to Hugging Face Spaces
 ```bash
-openenv push
-openenv push --repo-id my-org/traffic-light-env --private
-```
-The deployed space includes:
-- **Web Interface** at `/web`
-- **API Documentation** at `/docs`
-- **Health Check** at `/health`
-- **WebSocket** at `/ws`
-## Development
-### Running Locally
-```bash
-uvicorn server.app:app --reload
 ```
-### Concurrent Sessions
-The server supports multiple concurrent WebSocket connections (configured in `server/app.py` via `max_concurrent_envs`).
 ## Project Structure
 ```
 traffic_light_env/
 ├── openenv.yaml           # OpenEnv manifest
 ├── pyproject.toml         # Project metadata and dependencies
-├── uv.lock                # Locked dependencies
-├── inference.py           # Baseline inference script (LLM agent)
-├── __init__.py            # Module exports
-├── client.py              # TrafficLightEnv client (HTTP/WebSocket)
-├── models.py              # Action, Observation, constants
 └── server/
     ├── app.py             # FastAPI application
     ├── traffic_light_env_environment.py  # Core simulation logic

   - openenv
 ---
+# Smart Traffic Light Control Environment
+Anyone who has sat at a red light with zero cars on the cross street knows the frustration: dumb traffic signals waste millions of hours every day. This environment exists to change that. It provides a realistic 4-way intersection simulator where AI agents learn to control traffic lights intelligently — minimizing waiting time, maximizing throughput, and keeping vehicles safe.
+## The Problem
+Traditional traffic lights run on fixed timers or simple sensors. They can't anticipate surges, adapt to rush-hour asymmetry, or reason about the safety cost of switching. The result: needless idling, wasted fuel, and preventable dilemma-zone accidents. This environment lets us train and evaluate models that do better.
+## What We Built
+- A physics-based intersection simulator with 5 real-world vehicle types, dilemma-zone safety modeling, and 7 task scenarios ranging from balanced traffic to gridlock
+- A hybrid LLM + heuristic inference agent (`inference.py`) that uses task-specific strategies with periodic LLM consultation
+- A FastAPI server exposing the environment over HTTP/WebSocket, deployable via Docker or Hugging Face Spaces
+- Automated grading rubrics that score agents on waiting time, throughput, and safety
+### Our Agent's Results (avg 0.83, beating the 0.807 fixed-timer baseline)
+| Task | Our Score | Fixed Timer |
+|---|---|---|
+| balanced | 0.82 | 0.83 |
+| rush_hour_ns | 0.79 | 0.69 |
+| rush_hour_ew | 0.82 | 0.77 |
+| alternating_surge | 0.87 | 0.81 |
+| random_spikes | 0.83 | 0.81 |
+| gridlock | 0.88 | 0.85 |
+| emergency_vehicle | 0.88 | 0.88 |
 ## Quick Start
     obs = result.observation
     for step in range(200):
         ns_sn = obs.ns_100m + obs.sn_100m
         ew_we = obs.ew_100m + obs.we_100m
+        phase = 0 if ns_sn >= ew_we else 1
         result = await env.step(TrafficLightAction(phase=phase))
         obs = result.observation
         if obs.done:
             print(f"Grade: {obs.grade_score}")
             break
 ```
+## Intersection Model
+A 4-way intersection with **4 directions** (NS, SN, EW, WE), **2 lanes each** (8 total). Each lane has a **100 m zone** (vehicles ready to depart) and a **500 m zone** (vehicles approaching).
+The agent picks one of **6 phases** each step:
+| Phase | Green Directions | Description |
 |---|---|---|
+| 0 | NS + SN | Full north-south corridor (4 lanes) |
+| 1 | EW + WE | Full east-west corridor (4 lanes) |
+| 2 | NS only | North-to-south only (2 lanes) |
+| 3 | SN only | South-to-north only (2 lanes) |
+| 4 | EW only | East-to-west only (2 lanes) |
+| 5 | WE only | West-to-east only (2 lanes) |
+Switching phases triggers a **2-step yellow transition** with no departures.
+## Vehicle Types
+Five vehicle types with real-world stopping physics. Heavier vehicles are harder to stop, creating higher dilemma-zone risk when switching phases.
+| Type | Speed | Stopping Distance | Dilemma Risk | Spawn Rate |
+|---|---|---|---|---|
+| Car | 50 km/h | 28.1 m | 28% | 40% |
+| SUV | 50 km/h | 32.7 m | 33% | 25% |
+| Truck | 45 km/h | 37.0 m | 37% | 15% |
+| Bus | 40 km/h | 30.4 m | 30% | 10% |
+| Motorcycle | 55 km/h | 27.8 m | 28% | 10% |
+When a phase switch occurs, vehicles in the 100 m zone that can't stop safely are in the **dilemma zone**. Each dilemma-zone vehicle incurs a -1.5 reward penalty. Trucks and buses are the riskiest.
 ## Tasks
+Seven scenarios with increasing difficulty:
 | Task | Difficulty | Arrival Rates [NS, SN, EW, WE] | Notes |
 |---|---|---|---|
 | `gridlock` | Very Hard | [2.0, 2.0, 2.0, 2.0] | All directions heavy |
 | `emergency_vehicle` | Hard | [1.0, 1.0, 1.0, 1.0] + emergency | Emergency vehicle spawns at step 10 |
 ## Grading
+Episodes are graded 0.0-1.0 at step 200. Standard tasks: 40% waiting + 40% throughput + 20% safety. Emergency task: 25% waiting + 20% throughput + 15% safety + 40% emergency clearance speed. Score >= 0.5 = pass.
+## Our Inference Strategy
+The `inference.py` agent uses a **hybrid heuristic + LLM approach**:
+1. **Task-specific heuristics** handle most steps (fast, no API cost, avoids over-switching)
+2. **Periodic LLM consultation** (via OpenAI-compatible API) provides strategic guidance at key decision points
+3. **Per-task tuning**: different hold times, switch thresholds, and strategies for each scenario
+4. **Dilemma-zone awareness**: factors in vehicle composition before switching to minimize safety penalties
+5. **Pattern detection**: pre-emptively switches for alternating surge boundaries, uses fixed-timer for gridlock, and immediately overrides for emergency vehicles
+## Running
 ```bash
+# Start the server
+uvicorn traffic_light_env.server.app:app --reload --port 8000
+# Run inference (set your API key)
+OPENAI_API_KEY="..." API_BASE_URL="https://api.openai.com/v1" MODEL_NAME="gpt-4o-mini" \
+  python traffic_light_env/inference.py
+# Docker
+docker build -t traffic_light_env-env:latest -f server/Dockerfile .
 ```
 ## Project Structure
 ```
 traffic_light_env/
+├── inference.py           # Hybrid LLM + heuristic agent
+├── models.py              # Action, Observation, vehicle physics constants
+├── client.py              # TrafficLightEnv client (HTTP/WebSocket)
+├── __init__.py            # Module exports
 ├── openenv.yaml           # OpenEnv manifest
 ├── pyproject.toml         # Project metadata and dependencies
 └── server/
     ├── app.py             # FastAPI application
     ├── traffic_light_env_environment.py  # Core simulation logic

inference.py CHANGED Viewed

@@ -56,11 +56,20 @@ MAX_STEPS = 200
 TEMPERATURE = 0.2
 MAX_TOKENS = 128
-# Strategy parameters
-MIN_HOLD_TIME = 8          # Minimum steps to hold a phase before considering switch
-SWITCH_THRESHOLD = 1.8     # Opposing axis must be this many times busier to switch
-LLM_CONSULT_INTERVAL = 10  # Ask LLM every N steps for strategic guidance
-EMERGENCY_OVERRIDE = True  # Immediately switch for emergency vehicles
 # Tasks to run. Override with TRAFFIC_LIGHT_TASKS env var (comma-separated).
 TASKS = os.getenv("TRAFFIC_LIGHT_TASKS", ",".join(TASK_NAMES)).split(",")
@@ -145,103 +154,247 @@ def get_green_dirs(phase: int) -> List[int]:
 # ---------------------------------------------------------------------------
-# Smart heuristic (primary decision maker)
 # ---------------------------------------------------------------------------
-def smart_heuristic(obs: Any, current_phase: int, time_in_phase: int) -> int:
     """
-    Heuristic that minimizes switching while maintaining good throughput.
-    Key insight: the fixed-timer baseline (switch every 10 steps) scores 0.81.
-    We can beat it by being smarter about WHEN to switch.
     """
-    # During yellow, we can't do anything — return current pending or active
-    if obs.yellow_remaining > 0:
-        return obs.active_phase if obs.active_phase >= 0 else current_phase
-    # Emergency override: immediately switch to emergency corridor
     if obs.emergency_direction >= 0:
         d = obs.emergency_direction
         target = 0 if d <= 1 else 1
         if current_phase != target:
             return target
         return current_phase
-    # Compute axis loads (100m weighted heavily, 500m as future pressure)
     ns_sn_100 = obs.ns_100m + obs.sn_100m
     ew_we_100 = obs.ew_100m + obs.we_100m
-    ns_sn_500 = obs.ns_500m + obs.sn_500m
-    ew_we_500 = obs.ew_500m + obs.we_500m
-    ns_sn_load = ns_sn_100 + 0.3 * ns_sn_500
-    ew_we_load = ew_we_100 + 0.3 * ew_we_500
-    # Determine which corridor the current phase serves
-    current_green_dirs = get_green_dirs(current_phase)
-    serves_ns = any(d in [0, 1] for d in current_green_dirs)
-    serves_ew = any(d in [2, 3] for d in current_green_dirs)
-    current_load = 0.0
-    opposing_load = 0.0
     if serves_ns and not serves_ew:
-        current_load = ns_sn_load
-        opposing_load = ew_we_load
     elif serves_ew and not serves_ns:
-        current_load = ew_we_load
-        opposing_load = ns_sn_load
     else:
-        # Phase serves both or neither — use corridor phases
-        current_load = ns_sn_load
-        opposing_load = ew_we_load
-    # Don't switch if we haven't held long enough
-    if time_in_phase < MIN_HOLD_TIME:
         return current_phase
-    # Check if opposing axis is significantly busier
-    if opposing_load > 0 and current_load > 0:
         ratio = opposing_load / max(current_load, 1.0)
     elif opposing_load > 0:
-        ratio = 10.0  # Current axis is empty
     else:
-        ratio = 0.0  # Opposing axis is empty
-    # Also factor in dilemma risk — if many heavy vehicles in green lanes, don't switch
-    dilemma_risk = estimate_dilemma_risk(obs, current_green_dirs)
-    # Adaptive threshold: require higher ratio if dilemma risk is high
-    effective_threshold = SWITCH_THRESHOLD + (dilemma_risk * 0.1)
-    if ratio >= effective_threshold:
-        # Switch to the opposing corridor
-        if serves_ns or (not serves_ew and ns_sn_load < ew_we_load):
-            # Check if one EW direction dominates — use single phase
-            if obs.ew_100m > 3 * obs.we_100m and obs.ew_100m > 10:
-                return 4  # EW only
-            elif obs.we_100m > 3 * obs.ew_100m and obs.we_100m > 10:
-                return 5  # WE only
             return 1  # EW+WE corridor
         else:
-            if obs.ns_100m > 3 * obs.sn_100m and obs.ns_100m > 10:
-                return 2  # NS only
-            elif obs.sn_100m > 3 * obs.ns_100m and obs.sn_100m > 10:
-                return 3  # SN only
             return 0  # NS+SN corridor
-    # Check for very unbalanced single-direction loads within current axis
-    if serves_ns and time_in_phase >= MIN_HOLD_TIME + 4:
-        if obs.ns_100m > 3 * obs.sn_100m and obs.ns_100m > 15 and current_phase == 0:
-            return 2  # Focus on NS only
-        elif obs.sn_100m > 3 * obs.ns_100m and obs.sn_100m > 15 and current_phase == 0:
-            return 3  # Focus on SN only
-    elif serves_ew and time_in_phase >= MIN_HOLD_TIME + 4:
-        if obs.ew_100m > 3 * obs.we_100m and obs.ew_100m > 15 and current_phase == 1:
-            return 4
-        elif obs.we_100m > 3 * obs.ew_100m and obs.we_100m > 15 and current_phase == 1:
-            return 5
     return current_phase
 # ---------------------------------------------------------------------------
 # Observation → LLM prompt
 # ---------------------------------------------------------------------------
@@ -281,7 +434,7 @@ def obs_to_summary(obs: Any) -> str:
         )
     # Heuristic recommendation
-    heuristic_rec = smart_heuristic(obs, obs.active_phase, obs.time_in_phase)
     lines.append(f"\nHeuristic recommends: phase {heuristic_rec} ({phase_desc.get(heuristic_rec, '?')})")
     return "\n".join(lines)
@@ -320,7 +473,7 @@ def get_phase_from_llm(
     except Exception as exc:
         print(f"[DEBUG] Model request failed: {exc}", flush=True)
-    return smart_heuristic(obs, obs.active_phase, obs.time_in_phase)
 # ---------------------------------------------------------------------------
@@ -334,27 +487,33 @@ def decide_phase(
     step: int,
     current_phase: int,
     time_in_phase: int,
 ) -> int:
     """
     Hybrid approach:
-    - Use heuristic for most steps (fast, no API cost, avoids over-switching)
-    - Consult LLM every LLM_CONSULT_INTERVAL steps for strategic decisions
-    - Always use heuristic for emergency overrides
     """
     # During yellow, just hold
     if obs.yellow_remaining > 0:
         return current_phase
     # Emergency: always use heuristic (fast, deterministic)
     if obs.emergency_direction >= 0:
-        return smart_heuristic(obs, current_phase, time_in_phase)
-    # Consult LLM at strategic intervals when we might need to switch
-    if (step % LLM_CONSULT_INTERVAL == 0) and time_in_phase >= MIN_HOLD_TIME:
         return get_phase_from_llm(client, obs, history)
-    # Default: use heuristic
-    return smart_heuristic(obs, current_phase, time_in_phase)
 # ---------------------------------------------------------------------------
@@ -376,6 +535,7 @@ async def run_task(client: OpenAI, env: TrafficLightEnv, task: str) -> Dict[str,
         obs = result.observation
         current_phase = 0  # Start at NS+SN corridor
         time_in_phase = 0
         for step in range(1, MAX_STEPS + 1):
             if result.done:
@@ -384,7 +544,12 @@ async def run_task(client: OpenAI, env: TrafficLightEnv, task: str) -> Dict[str,
             phase = decide_phase(
                 client, obs, history, step,
                 current_phase, time_in_phase,
             )
             # Track phase timing locally
             if phase != current_phase:

 TEMPERATURE = 0.2
 MAX_TOKENS = 128
+# Per-task tuning parameters: (min_hold, switch_threshold, llm_interval)
+# min_hold: minimum steps to hold a phase before considering switch
+# switch_threshold: opposing axis must be this factor busier to trigger switch
+# llm_interval: consult LLM every N steps (0 = never use LLM for this task)
+TASK_PARAMS: Dict[str, Dict[str, Any]] = {
+    "balanced":           {"min_hold": 8,  "switch_thresh": 1.6, "llm_interval": 15},
+    "rush_hour_ns":       {"min_hold": 8,  "switch_thresh": 1.8, "llm_interval": 0},
+    "rush_hour_ew":       {"min_hold": 8,  "switch_thresh": 1.8, "llm_interval": 0},
+    "alternating_surge":  {"min_hold": 6,  "switch_thresh": 1.4, "llm_interval": 0},  # pattern-based
+    "random_spikes":      {"min_hold": 8,  "switch_thresh": 1.5, "llm_interval": 15},
+    "gridlock":           {"min_hold": 8,  "switch_thresh": 1.3, "llm_interval": 0},   # fixed timer
+    "emergency_vehicle":  {"min_hold": 8,  "switch_thresh": 1.6, "llm_interval": 0},   # heuristic only
+}
+DEFAULT_PARAMS = {"min_hold": 8, "switch_thresh": 1.8, "llm_interval": 10}
 # Tasks to run. Override with TRAFFIC_LIGHT_TASKS env var (comma-separated).
 TASKS = os.getenv("TRAFFIC_LIGHT_TASKS", ",".join(TASK_NAMES)).split(",")
 # ---------------------------------------------------------------------------
+# Task-specific strategies
 # ---------------------------------------------------------------------------
+def _alternating_surge_strategy(obs: Any, current_phase: int, time_in_phase: int) -> int:
     """
+    Surge pattern: NS/SN surge when (step//30)%2==0, EW/WE surge otherwise.
+    Pre-emptively switch 2 steps before surge boundary to absorb yellow transition.
     """
+    step = obs.step_number
+    period = 30
+    # Which surge are we in now?
+    ns_surge = (step // period) % 2 == 0
+    # When does the next surge boundary hit?
+    next_boundary = ((step // period) + 1) * period
+    steps_to_boundary = next_boundary - step
+    # Target corridor for current surge
+    target = 0 if ns_surge else 1
+    # Pre-emptive switch: 2 steps before boundary, switch to upcoming corridor
+    if steps_to_boundary <= 2:
+        upcoming_target = 1 if ns_surge else 0  # opposite of current surge
+        if current_phase != upcoming_target:
+            return upcoming_target
+        return current_phase
+    # During surge, ensure we're on the right corridor
+    if current_phase != target and time_in_phase >= 6:
+        return target
+    # If we're on the right corridor, check for load imbalance within the axis
+    if current_phase == target and time_in_phase >= 10:
+        if target == 0:  # NS/SN corridor
+            ns_sn_100 = obs.ns_100m + obs.sn_100m
+            ew_we_100 = obs.ew_100m + obs.we_100m
+            # If EW/WE is building up massively despite NS surge, give it some time
+            if ew_we_100 > ns_sn_100 * 2.5 and ew_we_100 > 20:
+                return 1
+        else:  # EW/WE corridor
+            ns_sn_100 = obs.ns_100m + obs.sn_100m
+            ew_we_100 = obs.ew_100m + obs.we_100m
+            if ns_sn_100 > ew_we_100 * 2.5 and ns_sn_100 > 20:
+                return 0
+    return current_phase
+def _gridlock_strategy(obs: Any, current_phase: int, time_in_phase: int) -> int:
+    """
+    Gridlock: all directions have equal rate 2.0.
+    Use fixed timer (~10 steps) switching between corridor 0 and 1.
+    Matches the fixed-timer baseline approach which scores 0.848.
+    Only use corridor phases for maximum throughput.
+    """
+    GRIDLOCK_CYCLE = 10
+    # Ensure we only use corridor phases
+    if current_phase not in (0, 1):
+        return 0  # Reset to corridor
+    if time_in_phase >= GRIDLOCK_CYCLE:
+        # Check dilemma risk before switching
+        green_dirs = get_green_dirs(current_phase)
+        dilemma = estimate_dilemma_risk(obs, green_dirs)
+        # Delay switch by 1-2 steps if dilemma risk is very high
+        if dilemma > 8 and time_in_phase < GRIDLOCK_CYCLE + 2:
+            return current_phase
+        # Alternate between corridors
+        return 1 if current_phase == 0 else 0
+    return current_phase
+def _emergency_strategy(obs: Any, current_phase: int, time_in_phase: int,
+                         emergency_handled: bool) -> int:
+    """
+    Emergency vehicle task: prioritize clearing the emergency ASAP.
+    Emergency clearance is 40% of the grade — must be within 3 steps for 1.0 score.
+    Strategy: use corridor phase covering the emergency direction (greens 4 lanes,
+    including the emergency lane, while maintaining throughput).
+    """
     if obs.emergency_direction >= 0:
         d = obs.emergency_direction
+        # Use corridor phase — it greens the emergency direction AND its opposite
+        # for better throughput, while still clearing the emergency
         target = 0 if d <= 1 else 1
         if current_phase != target:
             return target
         return current_phase
+    # Before emergency appears (step < 10), use balanced strategy but
+    # position on phase 0 (NS+SN) to be ready for 50% of emergencies
+    if not emergency_handled and obs.step_number < 10:
+        # Pre-position: stay on phase 0 — if emergency is NS/SN, we're ready
+        return _balanced_strategy(obs, current_phase, time_in_phase, "balanced")
+    # After emergency cleared, use standard balanced strategy
+    return _balanced_strategy(obs, current_phase, time_in_phase, "balanced")
+def _rush_hour_strategy(obs: Any, current_phase: int, time_in_phase: int,
+                         task_name: str) -> int:
+    """
+    Rush hour: one axis is much busier (rate ~2.0 vs ~0.4).
+    Strategy: stay on the busy corridor most of the time.
+    Give quiet axis brief windows (~6 steps) to prevent total starvation.
+    Switch back to busy corridor as soon as quiet axis is drained.
+    """
+    if task_name == "rush_hour_ns":
+        busy_corridor = 0  # NS+SN
+    else:
+        busy_corridor = 1  # EW+WE
     ns_sn_100 = obs.ns_100m + obs.sn_100m
     ew_we_100 = obs.ew_100m + obs.we_100m
+    ns_sn_load = ns_sn_100 + 0.3 * (obs.ns_500m + obs.sn_500m)
+    ew_we_load = ew_we_100 + 0.3 * (obs.ew_500m + obs.we_500m)
+    busy_load = ns_sn_load if busy_corridor == 0 else ew_we_load
+    quiet_load = ew_we_load if busy_corridor == 0 else ns_sn_load
+    busy_100 = ns_sn_100 if busy_corridor == 0 else ew_we_100
+    quiet_100 = ew_we_100 if busy_corridor == 0 else ns_sn_100
+    green_dirs = get_green_dirs(current_phase)
+    dilemma = estimate_dilemma_risk(obs, green_dirs)
+    if current_phase == busy_corridor:
+        # On busy corridor — hold for at least 8 steps
+        if time_in_phase < 8:
+            return current_phase
+        # Switch only if quiet axis is building up significantly
+        # and busy axis is somewhat drained
+        if quiet_100 > 15 and quiet_load > busy_load * 0.6 and dilemma < 6:
+            return 1 - busy_corridor
+        # Force give quiet axis a window after extended hold
+        if time_in_phase >= 12 and quiet_100 > 8 and dilemma < 6:
+            return 1 - busy_corridor
+        return current_phase
+    else:
+        # On quiet corridor — return to busy corridor quickly
+        if time_in_phase < 5:
+            return current_phase
+        # Return once quiet axis is drained or busy axis is building
+        if quiet_100 <= 4 or busy_100 > quiet_100 * 1.5 or time_in_phase >= 7:
+            return busy_corridor
+        return current_phase
+def _balanced_strategy(obs: Any, current_phase: int, time_in_phase: int,
+                        task_name: str) -> int:
+    """General adaptive strategy for balanced/random tasks."""
+    params = TASK_PARAMS.get(task_name, DEFAULT_PARAMS)
+    min_hold = params["min_hold"]
+    thresh = params["switch_thresh"]
+    ns_sn_100 = obs.ns_100m + obs.sn_100m
+    ew_we_100 = obs.ew_100m + obs.we_100m
+    ns_sn_load = ns_sn_100 + 0.3 * (obs.ns_500m + obs.sn_500m)
+    ew_we_load = ew_we_100 + 0.3 * (obs.ew_500m + obs.we_500m)
+    green_dirs = get_green_dirs(current_phase)
+    serves_ns = any(d in [0, 1] for d in green_dirs)
+    serves_ew = any(d in [2, 3] for d in green_dirs)
     if serves_ns and not serves_ew:
+        current_load, opposing_load = ns_sn_load, ew_we_load
     elif serves_ew and not serves_ns:
+        current_load, opposing_load = ew_we_load, ns_sn_load
     else:
+        current_load, opposing_load = ns_sn_load, ew_we_load
+    if time_in_phase < min_hold:
         return current_phase
+    # Compute switch ratio
+    if current_load > 0:
         ratio = opposing_load / max(current_load, 1.0)
     elif opposing_load > 0:
+        ratio = 10.0
     else:
+        ratio = 0.0
+    dilemma = estimate_dilemma_risk(obs, green_dirs)
+    effective_thresh = thresh + (dilemma * 0.08)
+    if ratio >= effective_thresh:
+        if ns_sn_load < ew_we_load:
             return 1  # EW+WE corridor
         else:
             return 0  # NS+SN corridor
+    # Force switch after max hold to prevent starvation
+    max_hold = 14 if task_name == "random_spikes" else 12
+    if time_in_phase >= max_hold and opposing_load > 5 and dilemma < 6:
+        if serves_ns:
+            return 1
+        else:
+            return 0
     return current_phase
+# ---------------------------------------------------------------------------
+# Smart heuristic (primary decision maker)
+# ---------------------------------------------------------------------------
+def smart_heuristic(obs: Any, current_phase: int, time_in_phase: int,
+                     task_name: str = "balanced",
+                     emergency_handled: bool = False) -> int:
+    """
+    Task-aware heuristic that minimizes switching while maintaining throughput.
+    Dispatches to task-specific strategies.
+    """
+    # During yellow, can't change — hold current
+    if obs.yellow_remaining > 0:
+        return obs.active_phase if obs.active_phase >= 0 else current_phase
+    # Emergency override for ANY task (highest priority)
+    if obs.emergency_direction >= 0:
+        d = obs.emergency_direction
+        target = 0 if d <= 1 else 1
+        if current_phase != target:
+            return target
+        return current_phase
+    # Dispatch to task-specific strategy
+    if task_name == "alternating_surge":
+        return _alternating_surge_strategy(obs, current_phase, time_in_phase)
+    elif task_name == "gridlock":
+        return _gridlock_strategy(obs, current_phase, time_in_phase)
+    elif task_name == "emergency_vehicle":
+        return _emergency_strategy(obs, current_phase, time_in_phase, emergency_handled)
+    elif task_name in ("rush_hour_ns", "rush_hour_ew"):
+        return _rush_hour_strategy(obs, current_phase, time_in_phase, task_name)
+    else:
+        return _balanced_strategy(obs, current_phase, time_in_phase, task_name)
 # ---------------------------------------------------------------------------
 # Observation → LLM prompt
 # ---------------------------------------------------------------------------
         )
     # Heuristic recommendation
+    heuristic_rec = smart_heuristic(obs, obs.active_phase, obs.time_in_phase, obs.task_name)
     lines.append(f"\nHeuristic recommends: phase {heuristic_rec} ({phase_desc.get(heuristic_rec, '?')})")
     return "\n".join(lines)
     except Exception as exc:
         print(f"[DEBUG] Model request failed: {exc}", flush=True)
+    return smart_heuristic(obs, obs.active_phase, obs.time_in_phase, obs.task_name)
 # ---------------------------------------------------------------------------
     step: int,
     current_phase: int,
     time_in_phase: int,
+    task_name: str = "balanced",
+    emergency_handled: bool = False,
 ) -> int:
     """
     Hybrid approach:
+    - Use task-specific heuristic for most steps
+    - Consult LLM at strategic intervals for tasks that benefit from it
+    - Always use heuristic for emergency overrides and pattern-based tasks
     """
+    params = TASK_PARAMS.get(task_name, DEFAULT_PARAMS)
+    llm_interval = params["llm_interval"]
+    min_hold = params["min_hold"]
     # During yellow, just hold
     if obs.yellow_remaining > 0:
         return current_phase
     # Emergency: always use heuristic (fast, deterministic)
     if obs.emergency_direction >= 0:
+        return smart_heuristic(obs, current_phase, time_in_phase, task_name, emergency_handled)
+    # Consult LLM at strategic intervals (only for tasks where it helps)
+    if llm_interval > 0 and (step % llm_interval == 0) and time_in_phase >= min_hold:
         return get_phase_from_llm(client, obs, history)
+    # Default: use task-specific heuristic
+    return smart_heuristic(obs, current_phase, time_in_phase, task_name, emergency_handled)
 # ---------------------------------------------------------------------------
         obs = result.observation
         current_phase = 0  # Start at NS+SN corridor
         time_in_phase = 0
+        emergency_handled = False
         for step in range(1, MAX_STEPS + 1):
             if result.done:
             phase = decide_phase(
                 client, obs, history, step,
                 current_phase, time_in_phase,
+                task_name=task,
+                emergency_handled=emergency_handled,
             )
+            # Track if emergency was ever active and then cleared
+            if obs.emergency_direction >= 0:
+                emergency_handled = True
             # Track phase timing locally
             if phase != current_phase: