---
title: Traffic Signal Optimization - OpenEnv Elite
emoji: 🚦
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# 🚥 Traffic Signal Optimization — OpenEnv Elite

> **Meta × PyTorch OpenEnv Hackathon Submission**
>
> A world-class Reinforcement Learning environment for urban traffic control, featuring stochastic multi-lane dynamics, emergency vehicle prioritization, and sophisticated fairness-driven rewards.

---

## 🏗️ Problem Statement

Fixed-cycle traffic signals are a relic of the past. In modern urban environments, they create **needless congestion**, increase **CO2 emissions**, and — most critically — cause **life-threatening delays** for emergency vehicles.

This project provides a high-fidelity 4-way intersection simulation designed for OpenEnv. It challenges RL agents to move beyond simple throughput and master the art of **dynamic balancing**: serving high-demand lanes while maintaining fairness for low-traffic directions and clearing "Golden Windows" for emergency responders.

---

## 🚀 Quick Start

```bash
# Run the complete suite: Simulation + Sanity Checks + Comparison
python test_env.py

# Run a specific high-intensity scenario
python test_env.py hard
```

```python
from env import TrafficEnv
from tasks import get_config
from baseline_agent import RuleBasedAgent

# 1. Load a structured difficulty profile
config = get_config("medium")
env    = TrafficEnv(config)

# 2. Initialize our sophisticated Rule-Based Controller
agent  = RuleBasedAgent()

state = env.reset()
done  = False

while not done:
    action = agent.select_action(state)
    state, reward, done, info = env.step(action)

print(f"Total Cleared: {info['total_cleared']}")
print(f"Fairness Index: {info['fairness_score']:.2f}")
```

---

## 🧠 Environment Design Philosophy

### State Space
The environment exposes a **14-dimensional** continuous observation vector, providing the agent with full situational awareness:
- **Queues (4)**: Exact vehicle count per lane [N, S, E, W].
- **Wait Pressure (4)**: Cumulative "impatience" score per lane.
- **Emergency Flags (4)**: Binary detection of EVs per lane.
- **Signal State (2)**: Current phase [0=NS, 1=EW] and step count.
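
As a rough sketch (the helper name and exact vector layout are assumptions, not the environment's actual internals), the 14-dimensional observation could be assembled like this:

```python
import numpy as np

def build_observation(queues, wait_pressure, ev_flags, phase, step_count):
    """Assemble the 14-dim vector: 4 queues + 4 wait pressures
    + 4 EV flags + [phase, step] (hypothetical layout)."""
    return np.concatenate([
        np.asarray(queues, dtype=np.float32),         # [N, S, E, W] vehicle counts
        np.asarray(wait_pressure, dtype=np.float32),  # cumulative impatience per lane
        np.asarray(ev_flags, dtype=np.float32),       # binary EV detection per lane
        np.array([phase, step_count], dtype=np.float32),  # 0=NS / 1=EW, step index
    ])

obs = build_observation([3, 1, 4, 0], [0.5, 0.1, 2.0, 0.0], [0, 0, 1, 0], 1, 12)
```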

### Action Space
- `0`: **Maintain** — keep the current green phase.
- `1`: **Switch** — transition the signal (includes yellow-phase discharge friction).

---

## 💎 Reward Engineering (The "Judge's Choice")

Our reward function is the core of this submission. It isn't just a count; it's a **multi-objective ethical framework** clipped to `[-1, 1]`:

| Component | Logic | Purpose |
| :--- | :--- | :--- |
| **Throughput (+)** | `+0.20 * cars_cleared` | Incentivizes active vehicle flow. |
| **Density (-)** | `-0.40 * total_congestion` | Penalizes letting the intersection fill up. |
| **Bottleneck (-)** | `-0.15 * max_queue` | Discourages extreme build-up in any single lane. |
| **Stability (-)** | `-switch_penalty` | Prevents "flickering" and promotes signal stability. |
| **Fairness (+/-)** | `+0.10` bonus / `-penalty` | Rewards balanced service; penalizes starvation. |
| **Emergency (🚨)** | `Golden Window` Bonus | Massive reward for clearing EVs within target steps. |
| **EV Delay (-)** | `Exponential Penalty` | Punishes agents for delaying life-saving vehicles. |
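
The components in the table compose roughly as follows. This is an illustrative sketch only: the normalization constants, the exponential EV curve, and the fairness term are assumptions here; the real coefficients and shaping live in `env.py`.

```python
def compute_reward(cleared, queues, switched, fairness_ok,
                   ev_cleared_in_window=False, ev_delay_steps=0):
    """Hypothetical composition of the reward table above."""
    r = 0.20 * cleared                    # throughput bonus
    r -= 0.40 * (sum(queues) / 40.0)      # density penalty (normalized)
    r -= 0.15 * (max(queues) / 10.0)      # bottleneck penalty
    if switched:
        r -= 0.20                         # switch / stability penalty
    r += 0.10 if fairness_ok else -0.10   # fairness bonus or starvation penalty
    if ev_cleared_in_window:
        r += 0.25                         # "Golden Window" bonus
    else:
        # exponential EV delay penalty, zero when there is no delay
        r -= min(1.0, 0.02 * (2 ** min(ev_delay_steps, 6)) - 0.02)
    return max(-1.0, min(1.0, r))         # clip to [-1, 1]
```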

---

## 📊 Evaluation Metrics

We track **8 key performance indicators** per episode so that policy quality can be quantified and compared:

1.  **Total Cleared**: Raw efficiency metric.
2.  **Avg Waiting Time**: The "commuter frustration" index.
3.  **Max Queue Length**: Gauges system robustness against bottlenecks.
4.  **Signal Switch Count**: Measures policy stability.
5.  **Congestion Score**: Final system state snapshot.
6.  **Avg EV Clear Time**: Critical safety metric (lower is better).
7.  **Fairness Score**: [0, 1] index — how equally did we serve all lanes?
8.  **Total EV Penalty**: Measures total failure to prioritize safety.
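
A per-episode container for these KPIs might look like the sketch below (field names are illustrative; the environment reports its metrics through the `info` dict):

```python
from dataclasses import dataclass

@dataclass
class EpisodeMetrics:
    """Per-episode KPI container mirroring the list above."""
    total_cleared: int = 0
    avg_waiting_time: float = 0.0
    max_queue_length: int = 0
    switch_count: int = 0
    congestion_score: float = 0.0
    avg_ev_clear_time: float = 0.0
    fairness_score: float = 1.0     # in [0, 1]; 1.0 = perfectly even service
    total_ev_penalty: float = 0.0
```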

---

## ⚡ Task Difficulty Levels

| Parameter | Easy | Medium | Hard |
| :--- | :--- | :--- | :--- |
| **Arrival Rate** | 0–1 | 1–3 | 2–5 |
| **Discharge Rate** | 4–5 | 3–5 | 2–4 |
| **Burst Frequency** | 0% | 10% | 20% |
| **Emergency Prob** | 1% | 5% | 15% |
| **EV Golden Window** | 8 steps | 5 steps | 3 steps |
| **Fairness Limit** | 20 steps | 15 steps | 10 steps |
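
The table translates into configuration profiles along these lines (key names here are assumptions for illustration; the authoritative profiles are defined in `tasks.py`):

```python
# Hypothetical mirror of the difficulty table above.
DIFFICULTY = {
    "easy":   dict(arrival=(0, 1), discharge=(4, 5), burst=0.00,
                   p_emergency=0.01, golden_window=8, starvation_limit=20),
    "medium": dict(arrival=(1, 3), discharge=(3, 5), burst=0.10,
                   p_emergency=0.05, golden_window=5, starvation_limit=15),
    "hard":   dict(arrival=(2, 5), discharge=(2, 4), burst=0.20,
                   p_emergency=0.15, golden_window=3, starvation_limit=10),
}

def get_config(level: str) -> dict:
    """Look up a difficulty profile by name ('easy' | 'medium' | 'hard')."""
    return DIFFICULTY[level]
```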

---

## 🚑 Emergency & Fairness Logic

### The "Golden Window"
When an Emergency Vehicle (EV) appears, the agent is granted a bonus if it switches and clears the lane within the **Golden Window** (defined per difficulty). Failing to do so triggers an **exponential delay penalty**, simulating the real-world cost of stopping an ambulance or fire truck.
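
One plausible shape for that exponential penalty is sketched below (the constants and exact curve are assumptions; the real one is defined in `env.py`): zero inside the window, then growing sharply with each extra step of delay.

```python
import math

def ev_delay_penalty(delay_steps: int, golden_window: int) -> float:
    """Illustrative exponential delay penalty: free inside the Golden
    Window, then exponentially increasing, capped at 1.0."""
    overshoot = max(0, delay_steps - golden_window)
    if overshoot == 0:
        return 0.0
    return min(1.0, 0.05 * (math.exp(overshoot) - 1.0))
```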

### Fairness Guard
To prevent "Starvation" (where the agent ignores a low-traffic lane to optimize throughput on a high-traffic lane), a **Fairness Score** is calculated. If a lane remains red beyond the **Starvation Limit**, the agent suffers a heavy penalty. This forces the agent to learn the complex trade-off between total throughput and social fairness.
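
A natural way to compute such a [0, 1] score is Jain's fairness index over per-lane service counts; this is an illustrative choice, and the environment may compute its fairness score differently:

```python
def fairness_score(served_counts):
    """Jain's fairness index: 1.0 when all lanes are served equally,
    approaching 1/n when a single lane gets all the service."""
    total = sum(served_counts)
    if total == 0:
        return 1.0  # nothing served yet: vacuously fair
    n = len(served_counts)
    return total ** 2 / (n * sum(c * c for c in served_counts))
```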

---

## 🚶 Step Walkthrough

```text
Step 12:  🚨 Ambulance detected in East lane (currently RED).
          - EW Queue: 4, EV Timer: 0
          - The EV delay penalty begins accruing.

Step 13:  Agent Action: 1 (SWITCH to EW).
          - Switch penalty applied (-0.20).
          - NS lanes stop; EW lanes turn GREEN.

Step 14:  EV Cleared!
          - EV Clear Time: 2 steps.
          - Agent receives r_ev_bonus (+0.25) for "Golden Window" clearance.
          - Total cleared (+0.60 reward).
```

---

## 🔮 Future Improvements

- **Multi-Intersection Coordination**: Extending to a grid of agents using MARL.
- **Pedestrian Logic**: Adding crosswalks and pedestrian priority.
- **V2X Communication**: Providing agents with ahead-of-time traffic predictions.

---

## 📜 License

MIT © 2026 Meta x PyTorch OpenEnv Hackathon