---
title: Traffic Signal Optimization - OpenEnv Elite
emoji: 🚦
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# 🚥 Traffic Signal Optimization — OpenEnv Elite

> **Meta × PyTorch OpenEnv Hackathon Submission**
>
> A world-class Reinforcement Learning environment for urban traffic control, featuring stochastic multi-lane dynamics, emergency vehicle prioritization, and sophisticated fairness-driven rewards.

---

## 🏗️ Problem Statement

Fixed-cycle traffic signals are a relic of the past. In modern urban environments, they create **needless congestion**, increase **CO2 emissions**, and — most critically — cause **life-threatening delays** for emergency vehicles.

This project provides a high-fidelity 4-way intersection simulation designed for OpenEnv. It challenges RL agents to move beyond simple throughput and master the art of **dynamic balancing**: serving high-demand lanes while maintaining fairness for low-traffic directions and clearing "Golden Windows" for emergency responders.

---

## 🚀 Quick Start

```bash
# Run the complete suite: Simulation + Sanity Checks + Comparison
python test_env.py

# Run a specific high-intensity scenario
python test_env.py hard
```

```python
from env import TrafficEnv
from tasks import get_config
from baseline_agent import RuleBasedAgent

# 1. Load a structured difficulty profile
config = get_config("medium")
env    = TrafficEnv(config)

# 2. Initialize our sophisticated Rule-Based Controller
agent  = RuleBasedAgent()

state = env.reset()
done  = False

while not done:
    action = agent.select_action(state)
    state, reward, done, info = env.step(action)

print(f"Total Cleared: {info['total_cleared']}")
print(f"Fairness Index: {info['fairness_score']:.2f}")
```

---

## 🧠 Environment Design Philosophy

### State Space
The environment exposes a **14-dimensional** continuous observation vector, providing the agent with full situational awareness:
- **Queues (4)**: Exact vehicle count per lane [N, S, E, W].
- **Wait Pressure (4)**: Cumulative "impatience" score per lane.
- **Emergency Flags (4)**: Binary detection of EVs per lane.
- **Signal State (2)**: Current phase [0=NS, 1=EW] and step count.
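
As a rough sketch (the helper name and exact vector layout are assumptions, not the environment's actual internals), the 14-dimensional observation could be assembled like this:

```python
import numpy as np

def build_observation(queues, wait_pressure, ev_flags, phase, step_count):
    """Assemble the 14-dim vector: 4 queues + 4 wait pressures
    + 4 EV flags + [phase, step] (hypothetical layout)."""
    return np.concatenate([
        np.asarray(queues, dtype=np.float32),         # [N, S, E, W] vehicle counts
        np.asarray(wait_pressure, dtype=np.float32),  # cumulative impatience per lane
        np.asarray(ev_flags, dtype=np.float32),       # binary EV detection per lane
        np.array([phase, step_count], dtype=np.float32),  # 0=NS / 1=EW, step index
    ])

obs = build_observation([3, 1, 4, 0], [0.5, 0.1, 2.0, 0.0], [0, 0, 1, 0], 1, 12)
```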

### Action Space
- `0`: **Maintain** — keep the current green phase.
- `1`: **Switch** — transition the signal (includes yellow-phase discharge friction).

---

## 💎 Reward Engineering (The "Judge's Choice")

Our reward function is the core of this submission. It isn't just a count; it's a **multi-objective ethical framework** clipped to `[-1, 1]`:

| Component | Logic | Purpose |
| :--- | :--- | :--- |
| **Throughput (+)** | `+0.20 * cars_cleared` | Incentivizes active vehicle flow. |
| **Density (-)** | `-0.40 * total_congestion` | Penalizes letting the intersection fill up. |
| **Bottleneck (-)** | `-0.15 * max_queue` | Discourages extreme build-up in any single lane. |
| **Stability (-)** | `-switch_penalty` | Prevents "flickering" and promotes signal stability. |
| **Fairness (+/-)** | `+0.10` bonus / `-penalty` | Rewards balanced service; penalizes starvation. |
| **Emergency (🚨)** | `Golden Window` Bonus | Massive reward for clearing EVs within target steps. |
| **EV Delay (-)** | `Exponential Penalty` | Punishes agents for delaying life-saving vehicles. |
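
The components in the table compose roughly as follows. This is an illustrative sketch only: the normalization constants, the exponential EV curve, and the fairness term are assumptions here; the real coefficients and shaping live in `env.py`.

```python
def compute_reward(cleared, queues, switched, fairness_ok,
                   ev_cleared_in_window=False, ev_delay_steps=0):
    """Hypothetical composition of the reward table above."""
    r = 0.20 * cleared                    # throughput bonus
    r -= 0.40 * (sum(queues) / 40.0)      # density penalty (normalized)
    r -= 0.15 * (max(queues) / 10.0)      # bottleneck penalty
    if switched:
        r -= 0.20                         # switch / stability penalty
    r += 0.10 if fairness_ok else -0.10   # fairness bonus or starvation penalty
    if ev_cleared_in_window:
        r += 0.25                         # "Golden Window" bonus
    else:
        # exponential EV delay penalty, zero when there is no delay
        r -= min(1.0, 0.02 * (2 ** min(ev_delay_steps, 6)) - 0.02)
    return max(-1.0, min(1.0, r))         # clip to [-1, 1]
```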

---

## 📊 Evaluation Metrics

We track **8 key performance indicators** per episode so that policy quality can be quantified and compared:

1.  **Total Cleared**: Raw efficiency metric.
2.  **Avg Waiting Time**: The "commuter frustration" index.
3.  **Max Queue Length**: Gauges system robustness against bottlenecks.
4.  **Signal Switch Count**: Measures policy stability.
5.  **Congestion Score**: Final system state snapshot.
6.  **Avg EV Clear Time**: Critical safety metric (lower is better).
7.  **Fairness Score**: [0, 1] index — how equally did we serve all lanes?
8.  **Total EV Penalty**: Measures total failure to prioritize safety.
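
A per-episode container for these KPIs might look like the sketch below (field names are illustrative; the environment reports its metrics through the `info` dict):

```python
from dataclasses import dataclass

@dataclass
class EpisodeMetrics:
    """Per-episode KPI container mirroring the list above."""
    total_cleared: int = 0
    avg_waiting_time: float = 0.0
    max_queue_length: int = 0
    switch_count: int = 0
    congestion_score: float = 0.0
    avg_ev_clear_time: float = 0.0
    fairness_score: float = 1.0     # in [0, 1]; 1.0 = perfectly even service
    total_ev_penalty: float = 0.0
```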

---

## ⚡ Task Difficulty Levels

| Parameter | Easy | Medium | Hard |
| :--- | :--- | :--- | :--- |
| **Arrival Rate** | 0–1 | 1–3 | 2–5 |
| **Discharge Rate** | 4–5 | 3–5 | 2–4 |
| **Burst Frequency** | 0% | 10% | 20% |
| **Emergency Prob** | 1% | 5% | 15% |
| **EV Golden Window** | 8 steps | 5 steps | 3 steps |
| **Fairness Limit** | 20 steps | 15 steps | 10 steps |
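
The table translates into configuration profiles along these lines (key names here are assumptions for illustration; the authoritative profiles are defined in `tasks.py`):

```python
# Hypothetical mirror of the difficulty table above.
DIFFICULTY = {
    "easy":   dict(arrival=(0, 1), discharge=(4, 5), burst=0.00,
                   p_emergency=0.01, golden_window=8, starvation_limit=20),
    "medium": dict(arrival=(1, 3), discharge=(3, 5), burst=0.10,
                   p_emergency=0.05, golden_window=5, starvation_limit=15),
    "hard":   dict(arrival=(2, 5), discharge=(2, 4), burst=0.20,
                   p_emergency=0.15, golden_window=3, starvation_limit=10),
}

def get_config(level: str) -> dict:
    """Look up a difficulty profile by name ('easy' | 'medium' | 'hard')."""
    return DIFFICULTY[level]
```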

---

## 🚑 Emergency & Fairness Logic

### The "Golden Window"
When an Emergency Vehicle (EV) appears, the agent is granted a bonus if it switches and clears the lane within the **Golden Window** (defined per difficulty). Failing to do so triggers an **exponential delay penalty**, simulating the real-world cost of stopping an ambulance or fire truck.
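
One plausible shape for that exponential penalty is sketched below (the constants and exact curve are assumptions; the real one is defined in `env.py`): zero inside the window, then growing sharply with each extra step of delay.

```python
import math

def ev_delay_penalty(delay_steps: int, golden_window: int) -> float:
    """Illustrative exponential delay penalty: free inside the Golden
    Window, then exponentially increasing, capped at 1.0."""
    overshoot = max(0, delay_steps - golden_window)
    if overshoot == 0:
        return 0.0
    return min(1.0, 0.05 * (math.exp(overshoot) - 1.0))
```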

### Fairness Guard
To prevent "Starvation" (where the agent ignores a low-traffic lane to optimize throughput on a high-traffic lane), a **Fairness Score** is calculated. If a lane remains red beyond the **Starvation Limit**, the agent suffers a heavy penalty. This forces the agent to learn the complex trade-off between total throughput and social fairness.
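
A natural way to compute such a [0, 1] score is Jain's fairness index over per-lane service counts; this is an illustrative choice, and the environment may compute its fairness score differently:

```python
def fairness_score(served_counts):
    """Jain's fairness index: 1.0 when all lanes are served equally,
    approaching 1/n when a single lane gets all the service."""
    total = sum(served_counts)
    if total == 0:
        return 1.0  # nothing served yet: vacuously fair
    n = len(served_counts)
    return total ** 2 / (n * sum(c * c for c in served_counts))
```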

---

## 🚶 Step Walkthrough

```text
Step 12:  🚨 Ambulance detected in East lane (currently RED).
          - EW Queue: 4, EV Timer: 0
          - The EV delay penalty begins accruing.

Step 13:  Agent Action: 1 (SWITCH to EW).
          - Switch penalty applied (-0.20).
          - NS lanes stop; EW lanes turn GREEN.

Step 14:  EV Cleared!
          - EV Clear Time: 2 steps.
          - Agent receives r_ev_bonus (+0.25) for "Golden Window" clearance.
          - Total cleared (+0.60 reward).
```

---

## 🔮 Future Improvements

- **Multi-Intersection Coordination**: Extending to a grid of agents using MARL.
- **Pedestrian Logic**: Adding crosswalks and pedestrian priority.
- **V2X Communication**: Providing agents with ahead-of-time traffic predictions.

---

## 📜 License

MIT © 2026 Meta x PyTorch OpenEnv Hackathon