---
title: Traffic Signal Optimization — OpenEnv Elite
emoji: 🚦
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# 🚥 Traffic Signal Optimization — OpenEnv Elite

> **Meta × PyTorch OpenEnv Hackathon Submission**
>
> A Reinforcement Learning environment for urban traffic control, featuring stochastic multi-lane dynamics, emergency vehicle prioritization, and fairness-driven rewards.

---

## 🏗️ Problem Statement

Fixed-cycle traffic signals are a relic of the past. In modern urban environments, they create **needless congestion**, increase **CO2 emissions**, and, most critically, cause **life-threatening delays** for emergency vehicles.

This project provides a high-fidelity 4-way intersection simulation designed for OpenEnv. It challenges RL agents to move beyond raw throughput and learn **dynamic balancing**: serving high-demand lanes while maintaining fairness for low-traffic directions and clearing emergency vehicles within their "Golden Windows".

---

## 🚀 Quick Start

```bash
# Run the complete suite: Simulation + Sanity Checks + Comparison
python test_env.py

# Run a specific high-intensity scenario
python test_env.py hard
```

```python
from env import TrafficEnv
from tasks import get_config
from baseline_agent import RuleBasedAgent

# 1. Load a structured difficulty profile
config = get_config("medium")
env = TrafficEnv(config)

# 2. Initialize the rule-based baseline controller
agent = RuleBasedAgent()

# 3. Run one episode until the environment reports termination
state = env.reset()
done = False

while not done:
    action = agent.select_action(state)  # 0 = maintain, 1 = switch
    state, reward, done, info = env.step(action)

# 4. Report end-of-episode metrics from the final info dict
print(f"Total Cleared: {info['total_cleared']}")
print(f"Fairness Index: {info['fairness_score']:.2f}")
```
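
A fixed-cycle controller is a natural reference point when comparing agents, since it is the behaviour this project argues against. The following is a minimal sketch, assuming only that an agent exposes `select_action(state)` as in the loop above. The `FixedCycleAgent` class and its `green_steps` parameter are hypothetical and not part of the repository.

```python
class FixedCycleAgent:
    """Hypothetical baseline: switch phase every `green_steps` steps, ignoring the state."""

    def __init__(self, green_steps: int = 10):
        if green_steps < 1:
            raise ValueError("green_steps must be at least 1")
        self.green_steps = green_steps
        self._steps_in_phase = 0

    def select_action(self, state) -> int:
        # The observation is deliberately ignored: a fixed cycle is blind to demand.
        self._steps_in_phase += 1
        if self._steps_in_phase >= self.green_steps:
            self._steps_in_phase = 0
            return 1  # switch
        return 0  # maintain


agent = FixedCycleAgent(green_steps=3)
actions = [agent.select_action(None) for _ in range(7)]
print(actions)  # [0, 0, 1, 0, 0, 1, 0]
```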

---

## 🧠 Environment Design Philosophy

### State Space
The environment exposes a **14-dimensional** continuous observation vector covering queues, waiting pressure, emergencies, and the signal itself:
- **Queues (4)**: Exact vehicle count per lane [N, S, E, W].
- **Wait Pressure (4)**: Cumulative "impatience" score per lane.
- **Emergency Flags (4)**: Binary flag per lane indicating a detected emergency vehicle (EV).
- **Signal State (2)**: Current phase [0=NS, 1=EW] and step count.
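
To make the layout concrete, the sketch below splits a 14-element observation into named fields. It assumes the components are concatenated in the order listed above, which this README does not state explicitly. The `Observation` type and `unpack_observation` helper are illustrative names, not part of `env.py`.

```python
from typing import NamedTuple, Sequence

LANES = ("N", "S", "E", "W")


class Observation(NamedTuple):
    queues: dict         # vehicles waiting per lane
    wait_pressure: dict  # cumulative waiting score per lane
    emergency: dict      # True if an EV is detected in the lane
    phase: int           # 0 = NS green, 1 = EW green
    step_count: float    # step counter as reported by the environment


def unpack_observation(obs: Sequence[float]) -> Observation:
    """Split a flat vector, ASSUMING the order: queues, wait, EV flags, signal."""
    if len(obs) != 14:
        raise ValueError(f"expected 14 values, got {len(obs)}")
    return Observation(
        queues=dict(zip(LANES, obs[0:4])),
        wait_pressure=dict(zip(LANES, obs[4:8])),
        emergency={lane: bool(flag) for lane, flag in zip(LANES, obs[8:12])},
        phase=int(obs[12]),
        step_count=obs[13],
    )


example = unpack_observation([3, 0, 4, 1, 2.0, 0.0, 6.5, 1.0, 0, 0, 1, 0, 0, 12])
print(example.queues["E"], example.emergency["E"], example.phase)  # 4 True 0
```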

### Action Space
- `0` (**Maintain**): keep the current green phase.
- `1` (**Switch**): change the green phase (the transition includes a yellow phase with reduced discharge).
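
The two actions amount to a toggle on the phase index. Here is a minimal sketch of that transition, assuming the phase encoding from the state description (0 = NS, 1 = EW). `next_phase` is an illustrative helper, and yellow-phase effects are left out.

```python
MAINTAIN, SWITCH = 0, 1


def next_phase(phase: int, action: int) -> int:
    """Return the green phase after applying an action (0 = NS, 1 = EW)."""
    if phase not in (0, 1):
        raise ValueError(f"unknown phase: {phase}")
    if action == MAINTAIN:
        return phase
    if action == SWITCH:
        return 1 - phase  # flip NS <-> EW
    raise ValueError(f"unknown action: {action}")


print(next_phase(0, SWITCH), next_phase(1, MAINTAIN))  # 1 1
```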

---

## 💎 Reward Engineering (The "Judge's Choice")

The reward function is the core of this submission. Rather than a raw vehicle count, it is a **multi-objective** signal that combines the components below and is clipped to `[-1, 1]`:

| Component | Logic | Purpose |
| :--- | :--- | :--- |
| **Throughput (+)** | `+0.20 * cars_cleared` | Incentivizes active vehicle flow. |
| **Density (-)** | `-0.40 * total_congestion` | Penalizes letting the intersection fill up. |
| **Bottleneck (-)** | `-0.15 * max_queue` | Discourages extreme build-up in any single lane. |
| **Stability (-)** | `-switch_penalty` | Prevents "flickering" and promotes signal stability. |
| **Fairness (+/-)** | `+0.10` bonus / `-penalty` | Rewards balanced service; penalizes starvation. |
| **Emergency (🚨)** | `Golden Window` Bonus | Large reward for clearing EVs within target steps. |
| **EV Delay (-)** | `Exponential Penalty` | Punishes agents for delaying life-saving vehicles. |
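
The table can be read as a weighted sum followed by clipping. The sketch below shows that structure under stated assumptions: the three coefficients come from the table, the default `switch_penalty` of 0.20 is taken from the step walkthrough, and the fairness and emergency terms are passed in as precomputed numbers because their formulas are not specified in this README. The scale of `total_congestion` and `max_queue` (raw counts or normalized values) is likewise not stated. The function name and signature are illustrative.

```python
def clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a value into [low, high]."""
    return max(low, min(high, value))


def step_reward(
    cars_cleared: float,
    total_congestion: float,
    max_queue: float,
    switched: bool,
    fairness_term: float = 0.0,   # assumed precomputed (+0.10 bonus or a negative penalty)
    ev_term: float = 0.0,         # assumed precomputed (bonus or delay penalty)
    switch_penalty: float = 0.20  # value taken from the walkthrough, not the table
) -> float:
    raw = (
        0.20 * cars_cleared
        - 0.40 * total_congestion
        - 0.15 * max_queue
        - (switch_penalty if switched else 0.0)
        + fairness_term
        + ev_term
    )
    return clip(raw)


reward = step_reward(3, 0.5, 0.4, switched=True, fairness_term=0.10, ev_term=0.25)
print(round(reward, 2))  # 0.49
```

Because of the clipping, very good and very bad steps saturate at `1.0` and `-1.0`, so the relative weights matter most in the middle of the range.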

---

## 📊 Evaluation Metrics

We track **8 key performance indicators** per episode so that policies can be compared quantitatively:

1. **Total Cleared**: Raw efficiency metric.
2. **Avg Waiting Time**: The "commuter frustration" index.
3. **Max Queue Length**: Gauges system robustness against bottlenecks.
4. **Signal Switch Count**: Measures policy stability.
5. **Congestion Score**: Snapshot of congestion in the final system state.
6. **Avg EV Clear Time**: Critical safety metric (lower is better).
7. **Fairness Score**: Index in [0, 1] measuring how equally all lanes were served.
8. **Total EV Penalty**: Accumulated penalty for failing to prioritize emergency vehicles.
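
As an illustration of how such indicators can be accumulated step by step, the sketch below tracks four of the eight (total cleared, max queue length, switch count, average EV clear time). The `EpisodeMetrics` class, its method, and its argument names are hypothetical. The environment's own bookkeeping may differ.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EpisodeMetrics:
    total_cleared: int = 0
    max_queue_length: int = 0
    signal_switch_count: int = 0
    ev_clear_times: list = field(default_factory=list)

    def record_step(self, cleared, queues, switched, ev_cleared_after=None):
        """Fold one step into the running totals."""
        self.total_cleared += cleared
        self.max_queue_length = max(self.max_queue_length, max(queues))
        self.signal_switch_count += int(switched)
        if ev_cleared_after is not None:
            self.ev_clear_times.append(ev_cleared_after)

    @property
    def avg_ev_clear_time(self):
        # None when no EV was cleared during the episode
        return mean(self.ev_clear_times) if self.ev_clear_times else None


metrics = EpisodeMetrics()
metrics.record_step(cleared=2, queues=[3, 1, 4, 0], switched=False)
metrics.record_step(cleared=0, queues=[3, 1, 5, 0], switched=True)
metrics.record_step(cleared=3, queues=[4, 2, 2, 0], switched=False, ev_cleared_after=2)
print(metrics.total_cleared, metrics.max_queue_length, metrics.signal_switch_count)  # 5 5 1
```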

---

## ⚡ Task Difficulty Levels

| Parameter | Easy | Medium | Hard |
| :--- | :--- | :--- | :--- |
| **Arrival Rate** | 0–1 | 1–3 | 2–5 |
| **Discharge Rate** | 4–5 | 3–5 | 2–4 |
| **Burst Frequency** | 0% | 10% | 20% |
| **Emergency Prob** | 1% | 5% | 15% |
| **EV Golden Window** | 8 steps | 5 steps | 3 steps |
| **Fairness Limit** | 20 steps | 15 steps | 10 steps |
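
One way to picture these profiles in code is as immutable records keyed by difficulty name. The sketch below mirrors the table values only. How `tasks.get_config` actually represents a profile is not shown in this README, so the `DifficultyProfile` class, its field names, and the `PROFILES` mapping are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DifficultyProfile:
    arrival_rate: tuple     # (min, max), units as in the table
    discharge_rate: tuple   # (min, max), units as in the table
    burst_frequency: float  # fraction, e.g. 0.10 for 10%
    emergency_prob: float   # fraction, e.g. 0.05 for 5%
    ev_golden_window: int   # steps
    fairness_limit: int     # steps


# Values copied from the difficulty table; the dictionary keys are assumed names.
PROFILES = {
    "easy": DifficultyProfile((0, 1), (4, 5), 0.00, 0.01, 8, 20),
    "medium": DifficultyProfile((1, 3), (3, 5), 0.10, 0.05, 5, 15),
    "hard": DifficultyProfile((2, 5), (2, 4), 0.20, 0.15, 3, 10),
}

hard = PROFILES["hard"]
# On "hard", peak arrivals can exceed the slowest discharge rate.
print(hard.arrival_rate[1] > hard.discharge_rate[0])  # True
```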

---

## 🚑 Emergency & Fairness Logic

### The "Golden Window"
When an Emergency Vehicle (EV) appears, the agent earns a bonus if it clears the EV's lane, switching the signal if necessary, within the **Golden Window** (defined per difficulty level). Missing the window triggers an **exponential delay penalty**, simulating the real-world cost of stopping an ambulance or fire truck.
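
The bonus-or-penalty logic can be sketched as two small functions. Only the `+0.25` bonus (from the step walkthrough) and the window lengths come from this README. The exponential form `base * growth ** wait_steps`, its default constants, the cap, and the choice of `<=` for "within the window" are all assumptions made for illustration.

```python
def ev_clear_bonus(clear_time: int, golden_window: int, bonus: float = 0.25) -> float:
    """Bonus if the EV was cleared within the window (inclusive bound ASSUMED)."""
    return bonus if clear_time <= golden_window else 0.0


def ev_delay_penalty(
    wait_steps: int,
    base: float = 0.05,   # hypothetical starting penalty
    growth: float = 1.5,  # hypothetical growth factor per waiting step
    cap: float = 1.0,     # keeps the term inside the clipped reward range
) -> float:
    """Hypothetical penalty (negative) that grows exponentially while an EV waits."""
    return -min(cap, base * growth ** wait_steps)


print(ev_clear_bonus(clear_time=2, golden_window=5))  # 0.25
print(ev_clear_bonus(clear_time=6, golden_window=5))  # 0.0
print(ev_delay_penalty(0))                            # -0.05
```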

### Fairness Guard
To prevent starvation (where the agent ignores a low-traffic lane to maximize throughput on a high-traffic lane), a **Fairness Score** is calculated. If a lane remains red beyond the **Starvation Limit** (listed as *Fairness Limit* in the difficulty table), the agent incurs a heavy penalty. This forces the agent to learn the trade-off between total throughput and fairness across lanes.
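
This README does not give the formula behind the Fairness Score. As a stand-in, the sketch below uses Jain's fairness index, a common measure that equals 1.0 when all lanes are served equally and 1/n when one of n lanes receives all service (so its lower bound is 0.25 for four lanes, not 0). It also shows a simple starvation check against a limit. Both helpers are illustrative and may not match the environment's implementation.

```python
def jain_index(served):
    """Jain's fairness index over per-lane service counts."""
    total = sum(served)
    if total == 0:
        return 1.0  # nothing served yet: treat as perfectly balanced
    return total ** 2 / (len(served) * sum(x * x for x in served))


def starved_lanes(red_steps, limit):
    """Return the lanes whose consecutive red time exceeds the limit."""
    return [lane for lane, steps in red_steps.items() if steps > limit]


print(jain_index([10, 10, 10, 10]))  # 1.0
print(jain_index([40, 0, 0, 0]))     # 0.25
print(starved_lanes({"N": 0, "S": 0, "E": 16, "W": 16}, limit=15))  # ['E', 'W']
```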

---

## 🚶 Step Walkthrough

```text
Step 12: 🚨 Ambulance detected in East lane (currently RED).
  - EW Queue: 4, EV Timer: 0
  - Agent receives p_emergency penalty.

Step 13: Agent Action: 1 (SWITCH to EW).
  - Switch penalty applied (-0.20).
  - NS lanes stop; EW lanes turn GREEN.

Step 14: EV Cleared!
  - EV Clear Time: 2 steps.
  - Agent receives r_ev_bonus (+0.25) for "Golden Window" clearance.
  - Total cleared (+0.60 reward).
```
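
Read against the reward table, a `+0.60` throughput term at `0.20` per vehicle corresponds to three vehicles cleared in step 14, and the EV clear time of 2 is the clearing step minus the detection step. The snippet below only reproduces this arithmetic. It ignores the density, bottleneck, fairness, and `p_emergency` terms, whose values the walkthrough does not give.

```python
THROUGHPUT_WEIGHT = 0.20  # from the reward table
SWITCH_PENALTY = 0.20     # applied at step 13
EV_BONUS = 0.25           # applied at step 14

detected_step, cleared_step = 12, 14
ev_clear_time = cleared_step - detected_step

throughput_reward = 0.60
vehicles_cleared = round(throughput_reward / THROUGHPUT_WEIGHT)

# Partial sums of the terms the walkthrough states explicitly
step_13_known = -SWITCH_PENALTY
step_14_known = EV_BONUS + throughput_reward

print(ev_clear_time, vehicles_cleared, step_13_known, round(step_14_known, 2))
```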

---

## 🔮 Future Improvements

- **Multi-Intersection Coordination**: Extending to a grid of agents using multi-agent RL (MARL).
- **Pedestrian Logic**: Adding crosswalks and pedestrian priority.
- **V2X Communication**: Providing agents with ahead-of-time traffic predictions.

---

## 📜 License

MIT © 2026 Meta × PyTorch OpenEnv Hackathon