# Traffic Control Reinforcement Learning Architecture

This document provides an end-to-end breakdown of the Reinforcement Learning (RL) architecture implemented in the Traffic Signal Control project.

The system replaces traditional fixed-timing traffic lights with adaptive agents that learn signal-switching policies from simulated traffic. **It features a full-stack React and FastAPI Web Dashboard** to visualize the intersection in real time.

---
## 1. High-Level System Flow

The architecture follows the standard Reinforcement Learning loop over a Markov Decision Process (MDP):

1. **Observe**: The Agent receives the current **State** (queue lengths for 8 lanes plus the current light phase) from the Environment.
2. **Decide**: The Agent selects an **Action** (keep the current phase or switch to the next phase).
3. **Act**: The Environment executes the action, simulating traffic flow for a specific duration.
4. **Learn**: The Environment returns a **Reward** (a penalty based on total waiting traffic). The Agent uses this to update its neural network.
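
The four steps above can be sketched as a plain Python loop. This is a minimal illustration, not the project's code: `ToyIntersection`, `switch_when_empty`, and `run_episode` are hypothetical names, the arrival and discharge dynamics are invented for the example, and the state is left unnormalized to keep the sketch short. The transitions it collects are what a learning agent would consume in the **Learn** step.

```python
import random


class ToyIntersection:
    """Hypothetical stand-in for the real environment: 8 queues plus a phase index."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.queues = [0] * 8
        self.phase = 0

    def observe(self):
        # Raw (unnormalized) state: 8 queue lengths followed by the phase.
        return self.queues + [self.phase]

    def step(self, action):
        # Action 1 advances to the next of the 4 phases; action 0 keeps it.
        if action == 1:
            self.phase = (self.phase + 1) % 4
        # Illustrative dynamics only: random arrivals, then the two queues
        # of the green approach each discharge up to 2 vehicles.
        for lane in range(8):
            self.queues[lane] += self.rng.randint(0, 1)
        for lane in (2 * self.phase, 2 * self.phase + 1):
            self.queues[lane] = max(0, self.queues[lane] - 2)
        reward = max(-1.0, -sum(self.queues) / 20.0)
        return self.observe(), reward


def switch_when_empty(state):
    """Toy policy: switch only when the green approach has no queued vehicles."""
    phase = state[8]
    served = state[2 * phase] + state[2 * phase + 1]
    return 1 if served == 0 else 0


def run_episode(env, policy, steps=50):
    """Observe -> decide -> act, collecting transitions for the learn step."""
    state = env.observe()                       # Observe
    transitions = []
    for _ in range(steps):
        action = policy(state)                  # Decide
        next_state, reward = env.step(action)   # Act
        transitions.append((state, action, reward, next_state))  # input to Learn
        state = next_state
    return transitions


history = run_episode(ToyIntersection(), switch_when_empty)
```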
---

## 2. The Environment (`TrafficEnvironment`)

The environment is a simulated 8-lane intersection built on the `Gymnasium` API.

### 2.1 State Space (Observation)

We use a continuous **9-dimensional state vector** that tracks the Straight/Right (SR) and Left-Turn (L) queue of each approach, plus the current phase:

* `[N_SR, N_L, E_SR, E_L, S_SR, S_L, W_SR, W_L, current_phase]`
* **Absolute Normalization**: Queue lengths are normalized linearly by dividing by `20.0` and clipping to `[0, 1]`. A fixed divisor lets the neural network perceive absolute traffic volume, although any queue length above 20 saturates at `1.0`.
* **Phase Representation**: The current phase (0 to 3) is normalized to `phase / 3.0`.
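
As a concrete illustration of the normalization rules above, the following sketch assembles the 9-dimensional vector from raw queue counts. The function name `build_state` and the constants are illustrative assumptions; only the divisor `20.0`, the `[0, 1]` clip, and `phase / 3.0` come from the description.

```python
MAX_QUEUE = 20.0   # normalization divisor from the description above
NUM_PHASES = 4     # phases 0..3


def build_state(queues, phase):
    """Return [N_SR, N_L, E_SR, E_L, S_SR, S_L, W_SR, W_L, phase], all in [0, 1]."""
    if len(queues) != 8:
        raise ValueError("expected 8 queue lengths")
    # Divide by 20.0, then clip to [0, 1].
    normalized = [min(max(q / MAX_QUEUE, 0.0), 1.0) for q in queues]
    # Phase 0..3 maps to 0, 1/3, 2/3, 1.
    return normalized + [phase / (NUM_PHASES - 1)]


state = build_state([0, 5, 10, 20, 35, 2, 0, 1], phase=2)
```

In this example the queues of length 20 and 35 both map to `1.0`, so the agent cannot tell them apart once a lane passes the clipping threshold.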
### 2.2 Action Space & 4-Phase Cycle

The agent has a discrete action space of size 2 (Keep or Switch). Switching advances through **4 Directional Phases**, each giving green to a single approach, so straight and left-turn movements from different directions never conflict:

* `Phase 0`: North Green (Straight + Left)
* `Phase 1`: East Green (Straight + Left)
* `Phase 2`: South Green (Straight + Left)
* `Phase 3`: West Green (Straight + Left)
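
The phase cycle can be sketched as a small transition function. The encoding `KEEP = 0`, `SWITCH = 1` and the helper names are assumptions for illustration; the lane indices follow the state-vector order `[N_SR, N_L, E_SR, E_L, S_SR, S_L, W_SR, W_L]`.

```python
PHASES = ("North", "East", "South", "West")
KEEP, SWITCH = 0, 1  # assumed action encoding


def next_phase(phase, action):
    """Keep the current phase, or advance to the next one in the 4-phase cycle."""
    if action == SWITCH:
        return (phase + 1) % len(PHASES)
    return phase


def green_lanes(phase):
    """Indices of the SR and L queues that have green during this phase."""
    return (2 * phase, 2 * phase + 1)


phase = 3                          # West green
phase = next_phase(phase, SWITCH)  # wraps around to phase 0 (North)
```

Because the agent can only advance the cycle, returning to the approach that was just served takes four consecutive Switch actions.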
### 2.3 Reward Function

* **Calculation**: `reward = -(Total Queue Length) / 20.0`
* **Clipping**: The reward is hard-clipped to `[-1.0, 1.0]`. Because the raw value is never positive, the effective range is `[-1.0, 0.0]`, and the penalty saturates at `-1.0` once the total queue length reaches 20.
* **Sensitivity**: Dividing by 20.0 gives the agent a strong learning signal even in low-density traffic, pushing it to actively clear small queues.
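
A worked example of the formula and clip above, as a minimal sketch (the function name `compute_reward` is an assumption). It shows how quickly the penalty saturates: eight queues of one vehicle each already give `-0.4`, and any total of 20 or more gives `-1.0`.

```python
def compute_reward(queues):
    """Negative total queue length scaled by 20.0, clipped to [-1.0, 1.0]."""
    raw = -sum(queues) / 20.0
    return max(-1.0, min(1.0, raw))


light = compute_reward([1] * 8)                      # total 8  -> -0.4
medium = compute_reward([3, 2, 0, 0, 4, 1, 0, 0])    # total 10 -> -0.5
heavy = compute_reward([5] * 8)                      # total 40 -> clipped to -1.0
```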
---

## 3. Traffic Generation (`TrafficGenerator`)

* **Realistic Volume**: The traffic density is tuned to produce roughly 2,000–4,000 total arrivals per episode.
* **Left-Turn Probabilities**: Roughly 20% of generated traffic is routed into the dedicated Left-Turn queues.
* **Stochastic Bursts**: There is a 15% probability that a sudden "burst" of traffic will arrive in a random lane, testing the agent's adaptability.
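
One way such a generator could look is sketched below. Only the 20% left-turn share and the 15% burst probability come from the description above; the per-approach arrival probability, the burst size, the choice to roll the burst once per simulation step, and the name `generate_arrivals` are all assumptions, and these values are not tuned to reproduce the stated per-episode volume.

```python
import random

LEFT_TURN_PROB = 0.20  # from the description above
BURST_PROB = 0.15      # from the description above
ARRIVAL_PROB = 0.3     # assumption: chance of one arrival per approach per step
BURST_SIZE = 5         # assumption: vehicles added by a burst


def generate_arrivals(rng):
    """New vehicles per lane for one step, ordered
    [N_SR, N_L, E_SR, E_L, S_SR, S_L, W_SR, W_L]."""
    arrivals = [0] * 8
    for approach in range(4):
        if rng.random() < ARRIVAL_PROB:
            # Route roughly 20% of regular arrivals into the left-turn queue.
            is_left = rng.random() < LEFT_TURN_PROB
            arrivals[2 * approach + (1 if is_left else 0)] += 1
    # Occasionally drop a burst of vehicles into one random lane.
    if rng.random() < BURST_PROB:
        arrivals[rng.randrange(8)] += BURST_SIZE
    return arrivals


rng = random.Random(42)
samples = [generate_arrivals(rng) for _ in range(2000)]
# A lane receives BURST_SIZE or more vehicles in one step only when a burst fired.
burst_rate = sum(any(n >= BURST_SIZE for n in step) for step in samples) / len(samples)
```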
---

## 4. The Deep Q-Network Agent (`DQNAgent`)

A Deep Q-Network (DQN) agent implemented in PyTorch.

* **Neural Network Architecture**:
  `Input Layer (9)` -> `Hidden Layer (256, ReLU)` -> `Hidden Layer (256, ReLU)` -> `Output Layer (2, Keep/Switch)`
* **Experience Replay Buffer**: Transitions `(s, a, r, s', done)` are stored in a circular buffer (size: 50,000). The network trains on random mini-batches (size 256) sampled from it.
* **Target Network**: Maintains separate Online and Target networks; the Target network is synced with the Online network every 10 episodes to stabilize training.
* **Hardware Acceleration**: Automatically uses CUDA (NVIDIA GPUs) for accelerated tensor operations.
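
The replay buffer and the target-sync schedule described above can be sketched with the standard library alone. The class and function names are hypothetical, and episodes are assumed to be counted from 1. In a PyTorch agent the sync itself would typically copy weights with `target_net.load_state_dict(online_net.state_dict())`, which is omitted here to keep the sketch dependency-free.

```python
import random
from collections import deque

BUFFER_SIZE = 50_000     # from the description above
BATCH_SIZE = 256         # from the description above
TARGET_SYNC_EVERY = 10   # episodes, from the description above


class ReplayBuffer:
    """Circular buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=BUFFER_SIZE):
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def should_sync_target(episode):
    """True on every 10th episode (assuming episodes are counted from 1)."""
    return episode % TARGET_SYNC_EVERY == 0


# Small demonstration with a reduced capacity.
demo = ReplayBuffer(capacity=100)
for i in range(150):
    demo.push(state=i, action=0, reward=0.0, next_state=i + 1, done=False)
batch = demo.sample(32)
```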
---

## 5. Web Dashboard Architecture

The project includes a full-stack simulation dashboard.

### 5.1 Backend (`backend/app.py`)

* Built with **FastAPI**.
* Loads the trained `dqn_best.pth` model into memory.
* Exposes `/api/reset` and `/api/step` endpoints.
* When the frontend calls `/api/step`, the backend asks the DQN agent for an action, steps the Python Gymnasium environment, and returns the 9-dimensional state and the reward to the UI.
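
The `/api/step` flow described above can be sketched without the web framework. Everything here is illustrative: `handle_step`, `StubEnv`, `StubAgent`, the `select_action` method, and the response field names are assumptions, not the project's actual code or payload. The one grounded detail is that a Gymnasium `step()` returns the 5-tuple `(observation, reward, terminated, truncated, info)`.

```python
import json


class StubEnv:
    """Minimal stand-in that mimics the Gymnasium step() return shape."""

    def __init__(self):
        self.state = [0.0] * 9

    def step(self, action):
        if action == 1:
            # Advance the normalized phase stored in the last state slot.
            phase = round(self.state[8] * 3)
            self.state[8] = ((phase + 1) % 4) / 3.0
        return list(self.state), 0.0, False, False, {}


class StubAgent:
    """Stand-in for the trained DQN agent; always chooses Switch."""

    def select_action(self, state):
        return 1


def handle_step(env, agent):
    """Hypothetical body of a step handler: pick an action, step, build the payload."""
    action = agent.select_action(env.state)
    state, reward, terminated, truncated, _info = env.step(action)
    return {
        "state": state,                    # 9-dimensional observation
        "reward": reward,
        "action": action,
        "done": terminated or truncated,
    }


payload = handle_step(StubEnv(), StubAgent())
body = json.dumps(payload)  # what would be sent back to the UI as JSON
```

In a FastAPI app, a function like this would be registered as the handler for the step route, and returning the dictionary would let the framework serialize it to JSON.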
### 5.2 Frontend (`frontend/src/App.jsx`)

* Built with **React and Vite**.
* **Visual Style**: A dark glassmorphism UI with a glowing neon background and etched asphalt road textures.
* **Live Telemetry**: Tracks total throughput and RL reward signals in real time.
* **CSS Animations**: Cars are rendered dynamically in all 8 lanes and animate driving straight or turning left (a -90deg rotation) when their directional light turns green.