Traffic Control Reinforcement Learning Architecture

This document describes, end to end, the Reinforcement Learning (RL) architecture implemented in the Traffic Signal Control project.

The system replaces traditional fixed-timing traffic lights with adaptive agents that learn signal-switching policies from simulated traffic. It also includes a full-stack web dashboard (React frontend, FastAPI backend) that visualizes the intersection in real time.

1. High-Level System Flow

The architecture follows the standard Reinforcement Learning MDP (Markov Decision Process) loop:

Observe: The Agent receives the current State (queue lengths for 8 lanes, current light phase) from the Environment.
Decide: The Agent selects an Action (keep the current phase OR switch to the next phase).
Act: The Environment executes the action, simulating traffic flow for a specific duration.
Learn: The Environment returns a Reward (penalty based on total waiting traffic). The Agent uses this to update its neural network.
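
The four steps above can be sketched as a plain Python loop. This is a minimal illustration, not the project's training code: `ToyEnv` and `run_episode` are hypothetical names, the toy dynamics are invented, and only the `reset`/`step` return shapes follow the Gymnasium convention. The learning step is represented by collecting transitions that an agent would consume.

```python
class ToyEnv:
    """Hypothetical one-queue stand-in that mimics Gymnasium's reset/step signatures."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0
        self.queue = 0

    def reset(self, seed=None):
        self.t = 0
        self.queue = 4
        return [self.queue / 20.0], {}

    def step(self, action):
        self.t += 1
        # Invented dynamics: action 1 clears one car, action 0 lets one arrive.
        self.queue = max(0, self.queue + (1 if action == 0 else -1))
        reward = -self.queue / 20.0
        truncated = self.t >= self.horizon
        return [self.queue / 20.0], reward, False, truncated, {}


def run_episode(env, policy):
    """Run one Observe -> Decide -> Act -> Learn episode and return its results."""
    state, _info = env.reset()                      # Observe
    total_reward = 0.0
    transitions = []
    done = False
    while not done:
        action = policy(state)                      # Decide
        next_state, reward, terminated, truncated, _info = env.step(action)  # Act
        done = terminated or truncated
        # Learn: a real agent would store this transition and update its network.
        transitions.append((state, action, reward, next_state, done))
        total_reward += reward
        state = next_state
    return total_reward, transitions
```

Calling `run_episode(ToyEnv(), lambda state: 1)` runs five steps with a policy that always picks action 1.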

2. The Environment (`TrafficEnvironment`)

The environment is the simulated 8-lane intersection built using the Gymnasium API.

2.1 State Space (Observation)

We use a continuous 9-dimensional state vector that tracks the Straight/Right (SR) and Left-Turn (L) queue of each approach (N, E, S, W), followed by the current phase:

[N_SR, N_L, E_SR, E_L, S_SR, S_L, W_SR, W_L, current_phase]
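Absolute Normalization: Queue lengths are normalized linearly by dividing by 20.0 and clipping to [0, 1]. Because the divisor is fixed, a given queue length always maps to the same input value, so the network sees absolute rather than relative traffic volume. Queues longer than 20 vehicles saturate at 1.0.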
Phase Representation: The current phase (0 to 3) is normalized to phase / 3.0.
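
As a rough sketch of the encoding described above, the function below builds the 9-dimensional vector from raw queue counts. The name `encode_state` and its parameters are hypothetical and not taken from the project; only the divisor 20.0, the [0, 1] clipping, and `phase / 3.0` come from this document.

```python
def encode_state(queues, phase, max_queue=20.0, num_phases=4):
    """Build the 9D observation from raw queue counts and the current phase.

    queues: 8 counts ordered [N_SR, N_L, E_SR, E_L, S_SR, S_L, W_SR, W_L].
    phase:  integer in 0..3.
    """
    if len(queues) != 8:
        raise ValueError("expected 8 queue lengths")
    if not 0 <= phase < num_phases:
        raise ValueError("phase out of range")
    # Divide by a fixed constant and clip to [0, 1].
    normalized = [min(max(q / max_queue, 0.0), 1.0) for q in queues]
    # The phase index is scaled to [0, 1] as phase / 3.0.
    return normalized + [phase / (num_phases - 1)]
```

For example, a queue of 10 vehicles maps to 0.5, while a queue of 40 vehicles is clipped to 1.0.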

2.2 Action Space & 4-Phase Cycle

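The agent has a discrete action space of size 2: Keep the current phase or Switch to the next one. Switching cycles through 4 directional phases, each giving a single approach a green light for both straight and left-turn traffic, so movements from different approaches never conflict: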

Phase 0: North Green (Straight + Left)
Phase 1: East Green (Straight + Left)
Phase 2: South Green (Straight + Left)
Phase 3: West Green (Straight + Left)
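
The Keep/Switch logic can be sketched as below. Two things are assumptions rather than facts from the project: the action encoding (0 = Keep, 1 = Switch) and the helper names `next_phase` and `green_lanes`. The lane indices follow the order of the state vector in section 2.1.

```python
KEEP, SWITCH = 0, 1  # assumed encoding; the project may order these differently
PHASE_NAMES = ("North", "East", "South", "West")


def next_phase(phase, action):
    """Return the phase after applying an action; Switch wraps from 3 back to 0."""
    if action == SWITCH:
        return (phase + 1) % len(PHASE_NAMES)
    return phase


def green_lanes(phase):
    """Indices into the queue vector of the two lanes (SR and L) served by a phase."""
    return (2 * phase, 2 * phase + 1)
```

With this indexing, phase 1 (East Green) serves state-vector positions 2 and 3, which are `E_SR` and `E_L`.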

2.3 Reward Function

Calculation: reward = -(Total Queue Length) / 20.0
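Clipping: The reward is hard-clipped to [-1.0, 1.0]. Because the unclipped value is never positive, the effective range is [-1.0, 0.0], and the reward saturates at -1.0 once the total queue exceeds 20 vehicles.
Sensitivity: Dividing by 20.0 keeps the penalty large enough to matter even in low-density traffic (each waiting vehicle costs 0.05), which pushes the agent to actively clear small queues.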
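
A minimal sketch of the reward calculation, assuming the total queue length is the sum of the 8 raw (unnormalized) lane queues. The function name `compute_reward` is hypothetical.

```python
def compute_reward(queues, scale=20.0):
    """Negative total queue length divided by 20.0, hard-clipped to [-1.0, 1.0]."""
    raw = -sum(queues) / scale
    return max(-1.0, min(1.0, raw))
```

With one vehicle waiting in each of the 8 lanes, the total queue is 8 and the reward is -8 / 20.0 = -0.4. With five vehicles per lane, the raw value is -2.0 and the clip holds it at -1.0.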

3. Traffic Generation (`TrafficGenerator`)

Realistic Volume: The traffic density is tuned to produce roughly 2,000–4,000 total arrivals per episode.
Left-Turn Probabilities: Roughly 20% of generated traffic is routed into the dedicated Left-Turn queues.
Stochastic Bursts: There is a 15% probability that a sudden "burst" of traffic will arrive in a random lane, testing the agent's adaptability.
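
The sketch below illustrates one way such a generator could work. Only the 20% left-turn share and the 15% burst probability come from this document. The per-step arrival model, the `arrival_prob` and `burst_size` values, applying the burst check once per step, and the name `generate_arrivals` are all assumptions made for illustration.

```python
import random

LANES = ("N_SR", "N_L", "E_SR", "E_L", "S_SR", "S_L", "W_SR", "W_L")


def generate_arrivals(rng, arrival_prob=0.3, left_turn_prob=0.2,
                      burst_prob=0.15, burst_size=3):
    """Return the number of new vehicles per lane for one simulation step."""
    arrivals = {lane: 0 for lane in LANES}
    for approach in "NESW":
        # Assumed model: at most one regular arrival per approach per step.
        if rng.random() < arrival_prob:
            turn = "_L" if rng.random() < left_turn_prob else "_SR"
            arrivals[approach + turn] += 1
    # Occasional burst into a single random lane (burst size is hypothetical).
    if rng.random() < burst_prob:
        arrivals[rng.choice(LANES)] += burst_size
    return arrivals
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing agents on identical traffic.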

4. The Deep Q-Network Agent (`DQNAgent`)

The agent is a Deep Q-Network (DQN) implemented in PyTorch.

Neural Network Architecture: Input Layer (9) -> Hidden Layer (256, ReLU) -> Hidden Layer (256, ReLU) -> Output Layer (2, Keep/Switch)
Experience Replay Buffer: Transitions (s, a, r, s', done) are stored in a circular buffer (size: 50,000). The network trains by sampling random mini-batches (size 256).
Target Network: The agent keeps an online network and a target network. The target network's weights are synced from the online network every 10 episodes to stabilize training.
Hardware Acceleration: Automatically utilizes CUDA (NVIDIA GPUs) for accelerated tensor operations.
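
The replay buffer and the sync schedule described above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's `DQNAgent` code: the class and function names are hypothetical, the network itself is omitted, and episodes are assumed to be counted from 1.

```python
import random


class ReplayBuffer:
    """Fixed-capacity circular buffer of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=50_000):
        self.capacity = capacity
        self.storage = []
        self.position = 0  # index of the next slot to write

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            # Buffer is full: overwrite the oldest transition.
            self.storage[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size=256):
        """Draw a uniform random mini-batch and regroup it field by field."""
        batch = random.sample(self.storage, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)


def should_sync_target(episode, sync_every=10):
    """True on episodes 10, 20, 30, ... (episodes assumed to start at 1)."""
    return episode % sync_every == 0
```

Sampling uniformly at random breaks the correlation between consecutive transitions, and syncing the target network only periodically keeps the bootstrapped targets from shifting on every update.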

5. Web Dashboard Architecture

The project includes a full-stack dashboard for running and visualizing the simulation.

5.1 Backend (`backend/app.py`)

Built with FastAPI.
Loads the trained `dqn_best.pth` model into memory.
Exposes `/api/reset` and `/api/step` endpoints.
When the frontend calls `/api/step`, the backend asks the DQN agent for an action, steps the Python Gymnasium environment with it, and returns the resulting 9D state and reward to the UI.
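
The request flow for a single `/api/step` call can be sketched without any web framework. The function name `handle_step`, its callback parameters, and the payload keys `state` and `reward` are hypothetical. The actual FastAPI route and response schema in `backend/app.py` may differ.

```python
def handle_step(state, select_action, env_step):
    """Framework-free sketch of what one /api/step call does.

    select_action: callable mapping the current state to an action (the agent).
    env_step:      callable with Gymnasium's step() return shape.
    """
    action = select_action(state)
    next_state, reward, _terminated, _truncated, _info = env_step(action)
    # Plain Python types keep the payload JSON-serializable.
    return {"state": [float(x) for x in next_state], "reward": float(reward)}
```

In a real route handler, the returned dictionary would be sent back as the JSON response body.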

5.2 Frontend (`frontend/src/App.jsx`)

Built with React and Vite.
Ultra-Premium Aesthetics: Features a dark glassmorphism UI, a glowing neon background, and etched asphalt road textures.
Live Telemetry: Tracks total throughput and RL reward signals in real time.
CSS Animations: Cars are dynamically rendered in all 8 lanes and visually animate driving straight or turning left (-90deg rotation) when their specific directional light turns green.
