---
title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---
# traffic-llm
CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.
Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic
## OpenEnv UI
For the deployed OpenEnv web interface:
- Click `Reset` before using `Step`.
- Leave `Use Llm` unchecked for the fast, stable DQN-only path.
- Use `District Actions` = `{}` for a valid no-op step payload.
- Only enable `Use Llm` when you explicitly want district-level LLM guidance on top of the DQN executor.
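As an illustration of the no-op step described above, the payload can be built and serialized like this. The field names (`district_actions`, `use_llm`) are assumptions about the wrapper's request schema, not a confirmed API:

```python
import json

# Hypothetical step payload: empty district actions, DQN-only path.
# Field names are illustrative assumptions, not the wrapper's confirmed schema.
payload = {"district_actions": {}, "use_llm": False}

body = json.dumps(payload)
print(body)
```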
## Training
The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:
```bash
python3 -m training.train_local_policy train
```
By default this trains against `data/generated`, reads the splits in `data/splits`, writes checkpoints to `artifacts/dqn_shared`, enables TensorBoard logging, runs parallel CPU rollout workers, shows `tqdm` progress bars, and validates and checkpoints every 40 updates.
For a broader but still manageable validation pass:
```bash
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
```
That evaluates 3 validation cities across all 7 scenario types. This gives 21 learned-policy validation episodes per eval, or 63 total episodes if random and fixed baselines are also enabled.
Phase-3-style full training with the same 40-update eval/checkpoint cadence:
```bash
python3 -m training.train_local_policy train \
--max-train-cities 70 \
--max-val-cities 3 \
--val-scenarios-per-city 7 \
--policy-arch single_head_with_district_feature \
--reward-variant wait_queue_throughput
```
Useful ablations:
```bash
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
```
For a fast phase-1 overfit run on one fixed world:
```bash
python3 -m training.train_local_policy train \
--total-updates 25 \
--train-city-id city_0072 \
--train-scenario-name normal \
--overfit-val-on-train-scenario \
--fast-overfit \
--policy-arch single_head_with_district_feature \
--reward-variant wait_queue_throughput
```
To create or refresh dataset splits:
```bash
python3 -m training.train_local_policy make-splits
```
To evaluate the best checkpoint:
```bash
python3 -m training.train_local_policy evaluate \
--checkpoint artifacts/dqn_shared/best_validation.pt \
--split val
```
To evaluate a heuristic baseline directly:
```bash
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
```
## TensorBoard
TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.
```bash
tensorboard --logdir artifacts/dqn_shared/tensorboard
```
## District LLM
The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and defaults district-model fine-tuning to DQN-derived rows only.
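To make the data flow concrete, here is a purely illustrative sketch of what a DQN-derived SFT row could look like. Every field name below is an assumption; the real schema is defined by `district_llm.generate_dataset`:

```python
import json

# Illustrative only: these field names are guesses, not the actual schema
# produced by district_llm.generate_dataset.
row = {
    "city_id": "city_0006",
    "district_id": "d_00",
    "window": {"start_step": 0, "end_step": 10},  # DQN rollout window (assumed)
    "prompt": "District d_00 traffic summary over the last window ...",
    "label": "hold",  # district-scale action derived from the DQN rollout (assumed)
}
line = json.dumps(row)  # one JSONL line
print(line)
```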
Generate district-LLM data from a learned checkpoint:
```bash
python3 -m district_llm.generate_dataset \
--controller rl_checkpoint \
--checkpoint artifacts/dqn_shared/best_validation.pt \
--episodes 100 \
--decision-interval 10 \
--use-checkpoint-env-config \
--output data/district_llm_train.jsonl
```
Generate from fixed or heuristic baselines:
```bash
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
```
Train a first-pass district model with Unsloth/QLoRA:
```bash
python3 -m training.train_district_llm \
--dataset data/district_llm_train.jsonl \
--output-dir artifacts/district_llm_qwen \
--model-name Qwen/Qwen2.5-7B-Instruct \
--load-in-4bit \
--lora-rank 16 \
--max-seq-length 1024 \
--max-steps 1000
```
Run single-sample inference:
```bash
python3 -m district_llm.inference \
--model artifacts/district_llm_qwen \
--city-id city_0006 \
--scenario-name accident \
--district-id d_00
```
Run the OpenEnv-compatible district wrapper on top of the current DQN stack:
```bash
uvicorn openenv_app.app:app --reload
```
## Algorithm
- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: `0 = hold current phase`, `1 = switch to next green phase`.
- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
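The bullets above can be sketched numerically. Below is a minimal NumPy illustration of the dueling combination and the masked n-step Double DQN target; shapes, names, and values are illustrative, not the project's actual training code:

```python
import numpy as np

def dueling_q(value, advantage):
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    return value[:, None] + advantage - advantage.mean(axis=1, keepdims=True)

def n_step_double_dqn_target(rewards, gamma, q_online_next, q_target_next,
                             action_mask, done):
    # n-step discounted return over the collected reward window
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        return g
    # Double DQN: the online net picks the next action (respecting the
    # min-green action mask), the target net evaluates it.
    masked = np.where(action_mask, q_online_next, -np.inf)
    a_star = int(np.argmax(masked))
    return g + (gamma ** len(rewards)) * q_target_next[a_star]

# Example: 2-step return where action 0 (hold) is masked out.
q = dueling_q(np.array([1.0]), np.array([[2.0, 0.0]]))
target = n_step_double_dqn_target(
    rewards=[1.0, 0.0], gamma=0.9,
    q_online_next=np.array([0.5, 0.2]),
    q_target_next=np.array([1.0, 2.0]),
    action_mask=np.array([False, True]), done=False,
)
print(q, target)
```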
Policy architecture modes:
- `multi_head`: shared trunk with district-type-specific Q heads.
- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
- `single_head_with_district_feature`: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
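The three modes differ only in how the Q head consumes district type. A schematic sketch (the real layer shapes and names live in the training code; these are placeholders):

```python
import numpy as np

def q_values(trunk_features, district_type_id, heads, mode):
    # heads: mapping from head name (or district type id) to a weight matrix.
    if mode == "multi_head":
        # district-type-specific Q head on a shared trunk
        return trunk_features @ heads[district_type_id]
    if mode == "single_head":
        # one shared head; district type was stripped from the observation
        return trunk_features @ heads["shared"]
    # single_head_with_district_feature: one shared head, district type
    # kept in the observation as an explicit input feature
    x = np.append(trunk_features, district_type_id)
    return x @ heads["shared_with_type"]
```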
Reward variants:
- `current`: backward-compatible waiting and queue penalty.
- `normalized_wait_queue`: normalized queue and waiting reduction reward.
- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
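As a shape-only sketch of the `wait_queue_throughput` variant (the actual normalization and coefficients live in the environment's reward code; the weights below are made-up placeholders):

```python
def wait_queue_throughput_reward(queue_delta, wait_delta, throughput,
                                 imbalance, weights=(1.0, 1.0, 0.5, 0.25)):
    # Reward rises when normalized queue and wait drop (negative deltas),
    # adds a throughput bonus, and subtracts an imbalance penalty.
    # The weights are illustrative placeholders, not the project's values.
    wq, ww, wt, wi = weights
    return wq * (-queue_delta) + ww * (-wait_delta) + wt * throughput - wi * imbalance
```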
## Smoke Test
To sanity-check one generated scenario with the real CityFlow environment:
```bash
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
```
## Project Layout
- `agents/`: heuristic local policies and simple baselines.
- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- `data/`: generated synthetic cities, split files, and dataset generation utilities.
- `scripts/`: utility scripts, including the CityFlow smoke test.
- `third_party/`: vendored dependencies, including CityFlow source.
## Notes
- The generated dataset is assumed to already exist under `data/generated`.
- District membership comes from `district_map.json`.
- District types come from `metadata.json`.
- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.