---
title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---
# traffic-llm
CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.
Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic
## OpenEnv UI
For the deployed OpenEnv web interface:
- Click `Reset` before using `Step`.
- Leave `Use Llm` unchecked for the fast, stable DQN-only path.
- Use `District Actions` = `{}` for a valid no-op step payload.
- Only enable `Use Llm` when you explicitly want district-level LLM guidance on top of the DQN executor.
## Training
The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:
```bash
python3 -m training.train_local_policy train
```
By default, this command trains on the worlds in `data/generated` using the splits in `data/splits`, writes checkpoints to `artifacts/dqn_shared`, logs to TensorBoard, runs parallel CPU rollout workers, and shows `tqdm` progress bars. It also runs validation and saves a checkpoint every 40 updates.
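As a rough illustration of the prioritized replay the trainer relies on, the sketch below samples transition indices in proportion to `priority ** alpha` and returns importance-sampling weights. It is a generic, minimal version of the technique, not the project's replay buffer; `sample_prioritized`, `alpha`, and `beta` are illustrative names and defaults rather than values read from this repository.

```python
import random

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Sample indices with probability proportional to priority ** alpha.

    Returns (indices, importance_weights). Priorities must be positive.
    alpha and beta are illustrative defaults, not project settings.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    indices = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    n = len(priorities)
    # Importance-sampling weights offset the bias of non-uniform sampling.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    # Normalize by the largest weight in the batch so weights stay in (0, 1].
    max_w = max(weights)
    return indices, [w / max_w for w in weights]
```

With equal priorities the sampling is uniform and every weight is 1; transitions with larger TD errors would be drawn more often and down-weighted accordingly.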
For a broader but still manageable validation pass:
```bash
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
```
That evaluates 3 validation cities across all 7 scenario types, giving 3 × 7 = 21 learned-policy validation episodes per eval, or 63 total episodes if the random and fixed baselines are also enabled.
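The episode counts follow from simple multiplication. A quick check, using illustrative variable names that mirror the CLI flags but are not part of the project code:

```python
# Validation budget per eval for the command above.
max_val_cities = 3           # --max-val-cities
val_scenarios_per_city = 7   # --val-scenarios-per-city

learned_episodes = max_val_cities * val_scenarios_per_city

# Learned policy plus the random and fixed baselines.
policies_evaluated = 3
total_episodes = learned_episodes * policies_evaluated

print(learned_episodes, total_episodes)
```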
Phase-3-style full training with the same 40-update eval/checkpoint cadence:
```bash
python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```
Useful ablations:
```bash
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
```
For a fast phase-1 overfit run on one fixed world:
```bash
python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```
To create or refresh dataset splits:
```bash
python3 -m training.train_local_policy make-splits
```
To evaluate the best checkpoint:
```bash
python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val
```
To evaluate a heuristic baseline directly:
```bash
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
```
## TensorBoard
TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.
```bash
tensorboard --logdir artifacts/dqn_shared/tensorboard
```
## District LLM
The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and by default fine-tunes the district model on DQN-derived rows only.
Generate district-LLM data from a learned checkpoint:
```bash
python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl
```
Generate from fixed or heuristic baselines:
```bash
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
```
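Before fine-tuning, it can help to confirm that a generated file is well-formed JSONL. The sketch below counts rows and tallies top-level keys without assuming any particular schema; `summarize_jsonl` is a hypothetical helper, not part of `district_llm/`.

```python
import json
from collections import Counter
from pathlib import Path

def summarize_jsonl(path):
    """Return (row_count, Counter of top-level keys) for a JSONL file."""
    rows = 0
    keys = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)  # raises on a malformed row
            rows += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return rows, keys

if __name__ == "__main__":
    path = Path("data/district_llm_train.jsonl")
    if path.exists():
        rows, keys = summarize_jsonl(path)
        print(f"{rows} rows; keys: {dict(keys)}")
```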
Train a first-pass district model with Unsloth/QLoRA:
```bash
python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000
```
Run single-sample inference:
```bash
python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00
```
Run the OpenEnv-compatible district wrapper on top of the current DQN stack:
```bash
uvicorn openenv_app.app:app --reload
```
## Algorithm
- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: `0 = hold current phase`, `1 = switch to next green phase`.
- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
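The return target and action masking described above can be sketched in plain Python. This is a generic n-step Double DQN target for a single per-intersection transition, not the project's training code: the function names are illustrative, and the example assumes that the `switch` action is the one masked out while `min_green_time` has not elapsed.

```python
def n_step_return(rewards, gamma):
    """Discounted sum of the n rewards collected along the transition."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def masked_argmax(q_values, valid_mask):
    """Index of the best action among those the mask allows."""
    valid = [a for a, ok in enumerate(valid_mask) if ok]
    return max(valid, key=lambda a: q_values[a])

def double_dqn_target(rewards, gamma, online_q_next, target_q_next,
                      next_valid_mask, done):
    """n-step Double DQN target.

    The online network picks the next action (restricted to valid
    actions); the target network supplies that action's value.
    """
    g = n_step_return(rewards, gamma)
    if done:
        return g  # no bootstrap past the end of the episode
    best = masked_argmax(online_q_next, next_valid_mask)
    return g + (gamma ** len(rewards)) * target_q_next[best]

# Actions: 0 = hold current phase, 1 = switch to next green phase.
target = double_dqn_target(
    rewards=[-1.0, -0.5, -0.25],
    gamma=0.5,
    online_q_next=[0.2, 0.9],       # online net would prefer to switch
    target_q_next=[4.0, 8.0],
    next_valid_mask=[True, False],  # assumed: switch is masked, only hold is valid
    done=False,
)
print(target)  # -0.8125
```

Here the 3-step return is -1.3125, and the bootstrap term uses the target network's value for `hold` (4.0) scaled by `gamma ** 3`, because the mask rules out `switch` even though the online network ranks it higher.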
Policy architecture modes:
- `multi_head`: shared trunk with district-type-specific Q heads.
- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
- `single_head_with_district_feature`: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
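To make the three modes concrete, the sketch below shows how an observation and Q-head index might be chosen per intersection, along with the standard dueling aggregation. The one-hot district encoding, the `prepare_observation` helper, and the choice to keep the district feature in the `multi_head` input are all assumptions for illustration, not the repository's actual layout.

```python
def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def prepare_observation(base_features, district_onehot, policy_arch):
    """Return (network_input, head_index) for one intersection.

    Hypothetical layout: the district type is a one-hot vector appended
    to the base features.
    """
    district_index = district_onehot.index(1)
    with_district = list(base_features) + list(district_onehot)
    if policy_arch == "multi_head":
        # District type selects which Q head is used (assumed to stay
        # in the input as well).
        return with_district, district_index
    if policy_arch == "single_head":
        # District type is dropped; every intersection uses head 0.
        return list(base_features), 0
    if policy_arch == "single_head_with_district_feature":
        # District type stays in the input; every intersection uses head 0.
        return with_district, 0
    raise ValueError(f"unknown policy arch: {policy_arch}")

obs, head = prepare_observation([0.1, 0.2], [0, 1, 0], "multi_head")
q_hold, q_switch = dueling_q(1.0, [0.5, -0.5])
```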
Reward variants:
- `current`: backward-compatible waiting and queue penalty.
- `normalized_wait_queue`: normalized queue and waiting reduction reward.
- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
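As a rough sketch of how a reward shaped like `wait_queue_throughput` could be assembled, the function below adds normalized queue and wait reductions to a throughput bonus and subtracts an imbalance penalty. Every scale, weight, and argument name here is hypothetical; the actual definition lives in the `env/` reward logic and may differ.

```python
def wait_queue_throughput_reward(prev_queue, queue, prev_wait, wait,
                                 throughput, lane_queues,
                                 queue_scale=10.0, wait_scale=100.0,
                                 throughput_weight=0.1,
                                 imbalance_weight=0.05):
    """Illustrative reward: reductions plus throughput minus imbalance.

    All scales and weights are made-up defaults for this sketch.
    """
    # Positive when the queue or total waiting time shrank this step.
    queue_term = (prev_queue - queue) / queue_scale
    wait_term = (prev_wait - wait) / wait_scale
    # Bonus for vehicles that cleared the intersection.
    throughput_term = throughput_weight * throughput
    # Penalize uneven queues across incoming lanes.
    imbalance = max(lane_queues) - min(lane_queues) if lane_queues else 0.0
    imbalance_term = imbalance_weight * imbalance / queue_scale
    return queue_term + wait_term + throughput_term - imbalance_term

reward = wait_queue_throughput_reward(
    prev_queue=12, queue=8, prev_wait=150, wait=100,
    throughput=3, lane_queues=[5, 1, 2],
)
```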
## Smoke Test
To sanity-check one generated scenario with the real CityFlow environment:
```bash
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
```
## Project Layout
- `agents/`: heuristic local policies and simple baselines.
- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- `data/`: generated synthetic cities, split files, and dataset generation utilities.
- `scripts/`: utility scripts, including the CityFlow smoke test.
- `third_party/`: vendored dependencies, including CityFlow source.
## Notes
- The generated dataset is assumed to already exist under `data/generated`.
- District membership comes from `district_map.json`.
- District types come from `metadata.json`.
- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.
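A quick way to confirm the last requirement before launching a run is to check whether `cityflow` can be found in the active environment. `cityflow_available` is a hypothetical helper name; the check itself uses only the standard library.

```python
import importlib.util

def cityflow_available():
    """Return True if the `cityflow` module can be found for import."""
    return importlib.util.find_spec("cityflow") is not None

if __name__ == "__main__":
    if cityflow_available():
        print("cityflow is importable")
    else:
        print("cityflow is missing; install it before training or evaluation")
```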