---

title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---


# traffic-llm

A CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic

## OpenEnv UI

For the deployed OpenEnv web interface:

- Click `Reset` before using `Step`.
- Leave `Use Llm` unchecked for the fast, stable DQN-only path.
- Use `District Actions` = `{}` for a valid no-op step payload.
- Only enable `Use Llm` when you explicitly want district-level LLM guidance on top of the DQN executor.

## Training

The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

```bash
python3 -m training.train_local_policy train
```

This trains against `data/generated`, uses the splits in `data/splits`, writes checkpoints to `artifacts/dqn_shared`, enables TensorBoard logging, runs parallel CPU rollout workers by default, shows `tqdm` progress bars, and validates and checkpoints every 40 updates.

For a broader but still manageable validation pass:

```bash
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
```

This evaluates 3 validation cities across all 7 scenario types, giving 21 learned-policy validation episodes per evaluation, or 63 total episodes if the random and fixed baselines are also enabled.

Phase-3-style full training with the same 40-update eval/checkpoint cadence:

```bash
python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

Useful ablations:

```bash
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
```

For a fast phase-1 overfit run on one fixed world:

```bash
python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

To create or refresh dataset splits:

```bash
python3 -m training.train_local_policy make-splits
```

To evaluate the best checkpoint:

```bash
python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val
```

To evaluate a heuristic baseline directly:

```bash
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
```

## TensorBoard

TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.

```bash
tensorboard --logdir artifacts/dqn_shared/tensorboard
```

## District LLM

The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and defaults district-model fine-tuning to DQN-derived rows only.

Generate district-LLM data from a learned checkpoint:

```bash
python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl
```

Generate from fixed or heuristic baselines:

```bash
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
```

Train a first-pass district model with Unsloth/QLoRA:

```bash
python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000
```

Run single-sample inference:

```bash
python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00
```

Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

```bash
uvicorn openenv_app.app:app --reload
```

## Algorithm

- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: `0 = hold current phase`, `1 = switch to next green phase`.
- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
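
As a rough illustration of the points above, here is a minimal scalar-form sketch of the dueling combination and the Double DQN n-step target. The function names and shapes are illustrative only, not the project's actual implementation:

```python
def dueling_q(value, advantages):
    """Dueling combination: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_nstep_target(q_online, q_target, reward_n, done, gamma, n):
    """Double DQN n-step target: the online network selects the action,
    the target network evaluates it.

    q_online / q_target: per-action Q-values at the n-step successor state.
    reward_n: discounted sum of the n intermediate rewards.
    """
    best = max(range(len(q_online)), key=lambda a: q_online[a])  # select
    bootstrap = 0.0 if done else q_target[best]                  # evaluate
    return reward_n + (gamma ** n) * bootstrap
```

Subtracting the mean advantage keeps the value/advantage decomposition identifiable, and decoupling action selection from evaluation is what reduces the overestimation bias of vanilla DQN.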

Policy architecture modes:

- `multi_head`: shared trunk with district-type-specific Q heads.
- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
- `single_head_with_district_feature`: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
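
The difference between the three modes can be sketched as how the per-intersection observation is assembled; the real feature layout lives in `env/` and may differ from this hypothetical version:

```python
def build_observation(local_features, district_onehot, arch):
    """Illustrative observation assembly per policy-arch mode."""
    if arch == "single_head":
        # district type is removed from the observation entirely
        return list(local_features)
    if arch == "single_head_with_district_feature":
        # district type stays in the observation as an explicit feature
        return list(local_features) + list(district_onehot)
    if arch == "multi_head":
        # district type routes the trunk output to a district-specific
        # Q head instead of entering the feature vector
        return list(local_features)
    raise ValueError(f"unknown policy arch: {arch}")
```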

Reward variants:

- `current`: backward-compatible waiting and queue penalty.
- `normalized_wait_queue`: normalized queue and waiting reduction reward.
- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
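
The general shape of the `wait_queue_throughput` variant can be sketched as below; the weights and exact terms are assumptions for illustration, not the project's actual coefficients (those live in the reward logic under `env/`):

```python
def wait_queue_throughput_reward(dq, dw, throughput, imbalance,
                                 w_q=1.0, w_w=1.0, w_t=0.5, w_i=0.25):
    """Illustrative reward shape: reward normalized queue/wait reduction,
    add a throughput bonus, subtract an imbalance penalty.

    dq, dw: normalized reductions in queue length and waiting time
            (positive = improvement). Weights are hypothetical.
    """
    return w_q * dq + w_w * dw + w_t * throughput - w_i * imbalance
```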

## Smoke Test

To sanity-check one generated scenario with the real CityFlow environment:

```bash
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
```

## Project layout

- `agents/`: heuristic local policies and simple baselines.
- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- `data/`: generated synthetic cities, split files, and dataset generation utilities.
- `scripts/`: utility scripts, including the CityFlow smoke test.
- `third_party/`: vendored dependencies, including CityFlow source.

## Notes

- The generated dataset is assumed to already exist under `data/generated`.
- District membership comes from `district_map.json`.
- District types come from `metadata.json`.
- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.