title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
traffic-llm
CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.
Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic
OpenEnv UI
For the deployed OpenEnv web interface:
- Click Reset before using Step.
- Leave Use Llm unchecked for the fast, stable DQN-only path.
- Set District Actions to {} for a valid no-op step payload.
- Only enable Use Llm when you explicitly want district-level LLM guidance on top of the DQN executor.
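As a rough illustration of the no-op payload described above, the sketch below builds an equivalent request body in Python. The field names use_llm and district_actions are assumptions inferred from the UI labels, not confirmed names from the OpenEnv app's schema.

```python
import json


def build_step_payload(use_llm=False, district_actions=None):
    """Build a step payload.

    Field names are hypothetical and only mirror the UI labels
    "Use Llm" and "District Actions".
    """
    return {
        "use_llm": bool(use_llm),
        # An empty mapping means "no district-level overrides" (a no-op step).
        "district_actions": dict(district_actions or {}),
    }


payload = build_step_payload()
print(json.dumps(payload))
```

With both defaults left alone, the step falls through to the DQN-only path.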
Training
The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:
python3 -m training.train_local_policy train
That command trains against data/generated, reads the splits in data/splits, and writes checkpoints to artifacts/dqn_shared. By default it enables TensorBoard logging, uses parallel CPU rollout workers, shows tqdm progress bars, and runs validation and saves a checkpoint every 40 updates.
For a broader but still manageable validation pass:
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
That evaluates 3 validation cities across all 7 scenario types. This gives 21 learned-policy validation episodes per eval, or 63 total episodes if random and fixed baselines are also enabled.
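The episode counts above follow from a simple product. A minimal sketch of the arithmetic, where the function name is illustrative and not part of the training CLI:

```python
def validation_episodes(num_cities, scenarios_per_city, num_policies=1):
    """Episodes per eval = cities x scenarios per city x policies evaluated."""
    return num_cities * scenarios_per_city * num_policies


# Learned policy only: 3 cities x 7 scenarios.
learned_only = validation_episodes(3, 7)

# Learned policy plus the random and fixed baselines: 3 policies in total.
with_baselines = validation_episodes(3, 7, num_policies=3)

print(learned_only, with_baselines)
```

Because validation cost scales linearly with each of the three factors, raising --max-val-cities or --val-scenarios-per-city lengthens every eval in proportion.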
Phase-3-style full training with the same 40-update eval/checkpoint cadence:
python3 -m training.train_local_policy train \
--max-train-cities 70 \
--max-val-cities 3 \
--val-scenarios-per-city 7 \
--policy-arch single_head_with_district_feature \
--reward-variant wait_queue_throughput
Useful ablations:
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
For a fast phase-1 overfit run on one fixed world:
python3 -m training.train_local_policy train \
--total-updates 25 \
--train-city-id city_0072 \
--train-scenario-name normal \
--overfit-val-on-train-scenario \
--fast-overfit \
--policy-arch single_head_with_district_feature \
--reward-variant wait_queue_throughput
To create or refresh dataset splits:
python3 -m training.train_local_policy make-splits
To evaluate the best checkpoint:
python3 -m training.train_local_policy evaluate \
--checkpoint artifacts/dqn_shared/best_validation.pt \
--split val
To evaluate a heuristic baseline directly:
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
TensorBoard
TensorBoard logs are written to artifacts/dqn_shared/tensorboard by default.
tensorboard --logdir artifacts/dqn_shared/tensorboard
District LLM
The district LLM stack lives under district_llm/. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and defaults district-model fine-tuning to DQN-derived rows only.
Generate district-LLM data from a learned checkpoint:
python3 -m district_llm.generate_dataset \
--controller rl_checkpoint \
--checkpoint artifacts/dqn_shared/best_validation.pt \
--episodes 100 \
--decision-interval 10 \
--use-checkpoint-env-config \
--output data/district_llm_train.jsonl
Generate from fixed or heuristic baselines:
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
Train a first-pass district model with Unsloth/QLoRA:
python3 -m training.train_district_llm \
--dataset data/district_llm_train.jsonl \
--output-dir artifacts/district_llm_qwen \
--model-name Qwen/Qwen2.5-7B-Instruct \
--load-in-4bit \
--lora-rank 16 \
--max-seq-length 1024 \
--max-steps 1000
Run single-sample inference:
python3 -m district_llm.inference \
--model artifacts/district_llm_qwen \
--city-id city_0006 \
--scenario-name accident \
--district-id d_00
Run the OpenEnv-compatible district wrapper on top of the current DQN stack:
uvicorn openenv_app.app:app --reload
Algorithm
- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: 0 = hold current phase, 1 = switch to next green phase.
- Safety: min_green_time is enforced in the environment and exposed through action masking.
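The way the n-step return, the Double DQN action selection, and the action mask fit together can be sketched in plain Python. This is a minimal illustration of the standard technique under stated assumptions, not the repository's implementation: the function names and the list-based Q-values are hypothetical.

```python
def n_step_return(rewards, gamma):
    """Discounted sum of the n collected rewards: sum_k gamma^k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def masked_argmax(q_values, mask):
    """Index of the largest Q-value among actions whose mask entry is True."""
    valid = [a for a, ok in enumerate(mask) if ok]
    if not valid:
        raise ValueError("at least one action must be valid")
    return max(valid, key=lambda a: q_values[a])


def double_dqn_target(rewards, gamma, online_q_next, target_q_next, next_mask, done):
    """n-step Double DQN target.

    The online network chooses the next action (restricted to valid actions),
    and the target network evaluates that action.
    """
    g = n_step_return(rewards, gamma)
    if done:
        return g  # no bootstrap past a terminal state
    a_star = masked_argmax(online_q_next, next_mask)
    return g + (gamma ** len(rewards)) * target_q_next[a_star]


# Action 1 (switch) is masked out, e.g. because min_green_time has not elapsed,
# so the bootstrap uses action 0 (hold) even though the online Q prefers 1.
target = double_dqn_target(
    rewards=[-1.0, -0.5, 0.0],
    gamma=0.5,
    online_q_next=[0.2, 0.9],
    target_q_next=[2.0, 4.0],
    next_mask=[True, False],
    done=False,
)
print(target)
```

Applying the mask inside the argmax keeps the bootstrap value from ever referring to an action the environment would reject.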
Policy architecture modes:
- multi_head: shared trunk with district-type-specific Q heads.
- single_head: one shared Q head for all intersections, with district type removed from the observation.
- single_head_with_district_feature: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
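The difference between the three modes can be pictured with a small sketch. Everything here is an assumption made for illustration: the district-type labels, the one-hot encoding, and the helper names are hypothetical, and multi_head is assumed to use the district type only to pick a head. The dueling_q helper shows the standard dueling aggregation, which may differ in detail from the project's network code.

```python
# Hypothetical district-type labels; the real ones come from metadata.json.
DISTRICT_TYPES = ("residential", "commercial", "industrial")


def build_observation(base_features, district_type, policy_arch):
    """Return the feature vector a given architecture mode would see."""
    features = list(base_features)
    if policy_arch == "single_head_with_district_feature":
        # District type stays in the observation as a one-hot feature.
        features += [1.0 if t == district_type else 0.0 for t in DISTRICT_TYPES]
    elif policy_arch not in ("single_head", "multi_head"):
        raise ValueError(f"unknown policy_arch: {policy_arch}")
    return features


def select_head(district_type, policy_arch):
    """multi_head routes to a district-specific head; the others share one."""
    return district_type if policy_arch == "multi_head" else "shared"


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(a) = V + (A(a) - mean(A))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


obs = build_observation([0.4, 0.1], "commercial", "single_head_with_district_feature")
head = select_head("commercial", "multi_head")
q_values = dueling_q(1.0, [0.5, -0.5])
print(obs, head, q_values)
```

In this picture the ablations compare two ways of injecting district information (separate heads versus an input feature) against a baseline that sees none.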
Reward variants:
- current: backward-compatible waiting and queue penalty.
- normalized_wait_queue: normalized queue and waiting reduction reward.
- wait_queue_throughput: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
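To make the wait_queue_throughput description concrete, here is one possible shape such a reward could take. The formula, the scales, the weights, and the max-minus-min imbalance measure are all assumptions chosen for illustration; the actual definition lives in the env/ reward logic and may differ.

```python
def wait_queue_throughput_reward(
    prev_queue, queue, prev_wait, wait, throughput, lane_queues,
    queue_scale, wait_scale, throughput_weight, imbalance_weight,
):
    """Hypothetical reward: normalized reductions + throughput bonus - imbalance penalty."""
    # Positive when the queue or the waiting time shrank since the last decision.
    queue_term = (prev_queue - queue) / queue_scale
    wait_term = (prev_wait - wait) / wait_scale
    # Assumed imbalance measure: spread between the fullest and emptiest lane.
    imbalance = (max(lane_queues) - min(lane_queues)) if lane_queues else 0.0
    return (
        queue_term
        + wait_term
        + throughput_weight * throughput
        - imbalance_weight * imbalance
    )


reward = wait_queue_throughput_reward(
    prev_queue=12, queue=8, prev_wait=40, wait=24,
    throughput=4, lane_queues=[1, 5],
    queue_scale=8.0, wait_scale=32.0,
    throughput_weight=0.25, imbalance_weight=0.125,
)
print(reward)
```

Normalizing by a scale keeps the queue and wait terms comparable across intersections of different sizes, which matters when one set of parameters is shared by all of them.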
Smoke Test
To sanity-check one generated scenario with the real CityFlow environment:
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
Project layout
- agents/: heuristic local policies and simple baselines.
- env/: CityFlow environment, topology parsing, observation building, and reward logic.
- training/: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- data/: generated synthetic cities, split files, and dataset generation utilities.
- scripts/: utility scripts, including the CityFlow smoke test.
- third_party/: vendored dependencies, including CityFlow source.
Notes
- The generated dataset is assumed to already exist under data/generated.
- District membership comes from district_map.json.
- District types come from metadata.json.
- Runtime training and evaluation require the cityflow Python module to be installed in the active environment.