---
title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---

# traffic-llm

CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic

## OpenEnv UI

For the deployed OpenEnv web interface:

- Click Reset before using Step.
- Leave Use Llm unchecked for the fast, stable DQN-only path.
- Use `District Actions = {}` for a valid no-op step payload.
- Only enable Use Llm when you explicitly want district-level LLM guidance on top of the DQN executor.

## Training

The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

```bash
python3 -m training.train_local_policy train
```

By default, this trains against `data/generated`, uses the splits in `data/splits`, and writes checkpoints to `artifacts/dqn_shared`. It also enables TensorBoard logging, runs parallel CPU rollout workers, shows tqdm progress bars, and validates and checkpoints every 40 updates.

For a broader but still manageable validation pass:

```bash
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
```

That evaluates 3 validation cities across all 7 scenario types, giving 21 learned-policy validation episodes per eval, or 63 total episodes when the random and fixed baselines are also enabled.
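
The arithmetic behind those counts:

```python
# Validation episode count implied by the flags above.
cities = 3        # --max-val-cities
scenarios = 7     # --val-scenarios-per-city
policies = 3      # learned policy + random baseline + fixed baseline

learned_episodes = cities * scenarios
total_episodes = learned_episodes * policies
print(learned_episodes, total_episodes)  # 21 63
```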

Phase-3-style full training with the same 40-update eval/checkpoint cadence:

```bash
python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

Useful ablations:

```bash
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
```

For a fast phase-1 overfit run on one fixed world:

```bash
python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

To create or refresh dataset splits:

```bash
python3 -m training.train_local_policy make-splits
```

To evaluate the best checkpoint:

```bash
python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val
```

To evaluate a heuristic baseline directly:

```bash
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
```

## TensorBoard

TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.

```bash
tensorboard --logdir artifacts/dqn_shared/tensorboard
```

## District LLM

The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and restricts district-model fine-tuning to DQN-derived rows by default.

Generate district-LLM data from a learned checkpoint:

```bash
python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl
```

Generate from fixed or heuristic baselines:

```bash
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
```

Train a first-pass district model with Unsloth/QLoRA:

```bash
python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000
```

Run single-sample inference:

```bash
python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00
```

Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

```bash
uvicorn openenv_app.app:app --reload
```

## Algorithm

- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: 0 = hold current phase, 1 = switch to the next green phase.
- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
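
The dueling aggregation and the Double DQN n-step target described above can be sketched as follows. This is an illustrative NumPy sketch, not the trainer's actual code: the function names and example numbers are made up, and prioritized replay, action masking, and target-network updates are omitted.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

def double_dqn_nstep_target(rewards, next_q_online, next_q_target,
                            done, gamma=0.99):
    """n-step Double DQN bootstrap target for a single transition.

    rewards: the n per-step rewards r_t .. r_{t+n-1}
    next_q_online / next_q_target: Q-values at s_{t+n} from the online
    and target networks, each of shape (num_actions,).
    """
    g = sum((gamma ** i) * r for i, r in enumerate(rewards))
    if not done:
        a_star = int(np.argmax(next_q_online))              # online net selects
        g += (gamma ** len(rewards)) * next_q_target[a_star]  # target net evaluates
    return g

# Two actions per intersection: 0 = hold, 1 = switch to next green phase.
q_next = dueling_q(np.array([1.0]), np.array([0.5, -0.5]))  # -> [1.5, 0.5]
target = double_dqn_nstep_target([-0.2, -0.1], q_next, q_next, done=False)
```

Evaluating the bootstrap action with the target network while selecting it with the online network is what distinguishes Double DQN from vanilla DQN and damps overestimation bias.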

Policy architecture modes:

- `multi_head`: shared trunk with district-type-specific Q heads.
- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
- `single_head_with_district_feature`: one shared Q head for all intersections, with district type kept in the observation as an explicit feature.
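
How the district-type signal enters each mode can be sketched roughly. This is an assumption-laden illustration: `NUM_DISTRICT_TYPES`, the one-hot encoding, and the function name are placeholders, and the real observation layout is defined by the env's observation builder.

```python
import numpy as np

NUM_DISTRICT_TYPES = 4  # placeholder; the real count comes from metadata.json

def policy_input(base_obs, district_type, arch):
    """Where the district-type signal goes in each policy-arch mode."""
    if arch == "single_head":
        return base_obs                                  # feature removed
    if arch == "single_head_with_district_feature":
        one_hot = np.eye(NUM_DISTRICT_TYPES)[district_type]
        return np.concatenate([base_obs, one_hot])       # feature kept in obs
    if arch == "multi_head":
        # Trunk input is district-agnostic; district type instead routes
        # the trunk output to a district-type-specific Q head.
        return base_obs
    raise ValueError(f"unknown arch: {arch}")

obs = np.zeros(8)
print(policy_input(obs, 2, "single_head").shape)                        # (8,)
print(policy_input(obs, 2, "single_head_with_district_feature").shape)  # (12,)
```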

Reward variants:

- `current`: backward-compatible waiting and queue penalty.
- `normalized_wait_queue`: normalized queue and waiting reduction reward.
- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
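
The general shape of the `wait_queue_throughput` variant can be sketched like this; the weights, signs, and argument names here are placeholder assumptions, not the trainer's actual coefficients.

```python
def wait_queue_throughput_reward(d_queue, d_wait, throughput, imbalance,
                                 w_throughput=0.1, w_imbalance=0.05):
    """Illustrative wait_queue_throughput reward shape.

    d_queue / d_wait are normalized changes (new - old), so a reduction
    is negative and negating the sum rewards it. Throughput adds a
    bonus; queue imbalance across lanes subtracts a penalty.
    """
    return (-(d_queue + d_wait)
            + w_throughput * throughput
            - w_imbalance * imbalance)

r = wait_queue_throughput_reward(d_queue=-0.2, d_wait=-0.1,
                                 throughput=5.0, imbalance=2.0)
print(round(r, 6))  # 0.7
```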

## Smoke Test

To sanity-check one generated scenario with the real CityFlow environment:

```bash
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
```

## Project layout

- `agents/`: heuristic local policies and simple baselines.
- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- `data/`: generated synthetic cities, split files, and dataset generation utilities.
- `scripts/`: utility scripts, including the CityFlow smoke test.
- `third_party/`: vendored dependencies, including CityFlow source.

## Notes

- The generated dataset is assumed to already exist under `data/generated`.
- District membership comes from `district_map.json`.
- District types come from `metadata.json`.
- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.
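
For orientation, here is one hypothetical minimal shape for those two per-city files; the field names and nesting are assumptions for illustration, not taken from the repo.

```python
# Hypothetical district_map.json contents: intersection id -> district id.
district_map = {
    "intersection_0_0": "d_00",
    "intersection_0_1": "d_00",
    "intersection_1_0": "d_01",
}

# Hypothetical metadata.json fragment: district id -> district type.
metadata = {"districts": {"d_00": {"type": "commercial"},
                          "d_01": {"type": "residential"}}}

# Group intersections by district, then look up each district's type.
by_district = {}
for inter, dist in district_map.items():
    by_district.setdefault(dist, []).append(inter)
types = {d: metadata["districts"][d]["type"] for d in by_district}
print(types)  # {'d_00': 'commercial', 'd_01': 'residential'}
```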