---
title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---
# traffic-llm

CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

Full model weights and files can be found here: https://huggingface.co/Aditya2162/agentic-traffic

## OpenEnv UI

For the deployed OpenEnv web interface:

- Click `Reset` before using `Step`.
- Leave `Use Llm` unchecked for the fast, stable DQN-only path.
- Use `District Actions` = `{}` for a valid no-op step payload.
- Only enable `Use Llm` when you explicitly want district-level LLM guidance on top of the DQN executor.
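As a rough illustration, a no-op step request with LLM guidance disabled might carry a body like the following. The field names `use_llm` and `district_actions` are assumptions mirroring the UI labels above, not the deployed API's actual schema:

```python
import json

# Hypothetical step payload mirroring the UI fields above.
# The field names are assumptions, not the deployed API's schema.
payload = {
    "use_llm": False,        # fast, stable DQN-only path
    "district_actions": {},  # empty dict = valid no-op
}
print(json.dumps(payload))
```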
## Training

The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

```bash
python3 -m training.train_local_policy train
```

That trains against `data/generated`, uses `data/splits`, writes checkpoints to `artifacts/dqn_shared`, enables TensorBoard logging, uses parallel CPU rollout workers by default, shows `tqdm` progress bars, and runs validation and checkpointing every 40 updates by default.

For a broader but still manageable validation pass:

```bash
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
```

That evaluates 3 validation cities across all 7 scenario types, giving 21 learned-policy validation episodes per eval, or 63 total episodes if the random and fixed baselines are also enabled.
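The episode arithmetic above can be checked directly:

```python
# Validation episode counts for --max-val-cities 3 --val-scenarios-per-city 7.
val_cities = 3
scenarios_per_city = 7
learned_episodes = val_cities * scenarios_per_city
print(learned_episodes)  # 21 learned-policy episodes per eval

# With the random and fixed baselines evaluated on the same episodes:
policies = 3  # learned + random + fixed
print(learned_episodes * policies)  # 63 total episodes
```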
Phase-3-style full training with the same 40-update eval/checkpoint cadence:

```bash
python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

Useful ablations:

```bash
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
```
For a fast phase-1 overfit run on one fixed world:

```bash
python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

To create or refresh dataset splits:

```bash
python3 -m training.train_local_policy make-splits
```
To evaluate the best checkpoint:

```bash
python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val
```

To evaluate a heuristic baseline directly:

```bash
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
```
## TensorBoard

TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.

```bash
tensorboard --logdir artifacts/dqn_shared/tensorboard
```
## District LLM

The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and defaults district-model fine-tuning to DQN-derived rows only.
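As a sketch of the label-derivation idea only: aggregate each district's DQN actions over a rollout window into a district-scale directive, then emit an SFT row. The window format, JSON schema, and hold/switch-majority labeling rule below are illustrative assumptions, not the actual `district_llm` implementation:

```python
import json

def window_to_sft_row(window):
    """Turn one DQN rollout window into a district-level SFT row.

    `window` maps district id -> list of per-intersection DQN actions
    (0 = hold, 1 = switch) taken during the window. The schema and the
    majority-vote labeling rule are illustrative assumptions.
    """
    labels = {}
    for district, actions in window.items():
        switch_rate = sum(actions) / len(actions)
        labels[district] = "favor_switching" if switch_rate > 0.5 else "favor_holding"
    return {"prompt": json.dumps(sorted(window)), "completion": json.dumps(labels)}

row = window_to_sft_row({"d_00": [1, 1, 0, 1], "d_01": [0, 0, 1, 0]})
print(row["completion"])  # {"d_00": "favor_switching", "d_01": "favor_holding"}
```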
Generate district-LLM data from a learned checkpoint:

```bash
python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl
```

Generate from fixed or heuristic baselines:

```bash
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
```
Train a first-pass district model with Unsloth/QLoRA:

```bash
python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000
```

Run single-sample inference:

```bash
python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00
```

Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

```bash
uvicorn openenv_app.app:app --reload
```
## Algorithm

- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: `0 = hold current phase`, `1 = switch to next green phase`.
- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
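The return target can be sketched as follows. This is a minimal illustration of the standard n-step Double DQN rule, not the project's training code, and the variable names are assumptions:

```python
def n_step_double_dqn_target(rewards, gamma, q_online_next, q_target_next, done):
    """n-step bootstrap target with Double DQN action selection.

    rewards:        r_t ... r_{t+n-1} from the replay buffer
    q_online_next:  online-net Q-values at s_{t+n} (selects the action)
    q_target_next:  target-net Q-values at s_{t+n} (evaluates the action)
    """
    # Discounted n-step return over the stored rewards.
    g = sum(gamma ** i * r for i, r in enumerate(rewards))
    if not done:
        # Double DQN: argmax from the online net, value from the target net.
        a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
        g += gamma ** len(rewards) * q_target_next[a_star]
    return g

# Example: 3-step target with gamma = 0.99 over two actions (hold, switch).
print(n_step_double_dqn_target([1.0, 0.0, 1.0], 0.99, [0.2, 0.7], [0.5, 0.4], False))
```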
Policy architecture modes:

- `multi_head`: shared trunk with district-type-specific Q heads.
- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
- `single_head_with_district_feature`: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
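The dueling aggregation behind all of these heads can be sketched numerically. This is a plain-Python illustration of the standard mean-subtracted dueling rule, not the project's network code:

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into
    Q-values with the standard mean-subtracted dueling rule:
        Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# Two actions: 0 = hold, 1 = switch.
print(dueling_q(1.5, [0.2, -0.2]))  # mean advantage 0.0 -> [1.7, 1.3]
```

The mean subtraction pins down the otherwise unidentifiable split between V and A, so the advantages always average to zero per state.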
Reward variants:

- `current`: backward-compatible waiting and queue penalty.
- `normalized_wait_queue`: normalized queue and waiting reduction reward.
- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
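A rough sketch of the `wait_queue_throughput` shape, assembled from the terms listed above. The exact terms live in `env/`; the weights and normalization here are assumptions:

```python
def wait_queue_throughput_reward(d_queue, d_wait, throughput, imbalance,
                                 w_t=0.5, w_i=0.25):
    """Illustrative reward: reward reductions in normalized queue length and
    waiting time, add a throughput bonus, subtract an imbalance penalty.
    Inputs are assumed pre-normalized; w_t and w_i are made-up weights."""
    return -(d_queue + d_wait) + w_t * throughput - w_i * imbalance

# Queues and waits shrink, some vehicles exit, lanes stay roughly balanced:
print(wait_queue_throughput_reward(-0.2, -0.1, 0.4, 0.05))
```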
## Smoke Test

To sanity-check one generated scenario with the real CityFlow environment:

```bash
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
```
## Project layout

- `agents/`: heuristic local policies and simple baselines.
- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- `data/`: generated synthetic cities, split files, and dataset generation utilities.
- `scripts/`: utility scripts, including the CityFlow smoke test.
- `third_party/`: vendored dependencies, including CityFlow source.
## Notes

- The generated dataset is assumed to already exist under `data/generated`.
- District membership comes from `district_map.json`.
- District types come from `metadata.json`.
- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.
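For example, an intersection-to-district lookup might be built from the district map like this. The `district_map.json` schema shown is an assumption; the real file layout may differ:

```python
import json

# Assumed schema: {"d_00": ["intersection_id", ...], ...} — the real
# district_map.json layout may differ.
district_map = json.loads('{"d_00": ["i_00", "i_01"], "d_01": ["i_02"]}')

# Invert to an intersection -> district lookup.
district_of = {i: d for d, members in district_map.items() for i in members}
print(district_of["i_02"])  # d_01
```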