---
title: Agentic Traffic
emoji: 🏢
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
short_description: Agentic AI to control traffic lights
app_port: 7860
---

# traffic-llm

CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

Full model weights and supporting files are available at https://huggingface.co/Aditya2162/agentic-traffic

## OpenEnv UI

For the deployed OpenEnv web interface:

- Click `Reset` before using `Step`.
- Leave `Use Llm` unchecked for the fast, stable DQN-only path.
- Use `District Actions` = `{}` for a valid no-op step payload.
- Only enable `Use Llm` when you explicitly want district-level LLM guidance on top of the DQN executor.

## Training

The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

```bash
python3 -m training.train_local_policy train
```

By default, this command:

- trains against `data/generated` using the split files in `data/splits`
- writes checkpoints to `artifacts/dqn_shared`
- enables TensorBoard logging
- uses parallel CPU rollout workers
- shows `tqdm` progress bars
- runs validation and saves a checkpoint every 40 updates

For a broader but still manageable validation pass:

```bash
python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
```

That evaluates 3 validation cities on each of the 7 scenario types, which gives 21 learned-policy validation episodes per evaluation (3 × 7), or 63 episodes in total if the random and fixed baselines are also enabled (21 × 3).
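
The episode counts can be checked with a few lines of Python. This is only an arithmetic sketch; `validation_episodes` is a hypothetical helper and not part of the trainer.

```python
def validation_episodes(max_val_cities, scenarios_per_city, baselines=()):
    """Return (learned-policy episodes, total episodes including baselines)."""
    learned = max_val_cities * scenarios_per_city
    # Each enabled baseline replays the same city/scenario grid once more.
    total = learned * (1 + len(baselines))
    return learned, total


print(validation_episodes(3, 7))                       # (21, 21)
print(validation_episodes(3, 7, ("random", "fixed")))  # (21, 63)
```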

Phase-3-style full training with the same 40-update eval/checkpoint cadence:

```bash
python3 -m training.train_local_policy train \
  --max-train-cities 70 \
  --max-val-cities 3 \
  --val-scenarios-per-city 7 \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

Useful ablations:

```bash
python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
```

For a fast phase-1 overfit run on one fixed world:

```bash
python3 -m training.train_local_policy train \
  --total-updates 25 \
  --train-city-id city_0072 \
  --train-scenario-name normal \
  --overfit-val-on-train-scenario \
  --fast-overfit \
  --policy-arch single_head_with_district_feature \
  --reward-variant wait_queue_throughput
```

To create or refresh dataset splits:

```bash
python3 -m training.train_local_policy make-splits
```

To evaluate the best checkpoint:

```bash
python3 -m training.train_local_policy evaluate \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --split val
```

To evaluate a heuristic baseline directly:

```bash
python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
```

## TensorBoard

TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.

```bash
tensorboard --logdir artifacts/dqn_shared/tensorboard
```

## District LLM

The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor and derives district-scale SFT labels automatically from DQN rollout windows. By default, district-model fine-tuning uses only the DQN-derived rows.

Generate district-LLM data from a learned checkpoint:

```bash
python3 -m district_llm.generate_dataset \
  --controller rl_checkpoint \
  --checkpoint artifacts/dqn_shared/best_validation.pt \
  --episodes 100 \
  --decision-interval 10 \
  --use-checkpoint-env-config \
  --output data/district_llm_train.jsonl
```
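
To make the idea of rollout windows concrete, the sketch below splits a per-step rollout log into windows of `decision_interval` steps and summarizes each one as a JSONL row. It assumes the interval is counted in environment steps and keeps a trailing partial window; the `queue` field and the summary schema are made up for illustration and do not reflect the actual output of `district_llm.generate_dataset`.

```python
import json
from typing import Dict, List


def rollout_windows(steps: List[Dict], decision_interval: int) -> List[List[Dict]]:
    """Split a per-step rollout log into consecutive windows of `decision_interval` steps."""
    if decision_interval <= 0:
        raise ValueError("decision_interval must be positive")
    return [steps[i:i + decision_interval] for i in range(0, len(steps), decision_interval)]


def summarize_window(window: List[Dict]) -> Dict:
    """Hypothetical district summary: window length and mean queue over the window."""
    mean_queue = sum(s["queue"] for s in window) / len(window)
    return {"num_steps": len(window), "mean_queue": mean_queue}


# Toy rollout of 25 steps with a made-up `queue` field.
steps = [{"t": t, "queue": float(t % 5)} for t in range(25)]
rows = [summarize_window(w) for w in rollout_windows(steps, decision_interval=10)]
jsonl = "\n".join(json.dumps(r) for r in rows)
print(jsonl)
```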

Generate from fixed or heuristic baselines:

```bash
python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
```

Train a first-pass district model with Unsloth/QLoRA:

```bash
python3 -m training.train_district_llm \
  --dataset data/district_llm_train.jsonl \
  --output-dir artifacts/district_llm_qwen \
  --model-name Qwen/Qwen2.5-7B-Instruct \
  --load-in-4bit \
  --lora-rank 16 \
  --max-seq-length 1024 \
  --max-steps 1000
```

Run single-sample inference:

```bash
python3 -m district_llm.inference \
  --model artifacts/district_llm_qwen \
  --city-id city_0006 \
  --scenario-name accident \
  --district-id d_00
```

Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

```bash
uvicorn openenv_app.app:app --reload
```

## Algorithm

- Training algorithm: parameter-shared dueling Double DQN.
- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
- Return target: n-step bootstrap target with target-network updates.
- Execution: all controllable intersections act simultaneously every RL decision interval.
- Action space: `0 = hold current phase`, `1 = switch to next green phase`.
- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
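
The return target and action masking listed above can be sketched in plain Python. This is a minimal illustration of an n-step Double DQN target with a min-green mask, not the repository's implementation; the function names, the mask rule (hold is always allowed, switch only after `min_green_time`), and the example numbers are assumptions.

```python
from typing import List, Sequence

HOLD, SWITCH = 0, 1  # action ids from the list above


def action_mask(time_in_phase: int, min_green_time: int) -> List[bool]:
    """Assumed rule: HOLD is always legal, SWITCH only once the minimum green has elapsed."""
    return [True, time_in_phase >= min_green_time]


def n_step_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of the n rewards collected before bootstrapping."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Best action among those the mask allows."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    return max(allowed, key=lambda a: q_values[a])


def double_dqn_target(rewards, gamma, online_q_next, target_q_next, next_mask, done):
    """n-step target: online net picks the action, target net evaluates it."""
    g = n_step_return(rewards, gamma)
    if done:
        return g  # no bootstrap past a terminal state
    best = masked_argmax(online_q_next, next_mask)
    return g + (gamma ** len(rewards)) * target_q_next[best]


mask = action_mask(time_in_phase=3, min_green_time=10)  # [True, False]
target = double_dqn_target(
    rewards=[-1.0, -0.5, -0.25], gamma=0.5,
    online_q_next=[0.2, 0.9], target_q_next=[1.0, 3.0],
    next_mask=mask, done=False,
)
print(target)  # -1.1875
```

In the example, the online network prefers `SWITCH`, but the mask forbids it, so the target bootstraps from the target network's value for `HOLD`.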

Policy architecture modes:

- `multi_head`: shared trunk with district-type-specific Q heads.
- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
- `single_head_with_district_feature`: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
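
The three modes differ in which Q head is used and whether the district type appears in the observation. The sketch below illustrates that split together with the standard dueling aggregation `Q = V + A - mean(A)`. The helper names, the one-hot encoding, and the choice to keep the district feature in the `multi_head` observation are assumptions, not a description of the actual model code.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Standard dueling aggregation: Q(a) = V + A(a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def build_observation(base: Sequence[float], district_onehot: Sequence[float],
                      policy_arch: str) -> List[float]:
    """Drop the district feature only for `single_head` (assumed for the other modes)."""
    if policy_arch == "single_head":
        return list(base)
    return list(base) + list(district_onehot)


def select_head(policy_arch: str, district_type: str) -> str:
    """`multi_head` routes to a district-specific head; the other modes share one head."""
    return district_type if policy_arch == "multi_head" else "shared"


print(dueling_q(1.0, [2.0, 4.0]))  # [0.0, 2.0]
print(build_observation([0.5, 0.1], [1.0, 0.0], "single_head"))  # [0.5, 0.1]
print(select_head("multi_head", "residential"))  # residential ("residential" is a made-up type)
```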

Reward variants:

- `current`: backward-compatible waiting and queue penalty.
- `normalized_wait_queue`: normalized queue and waiting reduction reward.
- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
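
As a rough illustration of how the `wait_queue_throughput` terms could combine, the sketch below adds normalized queue and wait reductions, a throughput bonus, and a lane-imbalance penalty. The normalizers, weights, and the max-minus-min imbalance measure are all hypothetical; the actual formula lives in the `env/` reward logic and may differ.

```python
from typing import Sequence


def wait_queue_throughput_reward(
    prev_queue: float, queue: float,
    prev_wait: float, wait: float,
    throughput: float, lane_queues: Sequence[float],
    *,
    queue_norm: float = 20.0,         # hypothetical normalizer
    wait_norm: float = 60.0,          # hypothetical normalizer
    throughput_weight: float = 0.1,   # hypothetical weight
    imbalance_weight: float = 0.05,   # hypothetical weight
) -> float:
    """Illustrative reward: reductions and throughput are rewarded, imbalance is penalized."""
    queue_reduction = (prev_queue - queue) / queue_norm
    wait_reduction = (prev_wait - wait) / wait_norm
    # Imbalance here is the spread between the fullest and emptiest lane.
    imbalance = (max(lane_queues) - min(lane_queues)) / queue_norm if lane_queues else 0.0
    return (queue_reduction + wait_reduction
            + throughput_weight * throughput
            - imbalance_weight * imbalance)


r = wait_queue_throughput_reward(10, 6, 30, 18, throughput=3, lane_queues=[1, 5])
print(round(r, 2))  # 0.69
```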

## Smoke Test

To sanity-check one generated scenario with the real CityFlow environment:

```bash
python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
```

## Project Layout

- `agents/`: heuristic local policies and simple baselines.
- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
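- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
- `district_llm/`: district LLM stack, including dataset generation and inference.
- `openenv_app/`: OpenEnv-compatible district wrapper served with `uvicorn`.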
- `data/`: generated synthetic cities, split files, and dataset generation utilities.
- `scripts/`: utility scripts, including the CityFlow smoke test.
- `third_party/`: vendored dependencies, including CityFlow source.

## Notes

- The generated dataset is assumed to already exist under `data/generated`.
- District membership comes from `district_map.json`.
- District types come from `metadata.json`.
- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.
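
A quick way to confirm the last requirement before launching a run is to probe for the module with the standard library. This is a small convenience sketch; `module_available` is a hypothetical helper, not part of the repository.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found in the active environment."""
    return importlib.util.find_spec(name) is not None


if not module_available("cityflow"):
    print("cityflow is not installed; training and evaluation need it.")
```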