	---
	title: Agentic Traffic
	emoji: 🏢
	colorFrom: green
	colorTo: purple
	sdk: docker
	pinned: false
	short_description: Agentic AI to control traffic lights
	app_port: 7860
	language:
	- en
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	pipeline_tag: reinforcement-learning
	---

	# traffic-llm

	CityFlow-based traffic-control project with intersection-level multi-agent DQN training and district-aware policy variants.

	A huge thank you to @tokev (Kevin Truong) for helping me with this project.

	## OpenEnv UI

	For the deployed OpenEnv web interface:

	- Click `Reset` before using `Step`.
	- Leave `Use Llm` unchecked for the fast, stable DQN-only path.
	- Use `District Actions` = `{}` for a valid no-op step payload.
	- Only enable `Use Llm` when you explicitly want district-level LLM guidance on top of the DQN executor.

	## Training

	The default local-policy trainer now uses parameter-shared dueling Double DQN with prioritized replay and n-step returns:

	```bash
	python3 -m training.train_local_policy train
	```

	By default, this trains against `data/generated`, uses the splits in `data/splits`, and writes checkpoints to `artifacts/dqn_shared`. It also enables TensorBoard logging, runs parallel CPU rollout workers, shows `tqdm` progress bars, and validates and saves a checkpoint every 40 updates.

	For a broader but still manageable validation pass:

	```bash
	python3 -m training.train_local_policy train --max-val-cities 3 --val-scenarios-per-city 7
	```

	That evaluates 3 validation cities across all 7 scenario types. This gives 21 learned-policy validation episodes per evaluation pass, or 63 episodes in total if the random and fixed baselines are also enabled.
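The episode counts above follow from one episode per (city, scenario, policy) combination. A minimal sketch of that arithmetic; the helper name `validation_episodes` is illustrative and not part of the trainer:

```python
def validation_episodes(num_cities: int, scenarios_per_city: int, num_policies: int = 1) -> int:
    """Episodes in one evaluation pass: one per (city, scenario, policy)."""
    return num_cities * scenarios_per_city * num_policies


# Learned policy only: 3 cities x 7 scenarios.
learned_only = validation_episodes(3, 7)
# Learned policy plus the random and fixed baselines: 3 policies in total.
with_baselines = validation_episodes(3, 7, num_policies=3)
print(learned_only, with_baselines)  # 21 63
```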

	Phase-3-style full training with the same 40-update eval/checkpoint cadence:

	```bash
	python3 -m training.train_local_policy train \
	  --max-train-cities 70 \
	  --max-val-cities 3 \
	  --val-scenarios-per-city 7 \
	  --policy-arch single_head_with_district_feature \
	  --reward-variant wait_queue_throughput
	```

	Useful ablations:

	```bash
	python3 -m training.train_local_policy train --policy-arch multi_head --reward-variant current
	python3 -m training.train_local_policy train --policy-arch single_head --reward-variant current
	python3 -m training.train_local_policy train --policy-arch single_head_with_district_feature --reward-variant wait_queue_throughput
	```

	For a fast phase-1 overfit run on one fixed world:

	```bash
	python3 -m training.train_local_policy train \
	  --total-updates 25 \
	  --train-city-id city_0072 \
	  --train-scenario-name normal \
	  --overfit-val-on-train-scenario \
	  --fast-overfit \
	  --policy-arch single_head_with_district_feature \
	  --reward-variant wait_queue_throughput
	```

	To create or refresh dataset splits:

	```bash
	python3 -m training.train_local_policy make-splits
	```

	To evaluate the best checkpoint:

	```bash
	python3 -m training.train_local_policy evaluate \
	  --checkpoint artifacts/dqn_shared/best_validation.pt \
	  --split val
	```

	To evaluate a heuristic baseline directly:

	```bash
	python3 -m training.train_local_policy evaluate --baseline queue_greedy --split val
	```

	## TensorBoard

	TensorBoard logs are written to `artifacts/dqn_shared/tensorboard` by default.

	```bash
	tensorboard --logdir artifacts/dqn_shared/tensorboard
	```

	## District LLM

	The district LLM stack lives under `district_llm/`. It treats the learned DQN local controller as the low-level executor, derives district-scale SFT labels automatically from DQN rollout windows, and defaults district-model fine-tuning to DQN-derived rows only.

	Generate district-LLM data from a learned checkpoint:

	```bash
	python3 -m district_llm.generate_dataset \
	  --controller rl_checkpoint \
	  --checkpoint artifacts/dqn_shared/best_validation.pt \
	  --episodes 100 \
	  --decision-interval 10 \
	  --use-checkpoint-env-config \
	  --output data/district_llm_train.jsonl
	```

	Generate from fixed or heuristic baselines:

	```bash
	python3 -m district_llm.generate_dataset --controller fixed --episodes 50 --decision-interval 10 --output data/district_llm_fixed.jsonl
	python3 -m district_llm.generate_dataset --controller queue_greedy --episodes 50 --decision-interval 10 --output data/district_llm_heuristic.jsonl
	python3 -m district_llm.generate_dataset --teacher-spec fixed --teacher-spec random --episodes 50 --decision-interval 10 --output data/district_llm_multi_teacher.jsonl
	```

	Train a first-pass district model with Unsloth/QLoRA:

	```bash
	python3 -m training.train_district_llm \
	  --dataset data/district_llm_train.jsonl \
	  --output-dir artifacts/district_llm_qwen \
	  --model-name Qwen/Qwen2.5-7B-Instruct \
	  --load-in-4bit \
	  --lora-rank 16 \
	  --max-seq-length 1024 \
	  --max-steps 1000
	```

	Run single-sample inference:

	```bash
	python3 -m district_llm.inference \
	  --model artifacts/district_llm_qwen \
	  --city-id city_0006 \
	  --scenario-name accident \
	  --district-id d_00
	```

	Run the OpenEnv-compatible district wrapper on top of the current DQN stack:

	```bash
	uvicorn openenv_app.app:app --reload
	```

	## Algorithm

	- Training algorithm: parameter-shared dueling Double DQN.
	- Replay: prioritized replay over per-intersection transitions gathered from full CityFlow worlds.
	- Return target: n-step bootstrap target with target-network updates.
	- Execution: all controllable intersections act simultaneously every RL decision interval.
	- Action space: `0 = hold current phase`, `1 = switch to next green phase`.
	- Safety: `min_green_time` is enforced in the environment and exposed through action masking.
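To make the list above concrete, here is a minimal pure-Python sketch of an n-step Double DQN target with action masking. It is an illustration under assumptions, not the code in `training/`: the function names, argument layout, and the convention that the mask is a list of booleans are all hypothetical.

```python
from typing import Sequence

HOLD, SWITCH = 0, 1  # action ids from the action-space bullet above


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    """Index of the highest-valued action among those the mask allows."""
    allowed = [a for a, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask must allow at least one action")
    return max(allowed, key=lambda a: q_values[a])


def n_step_double_dqn_target(
    rewards: Sequence[float],        # r_t ... r_{t+n-1} for one intersection
    gamma: float,
    next_q_online: Sequence[float],  # online-network Q(s_{t+n}, .)
    next_q_target: Sequence[float],  # target-network Q(s_{t+n}, .)
    next_mask: Sequence[bool],       # False where min_green_time blocks an action
    done: bool,
) -> float:
    """Discounted n-step return plus a bootstrapped Double DQN tail."""
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if done:
        return n_step_return  # no bootstrap past a terminal state
    # Double DQN: the online network selects the action,
    # the target network evaluates it.
    best = masked_argmax(next_q_online, next_mask)
    return n_step_return + (gamma ** len(rewards)) * next_q_target[best]


# SWITCH is masked out (for example, minimum green time not yet reached),
# so the bootstrap uses HOLD even though SWITCH has the higher online Q-value.
target = n_step_double_dqn_target(
    rewards=[-1.0, -0.5, -0.25],
    gamma=0.5,
    next_q_online=[0.2, 0.9],
    next_q_target=[2.0, 4.0],
    next_mask=[True, False],
    done=False,
)
```

Applying the mask when selecting the bootstrap action keeps the target from assuming a switch that the environment would refuse.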

	Policy architecture modes:

	- `multi_head`: shared trunk with district-type-specific Q heads.
	- `single_head`: one shared Q head for all intersections, with district type removed from the observation.
	- `single_head_with_district_feature`: one shared Q head for all intersections, with district type left in the observation as an explicit feature.
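The three modes differ in what the network sees and which Q head answers. The sketch below illustrates that routing only; the district-type labels, the one-hot encoding, and the assumption that `multi_head` uses the district type purely for head selection are hypothetical and may not match the real implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical district-type labels; the real ones come from `metadata.json`.
DISTRICT_TYPES = ("residential", "commercial", "industrial")


def route_observation(
    arch: str, features: Sequence[float], district_type: str
) -> Tuple[List[float], int]:
    """Return (network_input, q_head_index) for one intersection."""
    type_index = DISTRICT_TYPES.index(district_type)
    if arch == "multi_head":
        # Assumption: the district type only picks the head.
        return list(features), type_index
    if arch == "single_head":
        # District type is dropped entirely; every intersection shares head 0.
        return list(features), 0
    if arch == "single_head_with_district_feature":
        # District type stays in the observation as a one-hot feature.
        one_hot = [1.0 if i == type_index else 0.0 for i in range(len(DISTRICT_TYPES))]
        return list(features) + one_hot, 0
    raise ValueError(f"unknown policy arch: {arch}")
```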

	Reward variants:

	- `current`: backward-compatible waiting and queue penalty.
	- `normalized_wait_queue`: normalized queue and waiting reduction reward.
	- `wait_queue_throughput`: normalized queue/wait reduction plus throughput bonus and imbalance penalty.
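As a rough illustration of how the terms in `wait_queue_throughput` could combine, the sketch below adds a normalized queue and wait reduction to a throughput bonus and subtracts a lane-imbalance penalty. The formula, field names, normalizer, and weights are all assumptions made for this example; the real definition lives in the reward logic under `env/`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IntersectionStats:
    """Hypothetical per-intersection snapshot for one decision interval."""
    lane_queues: Tuple[int, ...]  # queued vehicles per incoming lane
    total_wait: float             # summed waiting time in seconds
    departed: int                 # vehicles that cleared the intersection


def wait_queue_throughput_sketch(
    prev: IntersectionStats,
    curr: IntersectionStats,
    norm: float = 10.0,           # assumed normalizer
    w_throughput: float = 0.1,    # assumed throughput weight
    w_imbalance: float = 0.05,    # assumed imbalance weight
) -> float:
    """Reward falling queues and waits, add throughput, penalize imbalance."""
    queue_drop = (sum(prev.lane_queues) - sum(curr.lane_queues)) / norm
    wait_drop = (prev.total_wait - curr.total_wait) / norm
    imbalance = (max(curr.lane_queues) - min(curr.lane_queues)) / norm
    return queue_drop + wait_drop + w_throughput * curr.departed - w_imbalance * imbalance
```

Under these assumptions, a step that shortens queues evenly scores higher than one that clears one approach while another backs up.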

	## Smoke Test

	To sanity-check one generated scenario with the real CityFlow environment:

	```bash
	python3 scripts/smoke_test_env.py --city-id city_0001 --scenario-name normal --policy random
	```

	## Project Layout

	- `agents/`: heuristic local policies and simple baselines.
	- `env/`: CityFlow environment, topology parsing, observation building, and reward logic.
	- `training/`: dataset utilities, replay-based DQN training, evaluation helpers, TensorBoard logging, and CLIs.
	- `data/`: generated synthetic cities, split files, and dataset generation utilities.
	- `scripts/`: utility scripts, including the CityFlow smoke test.
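	- `district_llm/`: district LLM stack, including dataset generation and inference.
	- `openenv_app/`: OpenEnv-compatible app served with `uvicorn openenv_app.app:app`.
	- `third_party/`: vendored dependencies, including CityFlow source.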

	## Notes

	- The generated dataset is assumed to already exist under `data/generated`.
	- District membership comes from `district_map.json`.
	- District types come from `metadata.json`.
	- Runtime training and evaluation require the `cityflow` Python module to be installed in the active environment.
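The first and last prerequisites above can be checked before launching a run. This is a small sketch rather than a script shipped with the repo: the `preflight` helper and its parameters are hypothetical, and it only verifies that the `cityflow` module is importable and that `data/generated` exists.

```python
import importlib.util
from pathlib import Path
from typing import List


def preflight(repo_root: str = ".", required_module: str = "cityflow") -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(required_module) is None:
        problems.append(f"Python module not importable: {required_module}")
    if not (Path(repo_root) / "data" / "generated").is_dir():
        problems.append("missing directory: data/generated")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```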