---
title: sumo_rl_env Environment
sdk: docker
app_port: 8000
base_path: /web
tags:
- openenv
- openenv-0.2.3
---
# sumo_rl_env Environment
Space URL: `https://huggingface.co/spaces/openenv/sumo_rl_env`
OpenEnv pinned ref: `0.2.3`
# SUMO-RL Environment
Integration of traffic signal control with the OpenEnv framework via SUMO (Simulation of Urban MObility) and SUMO-RL.
## Overview
This environment enables reinforcement learning for **traffic signal control** using SUMO, a microscopic traffic simulation package. Train RL agents to optimize traffic light timing and minimize vehicle delays.
**Key Features**:
- **Realistic traffic simulation** via SUMO
- **Single-agent mode** for controlling a single intersection
- **Configurable rewards** (waiting time, queue, pressure, speed)
- **Multiple networks** supported (custom .net.xml and .rou.xml files)
- **Docker-ready** with pre-bundled example network
## Quick Start
### Using Docker (Recommended)
```python
from envs.sumo_rl_env import SumoRLEnv, SumoAction
# Automatically starts container
env = SumoRLEnv.from_docker_image("sumo-rl-env:latest")
# Reset environment
result = env.reset()
print(f"Observation shape: {result.observation.observation_shape}")
print(f"Available actions: {result.observation.action_mask}")
# Take action (select next green phase)
result = env.step(SumoAction(phase_id=1))
print(f"Reward: {result.reward}, Done: {result.done}")
# Get state
state = env.state()
print(f"Simulation time: {state.sim_time}")
print(f"Total vehicles: {state.total_vehicles}")
print(f"Mean waiting time: {state.mean_waiting_time}")
# Cleanup
env.close()
```
### Building the Docker Image
```bash
cd OpenEnv
# Build base image first (if not already built)
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .
# Build SUMO-RL environment
docker build -f envs/sumo_rl_env/server/Dockerfile -t sumo-rl-env:latest .
```
### Running with Different Configurations
```bash
# Default: single-intersection
docker run -p 8000:8000 sumo-rl-env:latest
# Longer simulation
docker run -p 8000:8000 \
-e SUMO_NUM_SECONDS=50000 \
sumo-rl-env:latest
# Different reward function
docker run -p 8000:8000 \
-e SUMO_REWARD_FN=queue \
sumo-rl-env:latest
# Custom seed for reproducibility
docker run -p 8000:8000 \
-e SUMO_SEED=123 \
sumo-rl-env:latest
```
## Observation
The observation is a vector containing:
- **Phase one-hot**: Current active green phase (one-hot encoded)
- **Min green flag**: Binary indicator if minimum green time has passed
- **Lane densities**: Number of vehicles / lane capacity for each incoming lane
- **Lane queues**: Number of queued vehicles / lane capacity for each incoming lane
Observation size varies by network topology (depends on number of phases and lanes).
**Default (single-intersection)**:
- 4 green phases
- 8 incoming lanes
- Observation size: 21 elements (4 + 1 + 8 + 8); see the sketch below
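As a rough illustration, the default vector can be sliced into its components. This is a minimal sketch assuming the layout described above and a `result` returned by `env.reset()` or `env.step()`; the exact ordering is determined by the underlying sumo_rl observation class.
```python
import numpy as np

# Sketch: splitting the default single-intersection observation
# (4 green phases, 8 incoming lanes) into its documented components.
obs = np.array(result.observation.observation)
phase_one_hot = obs[0:4]     # which green phase is currently active
min_green_flag = obs[4]      # 1.0 once the minimum green time has elapsed
lane_densities = obs[5:13]   # vehicles / capacity per incoming lane
lane_queues = obs[13:21]     # queued vehicles / capacity per incoming lane
```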
## Action Space
The action space is discrete and represents selecting the next green phase to activate.
- **Action type**: Discrete
- **Action range**: `[0, num_green_phases - 1]`
- **Default (single-intersection)**: 4 actions (one per green phase)
When a different phase is requested, the environment automatically inserts a yellow phase (of `SUMO_YELLOW_TIME` seconds) before switching.
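A minimal sketch of acting on this space, reusing the `env` and `result` from the Quick Start above; `action_mask` lists the currently valid phase indices.
```python
import random

# Pick any currently valid green phase; the yellow transition is handled
# automatically when the requested phase differs from the current one.
valid_phases = result.observation.action_mask
result = env.step(SumoAction(phase_id=int(random.choice(valid_phases))))
```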
## Rewards
Default reward function is **change in cumulative waiting time**:
```
reward = -(total_waiting_time_now - total_waiting_time_previous)
```
Positive rewards indicate waiting time decreased (good).
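Conceptually (an illustrative sketch, not the sumo_rl implementation), the default reward behaves like a stateful function of the cumulative waiting time:
```python
class DiffWaitingTimeReward:
    """Illustrative sketch: reward the decrease in cumulative waiting time."""

    def __init__(self) -> None:
        self.previous_total = 0.0

    def __call__(self, total_waiting_time: float) -> float:
        reward = -(total_waiting_time - self.previous_total)  # > 0 if waiting dropped
        self.previous_total = total_waiting_time
        return reward
```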
### Available Reward Functions
Set via `SUMO_REWARD_FN` environment variable:
- **`diff-waiting-time`** (default): Change in cumulative waiting time
- **`average-speed`**: Average speed of all vehicles
- **`queue`**: Negative total queue length
- **`pressure`**: Pressure metric (incoming - outgoing vehicles)
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `SUMO_NET_FILE` | `/app/nets/single-intersection.net.xml` | Network topology file |
| `SUMO_ROUTE_FILE` | `/app/nets/single-intersection.rou.xml` | Vehicle routes file |
| `SUMO_NUM_SECONDS` | `20000` | Simulation duration (seconds) |
| `SUMO_DELTA_TIME` | `5` | Seconds between agent actions |
| `SUMO_YELLOW_TIME` | `2` | Yellow phase duration (seconds) |
| `SUMO_MIN_GREEN` | `5` | Minimum green time (seconds) |
| `SUMO_MAX_GREEN` | `50` | Maximum green time (seconds) |
| `SUMO_REWARD_FN` | `diff-waiting-time` | Reward function name |
| `SUMO_SEED` | `42` | Random seed (use for reproducibility) |
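These variables can be set with `docker run -e ...` as shown above, or passed through the client. The snippet below is a sketch using the same `environment=` argument as the custom-network example in the next section; values are plain strings because they are container environment variables.
```python
from envs.sumo_rl_env import SumoRLEnv

# Sketch: shorter, coarser episodes with a different reward and a fixed seed.
env = SumoRLEnv.from_docker_image(
    "sumo-rl-env:latest",
    environment={
        "SUMO_NUM_SECONDS": "5000",  # shorter episodes
        "SUMO_DELTA_TIME": "10",     # one decision every 10 simulated seconds
        "SUMO_REWARD_FN": "queue",
        "SUMO_SEED": "123",
    },
)
```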
### Using Custom Networks
To use your own SUMO network:
```python
from envs.sumo_rl_env import SumoRLEnv
env = SumoRLEnv.from_docker_image(
    "sumo-rl-env:latest",
    volumes={
        "/path/to/your/nets": {"bind": "/nets", "mode": "ro"}
    },
    environment={
        "SUMO_NET_FILE": "/nets/my-network.net.xml",
        "SUMO_ROUTE_FILE": "/nets/my-routes.rou.xml",
    }
)
```
Your network directory should contain:
- `.net.xml` - Network topology (roads, junctions, traffic lights)
- `.rou.xml` - Vehicle routes (trip definitions, flow rates)
## API Reference
### SumoAction
```python
@dataclass
class SumoAction(Action):
    phase_id: int     # Green phase to activate (0 to num_phases-1)
    ts_id: str = "0"  # Traffic signal ID (for multi-agent use)
```
### SumoObservation
```python
@dataclass
class SumoObservation(Observation):
    observation: List[float]      # Observation vector
    observation_shape: List[int]  # Shape for reshaping
    action_mask: List[int]        # Valid action indices
    sim_time: float               # Current simulation time
    done: bool                    # Episode finished
    reward: Optional[float]       # Reward from last action
    metadata: Dict                # System metrics
```
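For example (a small sketch, assuming `numpy` is available and `result` comes from `env.reset()` or `env.step()`), `observation_shape` can be used to rebuild an array from the flat `observation` list:
```python
import numpy as np

# Rebuild an array view of the flat observation vector.
obs_array = np.array(result.observation.observation).reshape(
    result.observation.observation_shape
)
```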
### SumoState
```python
@dataclass
class SumoState(State):
    episode_id: str             # Unique episode ID
    step_count: int             # Steps taken
    net_file: str               # Network file path
    route_file: str             # Route file path
    sim_time: float             # Current simulation time
    total_vehicles: int         # Total vehicles in simulation
    total_waiting_time: float   # Cumulative waiting time
    mean_waiting_time: float    # Mean waiting time
    mean_speed: float           # Mean vehicle speed
    # ... configuration parameters
```
## Example Training Loop
```python
from envs.sumo_rl_env import SumoRLEnv, SumoAction
import numpy as np
# Start environment
env = SumoRLEnv.from_docker_image("sumo-rl-env:latest")
# Training loop
for episode in range(10):
    result = env.reset()
    episode_reward = 0.0
    steps = 0

    while not result.done and steps < 1000:
        # Random policy (replace with your RL agent)
        action_id = np.random.choice(result.observation.action_mask)

        # Take action
        result = env.step(SumoAction(phase_id=int(action_id)))
        episode_reward += result.reward or 0
        steps += 1

        # Print progress every 100 steps
        if steps % 100 == 0:
            state = env.state()
            print(f"Step {steps}: "
                  f"reward={(result.reward or 0):.2f}, "
                  f"vehicles={state.total_vehicles}, "
                  f"waiting={state.mean_waiting_time:.2f}")

    print(f"Episode {episode}: total_reward={episode_reward:.2f}, steps={steps}")
env.close()
```
## Performance Notes
### Simulation Speed
- **Reset time**: 1-5 seconds (starts new SUMO simulation)
- **Step time**: ~50-200ms per step (depends on network size)
- **Episode duration**: Minutes (20,000 sim seconds with delta_time=5 β†’ ~4,000 steps)
### Optimization
For faster simulation:
1. Reduce `SUMO_NUM_SECONDS` for shorter episodes
2. Increase `SUMO_DELTA_TIME` for fewer decisions
3. Use simpler networks with fewer vehicles
## Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Client: SumoRLEnv               β”‚
β”‚   .step(phase_id=1)             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚ HTTP
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ FastAPI Server (Docker)         β”‚
β”‚   SumoEnvironment               β”‚
β”‚   β”œβ”€ Wraps sumo_rl              β”‚
β”‚   β”œβ”€ Single-agent mode          β”‚
β”‚   └─ No GUI                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ SUMO Simulator                  β”‚
β”‚ - Reads .net.xml (network)      β”‚
β”‚ - Reads .rou.xml (routes)       β”‚
β”‚ - Simulates traffic flow        β”‚
β”‚ - Provides observations         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
## Bundled Network
The default `single-intersection` network is a simple 4-way intersection with:
- **4 incoming roads** (North, South, East, West)
- **4 green phases** (NS straight, NS left, EW straight, EW left)
- **Vehicle flow**: Continuous stream with varying rates
## Limitations
- **No GUI in Docker**: SUMO GUI requires X server (not available in containers)
- **Single-agent only**: Multi-agent control of multiple intersections is planned for a future version
- **Fixed network per container**: Each container uses one network topology
- **Memory usage**: ~500MB for small networks, 2-4GB for large city networks
## Troubleshooting
### Container won't start
```bash
# Check logs
docker logs <container-id>
# Verify network files exist
docker run sumo-rl-env:latest ls -la /app/nets/
```
### "SUMO_HOME not set" error
`SUMO_HOME` is set automatically in the Docker image. If running locally:
```bash
export SUMO_HOME=/usr/share/sumo
```
### Slow performance
- Reduce simulation duration: `SUMO_NUM_SECONDS=5000`
- Increase action interval: `SUMO_DELTA_TIME=10`
- Use smaller networks with fewer vehicles
## References
- [SUMO Documentation](https://sumo.dlr.de/docs/)
- [SUMO-RL GitHub](https://github.com/LucasAlegre/sumo-rl)
- [SUMO-RL Paper](https://peerj.com/articles/cs-575/)
- [RESCO Benchmarks](https://github.com/jault/RESCO)
## Citation
If you use SUMO-RL in your research, please cite:
```bibtex
@misc{sumorl,
author = {Lucas N. Alegre},
title = {{SUMO-RL}},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/LucasAlegre/sumo-rl}},
}
```
## License
This integration is licensed under a BSD-style license. SUMO-RL and SUMO are distributed under their own licenses.