---
title: sumo_rl_env Environment
sdk: docker
app_port: 8000
base_path: /web
tags:
  - openenv
  - openenv-0.2.3
---

# sumo_rl_env Environment

Space URL: `https://huggingface.co/spaces/openenv/sumo_rl_env`

OpenEnv pinned ref: `0.2.3`
# SUMO-RL Environment

Integration of traffic signal control with the OpenEnv framework via SUMO (Simulation of Urban MObility) and SUMO-RL.

## Overview

This environment enables reinforcement learning for **traffic signal control** using SUMO, a microscopic traffic simulation package. Train RL agents to optimize traffic light timing and minimize vehicle delays.

**Key Features**:

- **Realistic traffic simulation** via SUMO
- **Single-agent mode** for single-intersection control
- **Configurable rewards** (waiting time, queue, pressure, speed)
- **Multiple networks** supported (custom `.net.xml` and `.rou.xml` files)
- **Docker-ready** with a pre-bundled example network
## Quick Start

### Using Docker (Recommended)

```python
from envs.sumo_rl_env import SumoRLEnv, SumoAction

# Automatically starts the container
env = SumoRLEnv.from_docker_image("sumo-rl-env:latest")

# Reset the environment
result = env.reset()
print(f"Observation shape: {result.observation.observation_shape}")
print(f"Available actions: {result.observation.action_mask}")

# Take an action (select the next green phase)
result = env.step(SumoAction(phase_id=1))
print(f"Reward: {result.reward}, Done: {result.done}")

# Get state
state = env.state()
print(f"Simulation time: {state.sim_time}")
print(f"Total vehicles: {state.total_vehicles}")
print(f"Mean waiting time: {state.mean_waiting_time}")

# Cleanup
env.close()
```
### Building the Docker Image

```bash
cd OpenEnv

# Build the base image first (if not already built)
docker build -t envtorch-base:latest -f src/openenv/core/containers/images/Dockerfile .

# Build the SUMO-RL environment
docker build -f envs/sumo_rl_env/server/Dockerfile -t sumo-rl-env:latest .
```
### Running with Different Configurations

```bash
# Default: single-intersection
docker run -p 8000:8000 sumo-rl-env:latest

# Longer simulation
docker run -p 8000:8000 \
  -e SUMO_NUM_SECONDS=50000 \
  sumo-rl-env:latest

# Different reward function
docker run -p 8000:8000 \
  -e SUMO_REWARD_FN=queue \
  sumo-rl-env:latest

# Custom seed for reproducibility
docker run -p 8000:8000 \
  -e SUMO_SEED=123 \
  sumo-rl-env:latest
```
## Observation

The observation is a vector containing:

- **Phase one-hot**: Currently active green phase (one-hot encoded)
- **Min green flag**: Binary indicator of whether the minimum green time has elapsed
- **Lane densities**: Number of vehicles / lane capacity, for each incoming lane
- **Lane queues**: Number of queued vehicles / lane capacity, for each incoming lane

The observation size varies with network topology (number of phases and lanes).

**Default (single-intersection)**:

- 4 green phases
- 8 incoming lanes
- Observation size: ~21 elements
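Assuming the layout above (phase one-hot, then the min-green flag, then densities, then queues), the flat vector can be sliced into named parts. This is a sketch, not part of the environment's API; the helper name is hypothetical:

```python
# Sketch: slice a SUMO-RL observation vector into its named parts.
# Layout assumption: [phase one-hot | min-green flag | densities | queues],
# so size = num_phases + 1 + 2 * num_lanes  (4 + 1 + 2*8 = 21 by default).

def split_observation(obs, num_phases=4, num_lanes=8):
    """Split a flat observation vector into its components."""
    assert len(obs) == num_phases + 1 + 2 * num_lanes
    phase_one_hot = obs[:num_phases]
    min_green = obs[num_phases]
    densities = obs[num_phases + 1 : num_phases + 1 + num_lanes]
    queues = obs[num_phases + 1 + num_lanes :]
    return phase_one_hot, min_green, densities, queues

# Example with a dummy 21-element vector
obs = [1, 0, 0, 0, 1.0] + [0.2] * 8 + [0.1] * 8
phase, min_green, dens, queues = split_observation(obs)
print(phase.index(1))  # currently active phase: 0
```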
## Action Space

The action space is discrete: each action selects the next green phase to activate.

- **Action type**: Discrete
- **Action range**: `[0, num_green_phases - 1]`
- **Default (single-intersection)**: 4 actions (one per green phase)

When a phase change is requested, SUMO automatically inserts a yellow phase before switching.
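The switching rule can be sketched as follows. This is purely illustrative of the behavior described above; the class and method names here are hypothetical, not the sumo_rl API:

```python
# Illustrative sketch of phase switching with a yellow transition.
# Names (TrafficSignal, set_phase) are hypothetical, not the sumo_rl API.

class TrafficSignal:
    def __init__(self, yellow_time=2, min_green=5):
        self.current_phase = 0
        self.time_since_green = 0
        self.yellow_time = yellow_time
        self.min_green = min_green

    def set_phase(self, new_phase):
        """Request a new green phase; return the per-second transition plan."""
        if new_phase == self.current_phase or self.time_since_green < self.min_green:
            # Same phase requested, or min green not yet satisfied: hold green.
            return [("green", self.current_phase)]
        # Insert yellow on the old phase before switching to the new green.
        plan = [("yellow", self.current_phase)] * self.yellow_time
        plan.append(("green", new_phase))
        self.current_phase = new_phase
        self.time_since_green = 0
        return plan

ts = TrafficSignal()
ts.time_since_green = 10  # min green already satisfied
print(ts.set_phase(2))    # two yellow steps, then green phase 2
```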
## Rewards

The default reward is the **change in cumulative waiting time**:

```
reward = -(total_waiting_time_now - total_waiting_time_previous)
```

A positive reward means waiting time decreased (good).

### Available Reward Functions

Set via the `SUMO_REWARD_FN` environment variable:

- **`diff-waiting-time`** (default): Change in cumulative waiting time
- **`average-speed`**: Average speed of all vehicles
- **`queue`**: Negative total queue length
- **`pressure`**: Pressure metric (incoming - outgoing vehicles)
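As a rough sketch, the four options can be written as pure functions of simple per-step traffic metrics. The signatures are illustrative, not the sumo_rl implementations:

```python
# Illustrative reward functions over simple per-step traffic metrics.
# These mirror the descriptions above; they are not the sumo_rl code.

def diff_waiting_time(total_wait_now, total_wait_prev):
    # Positive when cumulative waiting time went down.
    return -(total_wait_now - total_wait_prev)

def average_speed(speeds):
    # Mean speed over all vehicles; 0 if the road is empty.
    return sum(speeds) / len(speeds) if speeds else 0.0

def queue(queue_lengths):
    # Negative total queue length: fewer queued vehicles is better.
    return -sum(queue_lengths)

def pressure(incoming, outgoing):
    # Negated pressure (incoming minus outgoing vehicle counts).
    return -(incoming - outgoing)

print(diff_waiting_time(120.0, 150.0))  # 30.0: waiting time fell by 30 s
print(queue([3, 0, 5, 1]))              # -9
```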
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SUMO_NET_FILE` | `/app/nets/single-intersection.net.xml` | Network topology file |
| `SUMO_ROUTE_FILE` | `/app/nets/single-intersection.rou.xml` | Vehicle routes file |
| `SUMO_NUM_SECONDS` | `20000` | Simulation duration (seconds) |
| `SUMO_DELTA_TIME` | `5` | Seconds between agent actions |
| `SUMO_YELLOW_TIME` | `2` | Yellow phase duration (seconds) |
| `SUMO_MIN_GREEN` | `5` | Minimum green time (seconds) |
| `SUMO_MAX_GREEN` | `50` | Maximum green time (seconds) |
| `SUMO_REWARD_FN` | `diff-waiting-time` | Reward function name |
| `SUMO_SEED` | `42` | Random seed (use for reproducibility) |
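On the server side, variables like these are typically read with `os.environ` fallbacks. A minimal sketch of such parsing, with the defaults from the table above (the `SumoConfig` dataclass is hypothetical; the real server may structure this differently):

```python
import os
from dataclasses import dataclass

# Hypothetical config holder mirroring the table above.
@dataclass
class SumoConfig:
    net_file: str
    route_file: str
    num_seconds: int
    delta_time: int
    yellow_time: int
    min_green: int
    max_green: int
    reward_fn: str
    seed: int

def load_config(env=os.environ):
    """Build a config from environment variables, using table defaults."""
    return SumoConfig(
        net_file=env.get("SUMO_NET_FILE", "/app/nets/single-intersection.net.xml"),
        route_file=env.get("SUMO_ROUTE_FILE", "/app/nets/single-intersection.rou.xml"),
        num_seconds=int(env.get("SUMO_NUM_SECONDS", "20000")),
        delta_time=int(env.get("SUMO_DELTA_TIME", "5")),
        yellow_time=int(env.get("SUMO_YELLOW_TIME", "2")),
        min_green=int(env.get("SUMO_MIN_GREEN", "5")),
        max_green=int(env.get("SUMO_MAX_GREEN", "50")),
        reward_fn=env.get("SUMO_REWARD_FN", "diff-waiting-time"),
        seed=int(env.get("SUMO_SEED", "42")),
    )

cfg = load_config({"SUMO_SEED": "123"})
print(cfg.seed, cfg.reward_fn)  # 123 diff-waiting-time
```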
### Using Custom Networks

To use your own SUMO network:

```python
from envs.sumo_rl_env import SumoRLEnv

env = SumoRLEnv.from_docker_image(
    "sumo-rl-env:latest",
    volumes={
        "/path/to/your/nets": {"bind": "/nets", "mode": "ro"}
    },
    environment={
        "SUMO_NET_FILE": "/nets/my-network.net.xml",
        "SUMO_ROUTE_FILE": "/nets/my-routes.rou.xml",
    },
)
```

Your network directory should contain:

- `.net.xml` - Network topology (roads, junctions, traffic lights)
- `.rou.xml` - Vehicle routes (trip definitions, flow rates)
## API Reference

### SumoAction

```python
@dataclass
class SumoAction(Action):
    phase_id: int     # Green phase to activate (0 to num_phases - 1)
    ts_id: str = "0"  # Traffic signal ID (for multi-agent)
```

### SumoObservation

```python
@dataclass
class SumoObservation(Observation):
    observation: List[float]      # Observation vector
    observation_shape: List[int]  # Shape for reshaping
    action_mask: List[int]        # Valid action indices
    sim_time: float               # Current simulation time
    done: bool                    # Episode finished
    reward: Optional[float]       # Reward from the last action
    metadata: Dict                # System metrics
```

### SumoState

```python
@dataclass
class SumoState(State):
    episode_id: str            # Unique episode ID
    step_count: int            # Steps taken
    net_file: str              # Network file path
    route_file: str            # Route file path
    sim_time: float            # Current simulation time
    total_vehicles: int        # Total vehicles in the simulation
    total_waiting_time: float  # Cumulative waiting time
    mean_waiting_time: float   # Mean waiting time
    mean_speed: float          # Mean vehicle speed
    # ... configuration parameters
```
## Example Training Loop

```python
import numpy as np

from envs.sumo_rl_env import SumoRLEnv, SumoAction

# Start the environment
env = SumoRLEnv.from_docker_image("sumo-rl-env:latest")

# Training loop
for episode in range(10):
    result = env.reset()
    episode_reward = 0
    steps = 0

    while not result.done and steps < 1000:
        # Random policy (replace with your RL agent)
        action_id = np.random.choice(result.observation.action_mask)

        # Take action
        result = env.step(SumoAction(phase_id=int(action_id)))
        episode_reward += result.reward or 0
        steps += 1

        # Print progress every 100 steps
        if steps % 100 == 0:
            state = env.state()
            print(f"Step {steps}: "
                  f"reward={result.reward:.2f}, "
                  f"vehicles={state.total_vehicles}, "
                  f"waiting={state.mean_waiting_time:.2f}")

    print(f"Episode {episode}: total_reward={episode_reward:.2f}, steps={steps}")

env.close()
```
## Performance Notes

### Simulation Speed

- **Reset time**: 1-5 seconds (starts a new SUMO simulation)
- **Step time**: ~50-200 ms per step (depends on network size)
- **Episode duration**: Minutes (20,000 sim seconds with `delta_time=5` ≈ 4,000 steps)

### Optimization

For faster simulation:

1. Reduce `SUMO_NUM_SECONDS` for shorter episodes
2. Increase `SUMO_DELTA_TIME` for fewer decisions
3. Use simpler networks with fewer vehicles
## Architecture

```
┌──────────────────────────────────┐
│  Client: SumoRLEnv               │
│  .step(phase_id=1)               │
└───────────────┬──────────────────┘
                │ HTTP
┌───────────────┴──────────────────┐
│  FastAPI Server (Docker)         │
│  SumoEnvironment                 │
│   ├─ Wraps sumo_rl               │
│   ├─ Single-agent mode           │
│   └─ No GUI                      │
└───────────────┬──────────────────┘
                │
┌───────────────┴──────────────────┐
│  SUMO Simulator                  │
│  - Reads .net.xml (network)      │
│  - Reads .rou.xml (routes)       │
│  - Simulates traffic flow        │
│  - Provides observations         │
└──────────────────────────────────┘
```
## Bundled Network

The default `single-intersection` network is a simple 4-way intersection with:

- **4 incoming roads** (North, South, East, West)
- **4 green phases** (NS straight, NS left, EW straight, EW left)
- **Vehicle flow**: Continuous stream with varying rates

## Limitations

- **No GUI in Docker**: The SUMO GUI requires an X server (not available in containers)
- **Single-agent only**: Multi-agent support (multiple intersections) is planned for a future version
- **Fixed network per container**: Each container uses one network topology
- **Memory usage**: ~500 MB for small networks, 2-4 GB for large city networks
## Troubleshooting

### Container won't start

```bash
# Check logs
docker logs <container-id>

# Verify network files exist
docker run sumo-rl-env:latest ls -la /app/nets/
```

### "SUMO_HOME not set" error

This is set automatically in Docker. If running locally:

```bash
export SUMO_HOME=/usr/share/sumo
```

### Slow performance

- Reduce simulation duration: `SUMO_NUM_SECONDS=5000`
- Increase the action interval: `SUMO_DELTA_TIME=10`
- Use smaller networks with fewer vehicles
## References

- [SUMO Documentation](https://sumo.dlr.de/docs/)
- [SUMO-RL GitHub](https://github.com/LucasAlegre/sumo-rl)
- [SUMO-RL Paper](https://peerj.com/articles/cs-575/)
- [RESCO Benchmarks](https://github.com/jault/RESCO)

## Citation

If you use SUMO-RL in your research, please cite:

```bibtex
@misc{sumorl,
  author = {Lucas N. Alegre},
  title = {{SUMO-RL}},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/sumo-rl}},
}
```
## License

This integration is released under a BSD-style license. SUMO-RL and SUMO carry their own licenses.