title: ACDE OpenEnv
emoji: 🚑
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
base_path: /web
Emergency Routing Simulation (ACDE)
This project is a simulation environment for emergency ambulance routing.
In simple terms:
- A patient needs urgent care.
- Several hospitals are available.
- Each hospital has trade-offs (distance, traffic, ICU certainty, specialization).
- Conditions can change while the ambulance is moving.
- The agent must decide where to go, step by step.
The goal is not to be perfect every time. The goal is to make realistic decisions under uncertainty.
What This Project Does
This environment helps you test decision logic in situations where information is incomplete and time is limited.
It supports three difficulty levels:
- acde_easy
- acde_medium
- acde_hard
As difficulty increases, uncertainty and penalties increase too.
How It Works (Simple Flow)
Every episode follows this loop:
- The environment is reset with a seed and task.
- You get an observation:
  - Patient condition
  - Required specialization
  - Hospital list with visible signals
- The policy scores hospitals.
- One hospital is selected.
- The environment validates arrival using hidden checks.
- You receive outcome + reward.
- If not done, repeat until success or failure.
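The loop above can be sketched with a toy stand-in for the environment. This is only an illustration under stated assumptions: `ToyEnv`, `score`, and `run_episode` are hypothetical names, the acceptance probability and reward values are invented, and the real logic in app/environment/core.py may look quite different.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the real environment (not the project's API)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps

    def reset(self, seed, task):
        rng = random.Random(seed)
        self.steps = 0
        # Visible signals: what the policy is allowed to see.
        self.hospitals = [
            {
                "hospital_id": f"H{i}",
                "distance_km": round(rng.uniform(1, 20), 1),
                "icu": rng.choice(["available", "limited"]),
            }
            for i in range(1, 5)
        ]
        # Hidden state: whether arrival would actually succeed (invented odds).
        self._accepts = {h["hospital_id"]: rng.random() < 0.6 for h in self.hospitals}
        return {"task_id": task, "hospitals": self.hospitals}

    def step(self, hospital_id):
        self.steps += 1
        accepted = self._accepts[hospital_id]  # hidden check on arrival
        reward = 1.0 if accepted else -0.2     # invented reward values
        done = accepted or self.steps >= self.max_steps
        return {"hospitals": self.hospitals}, reward, done, accepted


def score(hospital):
    """Toy scoring: prefer a displayed ICU and a short distance."""
    bonus = 1.0 if hospital["icu"] == "available" else 0.0
    return bonus - 0.05 * hospital["distance_km"]


def run_episode(env, seed, task):
    obs = env.reset(seed, task)
    failed, total, done, accepted = set(), 0.0, False, False
    while not done:
        # Skip hospitals that already failed, unless nothing else is left.
        options = [h for h in obs["hospitals"] if h["hospital_id"] not in failed]
        choice = max(options or obs["hospitals"], key=score)
        obs, reward, done, accepted = env.step(choice["hospital_id"])
        total += reward
        if not accepted:
            failed.add(choice["hospital_id"])
    return accepted, env.steps, total
```

Calling `run_episode(ToyEnv(), seed=555, task="acde_medium")` twice returns the same result, which is the point of resetting with a seed. Unlike the real environment, this toy never lets a failed hospital recover.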
What Makes It Realistic
This is not a static lookup problem. It includes realistic uncertainty:
- Displayed ICU status can differ from actual ICU status.
- Traffic can change between steps.
- Hospital overload can change outcomes.
- Specialist availability can fail at arrival.
- A hospital that failed once may become usable later.
The policy includes safety rules such as:
- Immediate retry protection after rejection.
- Cooldown handling for recently failed hospitals.
- Exploration among top options (not blind random picks).
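As a rough illustration of how these three rules could combine, here is a minimal sketch. It is not the project's policy code: `select_hospital`, its parameters, and the rank-based weighting are assumptions made for the example.

```python
import random


def select_hospital(scores, last_rejected=None, cooldown=None, top_k=3, rng=None):
    """Pick a hospital id from `scores` (id -> score, higher is better).

    Hypothetical helper: `cooldown` maps id -> remaining cooldown steps.
    """
    rng = rng or random.Random()
    cooldown = cooldown or {}

    # Rule 1: do not go straight back to the hospital that just rejected us.
    # Rule 2: skip hospitals whose cooldown has not expired yet.
    eligible = {
        hid: s
        for hid, s in scores.items()
        if hid != last_rejected and cooldown.get(hid, 0) <= 0
    }
    if not eligible:
        # Relax the cooldown first, then retry protection, so a choice always exists.
        eligible = {h: s for h, s in scores.items() if h != last_rejected} or dict(scores)

    # Rule 3: explore only among the top-k options, favoring higher ranks.
    ranked = sorted(eligible, key=eligible.get, reverse=True)[:top_k]
    weights = list(range(len(ranked), 0, -1))  # e.g. [3, 2, 1] for three options
    return rng.choices(ranked, weights=weights, k=1)[0]
```

With `scores = {"H1": 0.9, "H2": 0.8, "H3": 0.1}`, `last_rejected="H1"`, and `cooldown={"H2": 2}`, only H3 remains eligible, so it is always returned.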
Project Layout
Key files:
- app/environment/core.py: Main environment loop (reset, step, transition logic)
- app/environment/validation.py: Hidden validation checks (ICU, specialist, overload, outcome)
- app/environment/graders.py: Final scoring and pass/fail grading
- app/models/: Pydantic models for state, observation, reward, action
- app/server/app.py: FastAPI server endpoints
- inference.py: Local policy runner (CLI episodes)
- data/learning_memory.json: Rolling policy memory
- data/trajectory_history.jsonl: Per-step trajectory logs
API Endpoints
When server mode is running:
- GET /health
- POST /reset
- POST /step
- GET /state
Action Space
The agent sends one action per step as JSON:
{
  "step": 1,
  "hospital_id": "H3",
  "rationale": "short decision reason"
}
Action fields:
- step (int, >=1): must match the current environment step
- hospital_id (str): target hospital identifier
- rationale (str, optional): policy explanation
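A client can check these constraints before sending an action. The sketch below uses only the standard library; the project itself defines the schema with Pydantic in app/models/action.py, so `build_action` is a hypothetical helper, and the non-empty check on `hospital_id` is an assumption rather than a documented rule.

```python
import json


def build_action(step, hospital_id, rationale=None):
    """Return the action as a JSON string, or raise ValueError if it is malformed."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(step, int) or isinstance(step, bool) or step < 1:
        raise ValueError("step must be an integer >= 1")
    # Assumption: an empty hospital id is never valid.
    if not isinstance(hospital_id, str) or not hospital_id:
        raise ValueError("hospital_id must be a non-empty string")

    action = {"step": step, "hospital_id": hospital_id}
    if rationale is not None:  # rationale is optional, so omit it when unset
        action["rationale"] = str(rationale)
    return json.dumps(action)
```

For example, `build_action(1, "H3", "short decision reason")` produces the JSON shown above, while `build_action(0, "H3")` raises `ValueError`.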
Observation Space
Each reset() and step() returns an observation with:
- episode metadata: episode_id, seed, task_id, scenario_name, scenario_difficulty
- patient state: patient_condition, required_specialization, remaining time fields
- hospital list: hospital_id, distance_km, icu, specialization, traffic
- routing history: visited/failed hospitals and failure reasons
- hidden-state feedback: last_arrival_outcome summary (status/reason/suitability)
- memory snapshot used by the baseline policy
Core schema is defined by Pydantic models in:
- app/models/action.py
- app/models/observation.py
- app/models/state.py
- app/models/reward.py
Required Environment Variables
Before running inference.py, define:
- API_BASE_URL: API base URL for the OpenAI-compatible endpoint
- MODEL_NAME: model name used for rationale generation
- HF_TOKEN: API key/token
Windows PowerShell example:
$env:API_BASE_URL = "https://api-inference.huggingface.co/v1"
$env:MODEL_NAME = "your-model-id"
$env:HF_TOKEN = "your-token"
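If you script around inference.py, it can help to fail early when one of these variables is missing. The following is a minimal sketch of such a check; `load_settings` is a hypothetical helper and does not reflect how inference.py itself reads its configuration.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(environ=None):
    """Return the required variables as a dict, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real process environment.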
Installation
1) Prerequisites
- Python 3.10+ (3.12 works)
- pip
2) Open a terminal in this folder
Folder should be:
my_env
3) Create and activate a virtual environment (recommended)
Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
macOS/Linux:
python -m venv .venv
source .venv/bin/activate
4) Install dependencies
pip install -e .
If editable install is not needed:
pip install .
Running the Project
Option A: Run policy episodes directly (most common)
Run one medium episode:
python inference.py --mode single --task acde_medium --episodes 1 --seed 555
Run 10 hard episodes:
python inference.py --mode single --task acde_hard --episodes 10 --seed 555
Run all levels in sequence:
python inference.py --mode full --episodes 3 --seed 555
If you run without --task, the script prompts for the level interactively.
Option B: Run as HTTP service
Start API server:
uvicorn app.server.app:app --host 0.0.0.0 --port 7860
Health check:
curl http://127.0.0.1:7860/health
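The same check can be made from Python with the standard library. This is a sketch under assumptions: `build_request` and `call` are hypothetical helpers, responses are assumed to be JSON, and the exact request bodies expected by POST /reset and POST /step are not documented here, so confirm them against app/server/app.py before relying on this.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # matches the uvicorn command above


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request, JSON-encoding the payload if given."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the response body, assuming it is JSON."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call("GET", "/health")` mirrors the curl command. A step might be sent as `call("POST", "/step", {"step": 1, "hospital_id": "H3"})`, assuming the endpoint accepts the action JSON directly.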
Understanding Output
During inference.py runs, you will see:
- Scenario details
- Hospital options and scores
- Decision strategy text
- Outcome per step (ACCEPTED, PARTIAL, REJECTED)
- Final episode summary
- Batch summary (success rate, average score, average steps)
Example summary:
Batch summary:
Success rate: 20.0%
Average score: 0.39
Average steps: 3.6
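The three figures are plain averages over the episodes in a batch. A minimal sketch of that arithmetic follows; the per-episode keys `success`, `score`, and `steps` are hypothetical, and inference.py may track results differently.

```python
def summarize(episodes):
    """Compute batch metrics from a list of per-episode result dicts."""
    count = len(episodes)
    if count == 0:
        raise ValueError("at least one episode is required")
    return {
        # Share of episodes that ended in success, as a percentage.
        "success_rate": 100.0 * sum(1 for e in episodes if e["success"]) / count,
        "average_score": sum(e["score"] for e in episodes) / count,
        "average_steps": sum(e["steps"] for e in episodes) / count,
    }


def format_summary(summary):
    """Render the metrics in the same layout as the example summary."""
    return (
        "Batch summary:\n"
        f"Success rate: {summary['success_rate']:.1f}%\n"
        f"Average score: {summary['average_score']:.2f}\n"
        f"Average steps: {summary['average_steps']:.1f}"
    )
```

Usage: `print(format_summary(summarize(results)))`, where `results` is the list of episode dicts collected during a run.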
Data Files
The simulation writes data to data/:
- learning_memory.json: Long-term policy memory
- trajectory_history.jsonl: One JSON object per step
- learning_archive.json: Aggregate run history and profiles
If you want a clean run baseline, back up and clear these files.
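Because trajectory_history.jsonl stores one JSON object per line, it can be inspected with a few lines of Python. The sketch below assumes nothing about the keys inside each record; `read_trajectory` is a hypothetical helper, not part of the project.

```python
import json
from pathlib import Path


def read_trajectory(path="data/trajectory_history.jsonl"):
    """Load every non-empty line of a JSONL file as a Python object."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            records.append(json.loads(line))
    return records
```

For instance, `len(read_trajectory())` gives the number of logged steps, which is a quick way to confirm the file was actually cleared before a baseline run.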
Typical Targets (Guideline)
These are practical targets, not strict rules:
- Easy: usually high success, often fewer steps
- Medium: mixed outcomes with meaningful rerouting
- Hard: lower success, more failures, more steps
If hard success is too high, increase uncertainty or rejection pressure. If hard success is too low, ease one or two hard-only probabilities.
Troubleshooting
"NameError" or model field errors
Make sure model fields and observation fields match after logic changes. If you added new state keys, also add them in observation models.
Script asks for seed/level unexpectedly
Pass flags explicitly:
python inference.py --mode single --task acde_hard --episodes 10 --seed 555
No module named app
Run commands from inside the my_env folder, and make sure the install succeeded:
pip install -e .
Uvicorn command not found
Install server deps in your active environment:
pip install uvicorn fastapi
Notes
- This project is designed for iterative policy tuning.
- Small changes in hard-mode probabilities can noticeably shift success rates.
- Always test with at least 10-30 episodes before concluding behavior changes.
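To see why small batches are unreliable, it helps to estimate how wide the uncertainty around a measured success rate is. The sketch below uses the normal approximation to a binomial proportion, which is rough for small batches but good enough to show the trend; `success_rate_interval` is a hypothetical helper, not part of the project.

```python
import math


def success_rate_interval(successes, episodes, z=1.96):
    """Approximate 95% interval for a success rate (normal approximation)."""
    p = successes / episodes
    # Standard error of a binomial proportion.
    se = math.sqrt(p * (1 - p) / episodes)
    # Clamp to [0, 1], since a rate cannot leave that range.
    return max(0.0, p - z * se), min(1.0, p + z * se)
```

At a 20% success rate, 10 episodes give an interval of roughly 0% to 45%, while 30 episodes narrow it to roughly 6% to 34%. A change in hard-mode probabilities therefore needs to move the success rate substantially, or be measured over more episodes, before it can be told apart from noise.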