| ---
|
| title: ACDE OpenEnv
|
| emoji: "🚑"
|
| colorFrom: blue
|
| colorTo: green
|
| sdk: docker
|
| pinned: false
|
| base_path: /web
|
| ---
|
|
|
| # Emergency Routing Simulation (ACDE)
|
|
|
| This project is a simulation environment for emergency ambulance routing.
|
|
|
| In simple terms:
|
| - A patient needs urgent care.
|
| - Several hospitals are available.
|
| - Each hospital has trade-offs (distance, traffic, ICU certainty, specialization).
|
| - Conditions can change while the ambulance is moving.
|
| - The agent must decide where to go, step by step.
|
|
|
| The goal is not to be perfect every time. The goal is to make realistic decisions under uncertainty.
|
|
|
| ## What This Project Does
|
|
|
| This environment helps you test decision logic in situations where information is incomplete and time is limited.
|
|
|
| It supports three difficulty levels:
|
| - `acde_easy`
|
| - `acde_medium`
|
| - `acde_hard`
|
|
|
| As difficulty increases, uncertainty and penalties increase too.
|
|
|
| ## How It Works (Simple Flow)
|
|
|
| Every episode follows this loop:
|
|
|
| 1. The environment is reset with a seed and task.
|
| 2. You get an observation:
|
|    - Patient condition
|
|    - Required specialization
|
|    - Hospital list with visible signals
|
| 3. The policy scores hospitals.
|
| 4. One hospital is selected.
|
| 5. The environment validates arrival using hidden checks.
|
| 6. You receive the outcome and a reward.
|
| 7. If the episode is not done, the loop repeats until success or failure.
|
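The loop above can be sketched in a few lines. This is a minimal, self-contained illustration, not the project's code: `ToyEnv`, `nearest_hospital`, `run_episode`, the 50% acceptance chance, and the `(observation, reward, done, info)` return shape are all assumptions made for the example. The real interface lives in `app/environment/core.py`.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the real environment (not the project's API)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps

    def reset(self, seed, task):
        self.rng = random.Random(seed)  # same seed and task -> same episode
        self.task = task
        self.step_no = 1
        return self._observe()

    def _observe(self):
        hospitals = [
            {"hospital_id": f"H{i}", "distance_km": round(self.rng.uniform(1, 15), 1)}
            for i in range(1, 4)
        ]
        return {"task_id": self.task, "step": self.step_no, "hospitals": hospitals}

    def step(self, action):
        # Hidden arrival check: the agent never sees this draw directly.
        accepted = self.rng.random() < 0.5
        reward = 1.0 if accepted else -0.2
        self.step_no += 1
        done = accepted or self.step_no > self.max_steps
        return self._observe(), reward, done, {"accepted": accepted}


def nearest_hospital(observation):
    """Toy policy: score every hospital by negative distance, pick the best."""
    scores = {h["hospital_id"]: -h["distance_km"] for h in observation["hospitals"]}
    best = max(scores, key=scores.get)
    return {"step": observation["step"], "hospital_id": best}


def run_episode(env, seed, task):
    """Reset, then alternate policy decision and environment step until done."""
    obs = env.reset(seed=seed, task=task)
    total, done, info = 0.0, False, {}
    while not done:
        action = nearest_hospital(obs)
        obs, reward, done, info = env.step(action)
        total += reward
    return {"success": info["accepted"], "steps": obs["step"] - 1, "reward": total}
```

Because the toy environment derives all randomness from the seed, calling `run_episode(ToyEnv(), seed=555, task="acde_medium")` twice yields the same result, which is the property that makes seeded comparisons between policies meaningful.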
|
|
| ## What Makes It Realistic
|
|
|
| This is not a static lookup problem. It includes realistic uncertainty:
|
|
|
| - Displayed ICU status can differ from actual ICU status.
|
| - Traffic can change between steps.
|
| - Hospital overload can change outcomes.
|
| - Specialist availability can fail at arrival.
|
| - A hospital that failed once may become usable later.
|
|
|
| The policy includes safety rules such as:
|
| - Immediate retry protection after rejection.
|
| - Cooldown handling for recently failed hospitals.
|
| - Exploration among top options (not blind random picks).
|
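A rough sketch of how these three rules could combine into one selection function. The function name, its parameters, and the fallback behavior when every hospital is blocked are assumptions for illustration; the baseline policy in `inference.py` may differ.

```python
import random


def choose_hospital(scores, last_rejected=None, cooldown=None, top_k=2, rng=None):
    """Pick a hospital id from `scores` (id -> score, higher is better)."""
    if rng is None:
        rng = random.Random()
    cooldown = cooldown or {}

    # Rule 1: never retry the hospital that just rejected the ambulance.
    # Rule 2: skip hospitals whose cooldown counter is still positive.
    eligible = {
        hid: score
        for hid, score in scores.items()
        if hid != last_rejected and cooldown.get(hid, 0) <= 0
    }
    if not eligible:
        # Assumed fallback: ignore cooldowns first, then allow anything.
        eligible = {h: s for h, s in scores.items() if h != last_rejected} or dict(scores)

    # Rule 3: explore only among the top-k scored options, not at random.
    ranked = sorted(eligible, key=eligible.get, reverse=True)
    return rng.choice(ranked[:top_k])
```

With `top_k=1` the function is purely greedy; larger values trade some expected score for the chance to discover that a lower-ranked hospital is actually available.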
|
|
| ## Project Layout
|
|
|
| Key files:
|
|
|
| - `app/environment/core.py`
|
|   - Main environment loop (`reset`, `step`, transition logic)
|
| - `app/environment/validation.py`
|
|   - Hidden validation checks (ICU, specialist, overload, outcome)
|
| - `app/environment/graders.py`
|
|   - Final scoring and pass/fail grading
|
| - `app/models/`
|
|   - Pydantic models for state, observation, reward, action
|
| - `app/server/app.py`
|
|   - FastAPI server endpoints
|
| - `inference.py`
|
|   - Local policy runner (CLI episodes)
|
| - `data/learning_memory.json`
|
|   - Rolling policy memory
|
| - `data/trajectory_history.jsonl`
|
|   - Per-step trajectory logs
|
|
|
| ## API Endpoints
|
|
|
| When the server is running, these endpoints are available:
|
|
|
| - `GET /health`
|
| - `POST /reset`
|
| - `POST /step`
|
| - `GET /state`
|
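A minimal client sketch for these four endpoints, using only the standard library. The port matches the `uvicorn` command used in this README; the request bodies for `/reset` and `/step` shown in `demo` are assumptions, so check `app/server/app.py` for the actual schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # port taken from the uvicorn command in this README


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a JSON request for one of the endpoints."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Walk through one reset and one step; requires a running server."""
    print(call("GET", "/health"))
    # Assumed body shapes: the real field names may differ.
    print(call("POST", "/reset", {"seed": 555, "task_id": "acde_medium"}))
    print(call("POST", "/step", {"step": 1, "hospital_id": "H3"}))
    print(call("GET", "/state"))
```

Nothing is sent until `demo()` or `call(...)` is invoked, so the module can be imported safely while the server is down.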
|
|
| ## Action Space
|
|
|
| The agent sends one action per step as JSON:
|
|
|
| ```json
|
| {
|
|   "step": 1,
|
|   "hospital_id": "H3",
|
|   "rationale": "short decision reason"
|
| }
|
| ```
|
|
|
| Action fields:
|
| - `step` (int, >=1): must match current environment step
|
| - `hospital_id` (str): target hospital identifier
|
| - `rationale` (str, optional): policy explanation
|
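The field rules above can be expressed as a small validation sketch. The project defines its action as a Pydantic model in `app/models/action.py`; the dataclass below is a standard-library stand-in written for illustration, and `ActionSketch`, `check_step`, and the non-empty check on `hospital_id` are assumptions rather than the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionSketch:
    """Stdlib stand-in for the action model (the project itself uses Pydantic)."""

    step: int
    hospital_id: str
    rationale: Optional[str] = None

    def __post_init__(self):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(self.step, bool) or not isinstance(self.step, int) or self.step < 1:
            raise ValueError("step must be an integer >= 1")
        # Assumption: an empty identifier is never a valid target.
        if not isinstance(self.hospital_id, str) or not self.hospital_id:
            raise ValueError("hospital_id must be a non-empty string")


def check_step(action, current_step):
    """Reject actions whose step does not match the environment's current step."""
    if action.step != current_step:
        raise ValueError(
            f"action.step={action.step} but the environment is at step {current_step}"
        )
    return action
```

For example, `check_step(ActionSketch(step=1, hospital_id="H3"), 1)` returns the action unchanged, while passing `2` as the current step raises `ValueError`.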
|
|
| ## Observation Space
|
|
|
| Each `reset()` and `step()` returns an observation with:
|
| - episode metadata: `episode_id`, `seed`, `task_id`, `scenario_name`, `scenario_difficulty`
|
| - patient state: `patient_condition`, `required_specialization`, remaining time fields
|
| - hospital list: `hospital_id`, `distance_km`, `icu`, `specialization`, `traffic`
|
| - routing history: visited/failed hospitals and failure reasons
|
| - hidden-state feedback: `last_arrival_outcome` summary (status/reason/suitability)
|
| - memory snapshot used by the baseline policy
|
|
|
| Core schema is defined by Pydantic models in:
|
| - `app/models/action.py`
|
| - `app/models/observation.py`
|
| - `app/models/state.py`
|
| - `app/models/reward.py`
|
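To make the hospital fields concrete, here is one way a policy might turn them into a ranking. Everything about the encoding is an assumption: the sketch treats `icu` as a displayed availability estimate in [0, 1], `traffic` as a congestion level in [0, 1], and `hospitals` as the key holding the list. The weights are arbitrary illustrative values, not the baseline policy's.

```python
DEFAULT_WEIGHTS = {"icu": 1.0, "specialization": 0.8, "distance": 0.05, "traffic": 0.5}


def score_hospital(hospital, required_specialization, weights=None):
    """Higher is better: reward ICU and specialization, penalize distance and traffic."""
    w = dict(DEFAULT_WEIGHTS)
    if weights:
        w.update(weights)
    match = 1.0 if hospital["specialization"] == required_specialization else 0.0
    return (
        w["icu"] * hospital["icu"]
        + w["specialization"] * match
        - w["distance"] * hospital["distance_km"]
        - w["traffic"] * hospital["traffic"]
    )


def rank_hospitals(observation):
    """Return hospital ids ordered from best to worst score."""
    required = observation["required_specialization"]
    scored = [(score_hospital(h, required), h["hospital_id"]) for h in observation["hospitals"]]
    return [hospital_id for _, hospital_id in sorted(scored, reverse=True)]


# Hypothetical observation fragment using the field names listed above.
example_observation = {
    "required_specialization": "cardiac",
    "hospitals": [
        {"hospital_id": "H1", "distance_km": 2.0, "icu": 0.3,
         "specialization": "trauma", "traffic": 0.2},
        {"hospital_id": "H2", "distance_km": 6.0, "icu": 0.9,
         "specialization": "cardiac", "traffic": 0.4},
        {"hospital_id": "H3", "distance_km": 12.0, "icu": 0.95,
         "specialization": "cardiac", "traffic": 0.9},
    ],
}
```

In this example the nearest hospital, `H1`, ranks last because it lacks the required specialization and shows low ICU availability, which illustrates why distance alone is a poor signal.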
|
|
| ## Required Environment Variables
|
|
|
| Before running `inference.py`, define:
|
| - `API_BASE_URL`: API base URL for the OpenAI-compatible endpoint
|
| - `MODEL_NAME`: model name used for rationale generation
|
| - `HF_TOKEN`: API key/token
|
|
|
| Windows PowerShell example:
|
|
|
| ```powershell
|
| $env:API_BASE_URL = "https://api-inference.huggingface.co/v1"
|
| $env:MODEL_NAME = "your-model-id"
|
| $env:HF_TOKEN = "your-token"
|
| ```
|
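A short sketch of how a script could check these variables up front and fail with one clear message instead of a late API error. `load_settings` is a hypothetical helper, not a function from `inference.py`.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(environ=None):
    """Return the three required settings, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Reporting all missing names at once avoids the fix-one-rerun-fix-another cycle when setting up a new shell.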
|
|
| ## Installation
|
|
|
| ### 1) Prerequisites
|
|
|
| - Python 3.10+ (3.12 works)
|
| - `pip`
|
|
|
| ### 2) Open a terminal in this folder
|
|
|
| The current directory should be the project folder, `my_env`.
|
|
|
| ### 3) Create and activate a virtual environment (recommended)
|
|
|
| Windows PowerShell:
|
|
|
| ```powershell
|
| python -m venv .venv
|
| .\.venv\Scripts\Activate.ps1
|
| ```
|
|
|
| macOS/Linux:
|
|
|
| ```bash
|
| python -m venv .venv
|
| source .venv/bin/activate
|
| ```
|
|
|
| ### 4) Install dependencies
|
|
|
| ```bash
|
| pip install -e .
|
| ```
|
|
|
| If editable install is not needed:
|
|
|
| ```bash
|
| pip install .
|
| ```
|
|
|
| ## Running the Project
|
|
|
| ### Option A: Run policy episodes directly (most common)
|
|
|
| Run one medium episode:
|
|
|
| ```bash
|
| python inference.py --mode single --task acde_medium --episodes 1 --seed 555
|
| ```
|
|
|
| Run 10 hard episodes:
|
|
|
| ```bash
|
| python inference.py --mode single --task acde_hard --episodes 10 --seed 555
|
| ```
|
|
|
| Run all levels in sequence:
|
|
|
| ```bash
|
| python inference.py --mode full --episodes 3 --seed 555
|
| ```
|
|
|
| If you omit `--task`, the script prompts for the difficulty level interactively.
|
|
|
| ### Option B: Run as HTTP service
|
|
|
| Start the API server:
|
|
|
| ```bash
|
| uvicorn app.server.app:app --host 0.0.0.0 --port 7860
|
| ```
|
|
|
| Health check:
|
|
|
| ```bash
|
| curl http://127.0.0.1:7860/health
|
| ```
|
|
|
| ## Understanding Output
|
|
|
| During `inference.py` runs, you will see:
|
|
|
| - Scenario details
|
| - Hospital options and scores
|
| - Decision strategy text
|
| - Outcome per step (`ACCEPTED`, `PARTIAL`, `REJECTED`)
|
| - Final episode summary
|
| - Batch summary (success rate, average score, average steps)
|
|
|
| Example summary:
|
|
|
| ```text
|
| Batch summary:
|
| Success rate: 20.0%
|
| Average score: 0.39
|
| Average steps: 3.6
|
| ```
|
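The three summary figures are plain averages over the batch. As an illustration of the arithmetic, the sketch below reproduces the example output from a list of per-episode records; the record keys `success`, `score`, and `steps` are hypothetical names chosen for this example.

```python
def summarize(episodes):
    """Format success rate, average score, and average steps for a batch."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    success_rate = 100.0 * sum(1 for e in episodes if e["success"]) / n
    average_score = sum(e["score"] for e in episodes) / n
    average_steps = sum(e["steps"] for e in episodes) / n
    return (
        "Batch summary:\n"
        f"Success rate: {success_rate:.1f}%\n"
        f"Average score: {average_score:.2f}\n"
        f"Average steps: {average_steps:.1f}"
    )


# Five made-up episodes: one success out of five, 18 steps in total.
example_batch = [
    {"success": True, "score": 0.9, "steps": 2},
    {"success": False, "score": 0.3, "steps": 4},
    {"success": False, "score": 0.25, "steps": 4},
    {"success": False, "score": 0.3, "steps": 4},
    {"success": False, "score": 0.2, "steps": 4},
]
```

Here one success in five episodes gives 20.0%, the scores sum to 1.95 for an average of 0.39, and 18 total steps average to 3.6.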
|
|
| ## Data Files
|
|
|
| The simulation writes data to `data/`:
|
|
|
| - `learning_memory.json`
|
|   - Long-term policy memory
|
| - `trajectory_history.jsonl`
|
|   - One JSON object per step
|
| - `learning_archive.json`
|
|   - Aggregate run history and profiles
|
|
|
| For a clean baseline run, back up these files and then clear them.
|
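One possible way to script the backup step is sketched below. `backup_and_clear` and the `backup_<stamp>` folder name are hypothetical. The sketch moves the files rather than emptying them, which assumes the simulation recreates missing files on the next run; verify that before using it.

```python
import shutil
import time
from pathlib import Path

DATA_FILES = ("learning_memory.json", "trajectory_history.jsonl", "learning_archive.json")


def backup_and_clear(data_dir="data", stamp=None):
    """Move existing data files into <data_dir>/backup_<stamp>/.

    Returns the backup folder path and the list of file names that were moved.
    """
    data_dir = Path(data_dir)
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    backup_dir = data_dir / f"backup_{stamp}"
    moved = []
    for name in DATA_FILES:
        source = data_dir / name
        if source.exists():
            # Create the backup folder lazily so an empty data/ stays untouched.
            backup_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(backup_dir / name))
            moved.append(name)
    return backup_dir, moved
```

Calling `backup_and_clear()` from the project folder leaves `data/` without the three files and keeps the previous contents under a timestamped subfolder.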
|
|
| ## Typical Targets (Guideline)
|
|
|
| These are practical targets, not strict rules:
|
|
|
| - Easy: usually high success, often fewer steps
|
| - Medium: mixed outcomes with meaningful rerouting
|
| - Hard: lower success, more failures, more steps
|
|
|
| If hard success is too high, increase uncertainty or rejection pressure.
|
| If hard success is too low, ease one or two hard-only probabilities.
|
|
|
| ## Troubleshooting
|
|
|
| ### "NameError" or model field errors
|
|
|
| After changing environment logic, make sure the state model fields and the observation model fields still match.
|
| If you added new state keys, add them to the observation models as well.
|
|
|
| ### Script asks for seed/level unexpectedly
|
|
|
| Pass flags explicitly:
|
|
|
| ```bash
|
| python inference.py --mode single --task acde_hard --episodes 10 --seed 555
|
| ```
|
|
|
| ### No module named app
|
|
|
| Run commands from inside the `my_env` folder, and make sure the install succeeded:
|
|
|
| ```bash
|
| pip install -e .
|
| ```
|
|
|
| ### Uvicorn command not found
|
|
|
| Install the server dependencies in your active environment:
|
|
|
| ```bash
|
| pip install uvicorn fastapi
|
| ```
|
|
|
| ## Notes
|
|
|
| - This project is designed for iterative policy tuning.
|
| - Small changes in hard-mode probabilities can noticeably shift success rates.
|
| - Always test with at least 10 to 30 episodes before concluding that a change altered behavior.
|
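To see why small batches are unreliable, it helps to put an interval around the observed success rate. The sketch below uses the Wilson score interval, a standard choice for binomial proportions; it is offered as an analysis aid and is not part of the project.

```python
import math


def wilson_interval(successes, episodes, z=1.96):
    """Approximate 95% Wilson score interval for a success rate (z=1.96)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    z2 = z * z
    denominator = 1 + z2 / episodes
    center = (p + z2 / (2 * episodes)) / denominator
    half_width = (
        z * math.sqrt(p * (1 - p) / episodes + z2 / (4 * episodes ** 2)) / denominator
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

For an observed 20% success rate, `wilson_interval(2, 10)` spans roughly 6% to 51%, while `wilson_interval(6, 30)` narrows to roughly 10% to 37%. A shift of a few percentage points between two 10-episode batches is therefore well within noise.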
|
|