---
title: ACDE OpenEnv
emoji: "🚑"
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
base_path: /web
---

# Emergency Routing Simulation (ACDE)

This project is a simulation environment for emergency ambulance routing.

In simple terms:
- A patient needs urgent care.
- Several hospitals are available.
- Each hospital has trade-offs (distance, traffic, ICU certainty, specialization).
- Conditions can change while the ambulance is moving.
- The agent must decide where to go, step by step.

The goal is not to be perfect every time. The goal is to make realistic decisions under uncertainty.

## What This Project Does

This environment helps you test decision logic in situations where information is incomplete and time is limited.

It supports three difficulty levels:
- `acde_easy`
- `acde_medium`
- `acde_hard`

As difficulty increases, uncertainty and penalties increase too.

## How It Works (Simple Flow)

Every episode follows this loop:

1. The environment is reset with a seed and task.
2. You get an observation:
   - Patient condition
   - Required specialization
   - Hospital list with visible signals
3. The policy scores hospitals.
4. One hospital is selected.
5. The environment validates arrival using hidden checks.
6. You receive outcome + reward.
7. If not done, repeat until success or failure.
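
The loop above can be sketched with a toy stand-in environment. `ToyEnv`, `Hospital`, the 30% mismatch rate, and the scoring rule are all hypothetical and only illustrate the control flow; the real `reset`/`step` logic lives in `app/environment/core.py` and may use different signatures and return values.

```python
import random
from dataclasses import dataclass


@dataclass
class Hospital:
    hospital_id: str
    distance_km: float
    icu_displayed: bool  # visible signal shown to the policy
    icu_actual: bool     # hidden state, checked only on arrival


class ToyEnv:
    """Hypothetical stand-in for the real environment (illustration only)."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps

    def reset(self, seed: int) -> list[dict]:
        rng = random.Random(seed)
        self.step_count = 0
        self.hospitals = []
        for i in range(1, 5):
            displayed = rng.random() < 0.7
            # Assumed: the displayed ICU flag is wrong about 30% of the time.
            actual = displayed if rng.random() < 0.7 else not displayed
            self.hospitals.append(Hospital(f"H{i}", rng.uniform(1, 15), displayed, actual))
        return self._observe()

    def _observe(self) -> list[dict]:
        # Only visible signals are exposed to the policy.
        return [
            {"hospital_id": h.hospital_id, "distance_km": h.distance_km, "icu": h.icu_displayed}
            for h in self.hospitals
        ]

    def step(self, hospital_id: str) -> tuple[list[dict], float, bool, str]:
        self.step_count += 1
        target = next(h for h in self.hospitals if h.hospital_id == hospital_id)
        if target.icu_actual:  # hidden check on arrival
            return self._observe(), 1.0, True, "ACCEPTED"
        done = self.step_count >= self.max_steps
        return self._observe(), -0.2, done, "REJECTED"


def run_episode(seed: int) -> tuple[bool, int]:
    env = ToyEnv()
    obs = env.reset(seed)
    failed: set[str] = set()
    done, outcome = False, "REJECTED"
    while not done:
        # Prefer a displayed ICU, then a short distance; avoid failed hospitals.
        candidates = [h for h in obs if h["hospital_id"] not in failed] or obs
        best = max(candidates, key=lambda h: (h["icu"], -h["distance_km"]))
        obs, reward, done, outcome = env.step(best["hospital_id"])
        if outcome == "REJECTED":
            failed.add(best["hospital_id"])
    return outcome == "ACCEPTED", env.step_count


if __name__ == "__main__":
    print(run_episode(seed=555))
```

Because the toy environment builds its own seeded `random.Random`, calling `run_episode` twice with the same seed gives the same result.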

## What Makes It Realistic

This is not a static lookup problem. It includes realistic uncertainty:

- Displayed ICU status can differ from actual ICU status.
- Traffic can change between steps.
- Hospital overload can change outcomes.
- Specialist availability can fail at arrival.
- A hospital that failed once may become usable later.

The policy includes safety rules such as:
- Immediate retry protection after rejection.
- Cooldown handling for recently failed hospitals.
- Exploration among top options (not blind random picks).

## Project Layout

Key files:

- `app/environment/core.py`
  - Main environment loop (`reset`, `step`, transition logic)
- `app/environment/validation.py`
  - Hidden validation checks (ICU, specialist, overload, outcome)
- `app/environment/graders.py`
  - Final scoring and pass/fail grading
- `app/models/`
  - Pydantic models for state, observation, reward, action
- `app/server/app.py`
  - FastAPI server endpoints
- `inference.py`
  - Local policy runner (CLI episodes)
- `data/learning_memory.json`
  - Rolling policy memory
- `data/trajectory_history.jsonl`
  - Per-step trajectory logs

## API Endpoints

With the server running, these endpoints are available:

- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`

## Action Space

The agent sends one action per step as JSON:

```json
{
  "step": 1,
  "hospital_id": "H3",
  "rationale": "short decision reason"
}
```

Action fields:
- `step` (int, >=1): must match current environment step
- `hospital_id` (str): target hospital identifier
- `rationale` (str, optional): policy explanation

## Observation Space

Both `reset()` and `step()` return an observation with:
- episode metadata: `episode_id`, `seed`, `task_id`, `scenario_name`, `scenario_difficulty`
- patient state: `patient_condition`, `required_specialization`, remaining time fields
- hospital list: `hospital_id`, `distance_km`, `icu`, `specialization`, `traffic`
- routing history: visited/failed hospitals and failure reasons
- hidden-state feedback: `last_arrival_outcome` summary (status/reason/suitability)
- memory snapshot used by the baseline policy

The core schema is defined by Pydantic models in:
- `app/models/action.py`
- `app/models/observation.py`
- `app/models/state.py`
- `app/models/reward.py`

## Required Environment Variables

Before running `inference.py`, define:
- `API_BASE_URL`: API base URL for the OpenAI-compatible endpoint
- `MODEL_NAME`: model name used for rationale generation
- `HF_TOKEN`: API key/token

Windows PowerShell example:

```powershell
$env:API_BASE_URL = "https://api-inference.huggingface.co/v1"
$env:MODEL_NAME = "your-model-id"
$env:HF_TOKEN = "your-token"
```
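
A script can fail fast when one of these variables is missing. The sketch below uses a hypothetical `load_settings` helper; it is not taken from `inference.py`, which may handle missing values differently.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_settings(env=None) -> dict[str, str]:
    """Collect the required variables, raising if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real process environment.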

## Installation

### 1) Prerequisites

- Python 3.10+ (3.12 works)
- `pip`

### 2) Open a terminal in this folder

The folder should be `my_env`.

### 3) Create and activate a virtual environment (recommended)

Windows PowerShell:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
```

macOS/Linux (use `python3` if `python` is not on your PATH):

```bash
python -m venv .venv
source .venv/bin/activate
```

### 4) Install dependencies

```bash
pip install -e .
```

If an editable install is not needed:

```bash
pip install .
```

## Running the Project

### Option A: Run policy episodes directly (most common)

Run one medium episode:

```bash
python inference.py --mode single --task acde_medium --episodes 1 --seed 555
```

Run 10 hard episodes:

```bash
python inference.py --mode single --task acde_hard --episodes 10 --seed 555
```

Run all levels in sequence:

```bash
python inference.py --mode full --episodes 3 --seed 555
```

If you run without `--task`, the script prompts for the level interactively.

### Option B: Run as HTTP service

Start API server:

```bash
uvicorn app.server.app:app --host 0.0.0.0 --port 7860
```

Health check:

```bash
curl http://127.0.0.1:7860/health
```

## Understanding Output

During `inference.py` runs, you will see:

- Scenario details
- Hospital options and scores
- Decision strategy text
- Outcome per step (`ACCEPTED`, `PARTIAL`, `REJECTED`)
- Final episode summary
- Batch summary (success rate, average score, average steps)

Example summary:

```text
Batch summary:
  Success rate: 20.0%
  Average score: 0.39
  Average steps: 3.6
```
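
The three figures are averages over the episodes in a batch. A minimal sketch, assuming each episode is summarized as a hypothetical `(success, score, steps)` tuple; the script's internal result type may differ, and the sample values are made up to illustrate the arithmetic.

```python
def summarize(episodes: list[tuple[bool, float, int]]) -> dict[str, float]:
    """Compute success rate (in percent), average score, and average steps."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    return {
        "success_rate": 100.0 * sum(1 for ok, _, _ in episodes if ok) / n,
        "average_score": sum(score for _, score, _ in episodes) / n,
        "average_steps": sum(steps for _, _, steps in episodes) / n,
    }


# Illustrative data: one success out of five episodes.
sample = [(True, 0.95, 2), (False, 0.30, 4), (False, 0.25, 4), (False, 0.20, 4), (False, 0.25, 4)]
summary = summarize(sample)
print(f"Success rate: {summary['success_rate']:.1f}%")
print(f"Average score: {summary['average_score']:.2f}")
print(f"Average steps: {summary['average_steps']:.1f}")
```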

## Data Files

The simulation writes data to `data/`:

- `learning_memory.json`
  - Long-term policy memory
- `trajectory_history.jsonl`
  - One JSON object per step
- `learning_archive.json`
  - Aggregate run history and profiles

If you want a clean run baseline, back up and clear these files.
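
One way to do this with the standard library is sketched below. The `backup_and_clear` helper and the timestamped backup folder name are assumptions. This sketch moves the files out of `data/` rather than emptying them in place, on the assumption that the simulation recreates missing files; verify that before relying on it.

```python
import shutil
import time
from pathlib import Path

DATA_FILES = ("learning_memory.json", "trajectory_history.jsonl", "learning_archive.json")


def backup_and_clear(data_dir: str = "data") -> Path:
    """Move the known data files into a timestamped backup folder."""
    root = Path(data_dir)
    backup_dir = root / f"backup_{time.strftime('%Y%m%d_%H%M%S')}"
    backup_dir.mkdir(parents=True, exist_ok=True)
    for name in DATA_FILES:
        src = root / name
        if src.exists():  # skip files that were never written
            shutil.move(str(src), str(backup_dir / name))
    return backup_dir
```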

## Typical Targets (Guideline)

These are practical targets, not strict rules:

- Easy: usually high success, often fewer steps
- Medium: mixed outcomes with meaningful rerouting
- Hard: lower success, more failures, more steps

If hard success is too high, increase uncertainty or rejection pressure.
If hard success is too low, ease one or two hard-only probabilities.

## Troubleshooting

### "NameError" or model field errors

Make sure model fields and observation fields match after logic changes.
If you added new state keys, also add them to the observation models.

### Script asks for seed/level unexpectedly

Pass flags explicitly:

```bash
python inference.py --mode single --task acde_hard --episodes 10 --seed 555
```

### No module named app

Run commands from inside the `my_env` folder, and ensure the install succeeded:

```bash
pip install -e .
```

### Uvicorn command not found

Install the server dependencies in your active environment:

```bash
pip install uvicorn fastapi
```

## Notes

- This project is designed for iterative policy tuning.
- Small changes in hard-mode probabilities can noticeably shift success rates.
- Always test with at least 10-30 episodes before concluding behavior changes.
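
The episode count matters because of sampling noise: a success rate measured over n episodes has a standard error of roughly sqrt(p(1-p)/n). The sketch below uses an illustrative helper name and assumes episodes are independent, which is only approximate when policy memory carries over between runs.

```python
import math


def success_rate_stderr(p: float, n: int) -> float:
    """Standard error of a success rate p estimated from n independent episodes."""
    return math.sqrt(p * (1.0 - p) / n)


for n in (10, 30, 100):
    print(n, round(success_rate_stderr(0.2, n), 3))
```

At a true success rate of 20%, the standard error is about 0.13 with 10 episodes and about 0.07 with 30, so a shift of a few percentage points in a short batch may be noise rather than a real change.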