Spaces:
Sleeping
Sleeping
| title: LifeLine AI | |
| emoji: "🏥" | |
| colorFrom: blue | |
| colorTo: red | |
| sdk: docker | |
| pinned: false | |
| # MediRoute OpenEnv | |
| **MediRoute OpenEnv** is a deterministic **healthcare triage + hospital routing** simulation environment designed for evaluating agent decision-making under realistic clinical constraints. | |
| It models the end-to-end flow a real triage system must handle: | |
| - interpret symptoms + vitals/labs | |
| - assign severity (non-emergency → critical) | |
| - route to the right specialist | |
| - pick an appropriate nearby facility | |
| - decide between **appointment vs ambulance escalation** | |
| This environment is intentionally small, fully deterministic, and strongly typed so it can be used in hackathon evaluation pipelines and reproduced exactly. | |
| --- | |
| ## Why this matters (motivation + utility) | |
| Healthcare triage is a high-stakes planning problem with: | |
| - **multi-step reasoning** (severity → specialist → facility → action) | |
| - **safety-critical escalation** (ambulance dispatch vs harmful delays) | |
| - **real-world constraints** (limited specialists, nearby hospitals, and incomplete info) | |
| MediRoute is useful for agent evaluation because it tests: | |
| - **trajectory quality** (progressive reward shaping across steps) | |
| - **loop avoidance** (duplicate actions and stalling are penalized) | |
| - **robustness** (invalid actions are handled safely and deterministically) | |
| - **policy compliance** (terminal actions and episode boundaries are enforced) | |
| --- | |
| ## Environment overview | |
| - **Environment class**: `MediRouteEnv` in `environment.py` | |
| - **Spec**: `openenv.yaml` | |
| - **Typed interface**: `models.py` (Pydantic `Observation`, `Action`, `StepResult`) | |
| - **Tasks**: `tasks.py` (`easy`, `medium`, `hard`) | |
| - **Deterministic graders**: `graders.py` (`grade_step`, `grade_episode`) | |
| OpenEnv interface methods: | |
| - `reset(difficulty: str) -> Observation` | |
| - `step(action: Action) -> StepResult` where `StepResult` contains: | |
| - `observation` (updated `Observation`) | |
| - `reward` (incremental step reward) | |
| - `done` (episode termination flag) | |
| - `info` (diagnostics incl. totals and termination reason) | |
| - `state() -> Observation` (read-only snapshot) | |
| --- | |
| ## Tasks (real-world healthcare cases) | |
| The tasks represent increasing clinical risk and decision complexity. | |
| ### Easy — mild illness (primary care) | |
| - **Scenario**: fever + sore throat with positive strep test | |
| - **Goal**: classify **low** severity, route to **General Physician**, choose an appropriate clinic, then close with appointment/guidance | |
| - **Clinical realism**: routine outpatient triage with lab confirmation | |
| ### Medium — suspected acute coronary syndrome | |
| - **Scenario**: crushing chest pain, hypertension, ECG ST-elevation, elevated troponin | |
| - **Goal**: classify **high** severity, route to **Cardiologist**, select a cardiac-capable hospital, then close appropriately | |
| - **Clinical realism**: time-sensitive cardiology routing | |
| ### Hard — critical collapse (life-threatening) | |
| - **Scenario**: unresponsive patient with cyanosis and SpO₂ crash | |
| - **Goal**: classify **critical** severity and **dispatch ambulance** (terminal action), avoiding unsafe appointment flows | |
| - **Clinical realism**: emergency escalation with irreversible harm from delay | |
| --- | |
| ## Action space | |
| Defined in `models.py` (`VALID_ACTION_TYPES`) and mirrored in `openenv.yaml`: | |
| - `analyze_symptoms` — classify severity (target: `low|moderate|high|critical`) | |
| - `request_more_info` — ask for missing details (target optional) | |
| - `recommend_specialist` — choose specialist (target: a specialist name) | |
| - `select_hospital` — choose facility (target: a hospital name) | |
| - `book_appointment` — close non-emergencies (target optional) | |
| - `call_ambulance` — escalate emergencies (target optional) | |
| - `provide_temp_guidance` — short-term guidance (target optional) | |
| --- | |
| ## Observation space | |
| `Observation` fields (see `models.py` and `openenv.yaml`): | |
| - `symptoms: str` | |
| - `lab_report_summary: dict` | |
| - `severity_score: float` in `[0.0, 1.0]` (updated when severity is analyzed) | |
| - `location: str` | |
| - `nearby_hospitals: list[str]` | |
| - `available_specialists: list[str]` | |
| - `previous_actions: list[str]` (canonical `"<action_type>:<target>"`) | |
| # MediRoute OpenEnv | |
| **MediRoute OpenEnv** is a deterministic **healthcare triage + hospital routing** simulation environment designed for evaluating agent decision-making under realistic clinical constraints. | |
| It models the end-to-end flow a real triage system must handle: | |
| - interpret symptoms + vitals/labs | |
| - assign severity (non-emergency → critical) | |
| - route to the right specialist | |
| - pick an appropriate nearby facility | |
| - decide between **appointment vs ambulance escalation** | |
| This environment is intentionally small, fully deterministic, and strongly typed so it can be used in hackathon evaluation pipelines and reproduced exactly. | |
| --- | |
| ## Configuration | |
| This project exposes several environment variables used at runtime. Keep sensitive keys server-side and out of client-side code (e.g., do not expose `GEOCODER_API_KEY` or `OPENAI_API_KEY` to the browser). | |
| Important environment variables: | |
| - `OPENAI_API_KEY` — (optional) API key for OpenAI if you use the LLM baseline or OpenAI-backed inference. | |
| - `HF_TOKEN` — (optional) Hugging Face token for gated HF models. | |
| - `API_BASE_URL` — (optional) override for OpenAI-compatible endpoints. | |
| - `MODEL_NAME` — (optional) model name to use for LLM inference (default: `gpt-4o-mini` in examples). | |
| - `USE_LOCAL_EMBEDDINGS` — (optional) set to `1`/`true` to enable sentence-transformers fallback for `analyze` when a cloud key is not present. | |
| - `EMBEDDING_MODEL` — (optional) sentence-transformers model id (e.g., `all-MiniLM-L6-v2`) used by local embeddings fallback. | |
| - `GEOCODER_PROVIDER` — (optional) `nominatim` (default) or `mapbox` or `google` if implemented; the server will use this to select reverse geocoding provider. | |
| - `GEOCODER_API_KEY` — (required if using a paid provider) API key for the chosen geocoding provider; keep this server-side and set it as an environment variable or secret. | |
| - `NEXT_PUBLIC_API_BASE` — (frontend) base URL for the backend API; this can point to `http://localhost:8000` in development. Avoid putting secret keys in `NEXT_PUBLIC_` vars. | |
| Example `.env` (for local development) — do NOT commit this file into git: | |
| ```env | |
| # .env.local (example) | |
| OPENAI_API_KEY="" | |
| HF_TOKEN="" | |
| USE_LOCAL_EMBEDDINGS=1 | |
| EMBEDDING_MODEL="all-MiniLM-L6-v2" | |
| GEOCODER_PROVIDER=nominatim | |
| # GEOCODER_API_KEY="your_mapbox_or_google_key" | |
| NEXT_PUBLIC_API_BASE="http://localhost:8000" | |
| ``` | |
| Docker example (passing keys at runtime): | |
| ```bash | |
| docker run --rm -e GEOCODER_PROVIDER=mapbox -e GEOCODER_API_KEY="$MAPBOX_KEY" -e OPENAI_API_KEY="$OPENAI_KEY" -p 8000:8000 mediroute-openenv:latest | |
| ``` | |
| Notes: | |
| - Nominatim (OpenStreetMap) is supported by default for reverse geocoding but has usage limits and a usage policy — for production use consider Mapbox or Google and set `GEOCODER_API_KEY` accordingly. | |
| - Keep API keys on the server. The frontend should call your server endpoints (e.g., `/reverse-geocode`) rather than calling external providers directly. | |
| --- | |
| ## Why this matters (motivation + utility) | |
| Healthcare triage is a high-stakes planning problem with: | |
| - **multi-step reasoning** (severity → specialist → facility → action) | |
| - **safety-critical escalation** (ambulance dispatch vs harmful delays) | |
| - **real-world constraints** (limited specialists, nearby hospitals, and incomplete info) | |
| MediRoute is useful for agent evaluation because it tests: | |
| - **trajectory quality** (progressive reward shaping across steps) | |
| - **loop avoidance** (duplicate actions and stalling are penalized) | |
| - **robustness** (invalid actions are handled safely and deterministically) | |
| - **policy compliance** (terminal actions and episode boundaries are enforced) | |
| --- | |
| ## Environment overview | |
| - **Environment class**: `MediRouteEnv` in `environment.py` | |
| - **Spec**: `openenv.yaml` | |
| - **Typed interface**: `models.py` (Pydantic `Observation`, `Action`, `StepResult`) | |
| - **Tasks**: `tasks.py` (`easy`, `medium`, `hard`) | |
| - **Deterministic graders**: `graders.py` (`grade_step`, `grade_episode`) | |
| OpenEnv interface methods: | |
| - `reset(difficulty: str) -> Observation` | |
| - `step(action: Action) -> StepResult` where `StepResult` contains: | |
| - `observation` (updated `Observation`) | |
| - `reward` (incremental step reward) | |
| - `done` (episode termination flag) | |
| - `info` (diagnostics incl. totals and termination reason) | |
| - `state() -> Observation` (read-only snapshot) | |
| --- | |
| ## Tasks (real-world healthcare cases) | |
| The tasks represent increasing clinical risk and decision complexity. | |
| ### Easy — mild illness (primary care) | |
| - **Scenario**: fever + sore throat with positive strep test | |
| - **Goal**: classify **low** severity, route to **General Physician**, choose an appropriate clinic, then close with appointment/guidance | |
| - **Clinical realism**: routine outpatient triage with lab confirmation | |
| ### Medium — suspected acute coronary syndrome | |
| - **Scenario**: crushing chest pain, hypertension, ECG ST-elevation, elevated troponin | |
| - **Goal**: classify **high** severity, route to **Cardiologist**, select a cardiac-capable hospital, then close appropriately | |
| - **Clinical realism**: time-sensitive cardiology routing | |
| ### Hard — critical collapse (life-threatening) | |
| - **Scenario**: unresponsive patient with cyanosis and SpO₂ crash | |
| - **Goal**: classify **critical** severity and **dispatch ambulance** (terminal action), avoiding unsafe appointment flows | |
| - **Clinical realism**: emergency escalation with irreversible harm from delay | |
| --- | |
| ## Action space | |
| Defined in `models.py` (`VALID_ACTION_TYPES`) and mirrored in `openenv.yaml`: | |
| - `analyze_symptoms` — classify severity (target: `low|moderate|high|critical`) | |
| - `request_more_info` — ask for missing details (target optional) | |
| - `recommend_specialist` — choose specialist (target: a specialist name) | |
| - `select_hospital` — choose facility (target: a hospital name) | |
| - `book_appointment` — close non-emergencies (target optional) | |
| - `call_ambulance` — escalate emergencies (target optional) | |
| - `provide_temp_guidance` — short-term guidance (target optional) | |
| --- | |
| ## Observation space | |
| `Observation` fields (see `models.py` and `openenv.yaml`): | |
| - `symptoms: str` | |
| - `lab_report_summary: dict` | |
| - `severity_score: float` in `[0.0, 1.0]` (updated when severity is analyzed) | |
| - `location: str` | |
| - `nearby_hospitals: list[str]` | |
| - `available_specialists: list[str]` | |
| - `previous_actions: list[str]` (canonical `"<action_type>:<target>"`) | |
| --- | |
| ## Reward shaping (non-binary, trajectory-based) | |
| Reward is **shaped across the trajectory** (not a single binary outcome): | |
| - partial credit for intermediate correct decisions (severity, specialist, hospital) | |
| - penalties for unsafe or unproductive behavior (wrong routing, duplicates, stalling) | |
| - episode total is clamped to `[0.0, 1.0]` for consistent scoring | |
| Implementation: | |
| - per-step reward: `graders.grade_step(task, action, previous_actions)` | |
| - episode summary: `graders.grade_episode(...)` | |
| - total reward clamped + tracked in `environment.py` | |
| --- | |
| ## Setup | |
| ### Local (Python) | |
| ```bash | |
| cd meta | |
| python -m venv .venv | |
| source .venv/bin/activate | |
| pip install -r requirements.txt | |
| ``` | |
| --- | |
| ## Run the environment | |
| ### Interactive REPL (manual testing) | |
| ```bash | |
| cd meta | |
| python app.py --difficulty easy | |
| ``` | |
| ### Baseline inference (LLM agent) | |
| Environment variables: | |
| - `OPENAI_API_KEY` (or `HF_TOKEN` for gated HF models) | |
| - `API_BASE_URL` (defaults to OpenAI; can be any OpenAI-compatible server) | |
| - `MODEL_NAME` (defaults to `gpt-4o-mini`) | |
| ```bash | |
| cd meta | |
| export OPENAI_API_KEY="..." | |
| export API_BASE_URL="https://api.openai.com/v1" | |
| export MODEL_NAME="gpt-4o-mini" | |
| python inference.py --difficulty all --agent llm | |
| ``` | |
| ### Baseline inference (deterministic rules agent) | |
| This baseline runs **without any network calls** and is fully reproducible. | |
| ```bash | |
| cd meta | |
| python inference.py --difficulty all --agent rules | |
| ``` | |
| --- | |
| ## Expected baseline scores | |
| Because the environment and grader are deterministic: | |
| - **Rules baseline** (`--agent rules`) is expected to score **1.0000** on `easy`, `medium`, and `hard`. | |
| - **LLM baseline** (`--agent llm`) depends on the chosen model/endpoint, but should typically pass all tasks with a capable instruction-following model. | |
| --- | |
| ## Docker (build + run) | |
| ### Build | |
| ```bash | |
| cd meta | |
| docker build -t mediroute-openenv:latest . | |
| ``` | |
| ### Run (rules baseline, no API required) | |
| ```bash | |
| docker run --rm mediroute-openenv:latest python -u inference.py --difficulty all --agent rules | |
| ``` | |
| ### Run (LLM baseline) | |
| ```bash | |
| docker run --rm \ | |
| -e OPENAI_API_KEY="..." \ | |
| -e API_BASE_URL="https://api.openai.com/v1" \ | |
| -e MODEL_NAME="gpt-4o-mini" \ | |
| mediroute-openenv:latest python -u inference.py --difficulty all --agent llm | |
| ``` | |
| --- | |
| ## Hugging Face Spaces (CPU) deployment notes | |
| MediRoute is HF-Spaces-friendly because it is **CPU-only** and can run fully offline using the rules baseline. | |
| Recommended Space setup: | |
| - **SDK**: Docker (or Python, but Docker is easiest) | |
| - **Hardware**: CPU basic | |
| - **Entrypoint**: keep the default `CMD` (runs all tasks), or override to rules mode | |
| If using Docker Spaces: | |
| - add secrets as needed (`OPENAI_API_KEY` / `HF_TOKEN`) | |
| - optionally set `MODEL_NAME` and `API_BASE_URL` for your endpoint | |
| To default the Space to offline evaluation: | |
| - configure it to run: `python -u inference.py --difficulty all --agent rules` | |
| --- | |
| ## Novelty (why this is different) | |
| Compared to common OpenEnv tasks (email triage, scheduling, simple classification), MediRoute is novel because it combines: | |
| - **safety-critical escalation** (ambulance dispatch logic, harmful appointment decisions) | |
| - **severity inference → downstream routing** (specialist + hospital choice depends on severity) | |
| - **trajectory shaping** that rewards incremental clinical reasoning and penalizes loops | |
| - **healthcare-specific realism** (vitals/labs, STEMI-like signals, SpO₂ collapse) | |
| --- | |
| ## Repo map | |
| - `environment.py` — OpenEnv environment implementation (`reset/step/state`) | |
| - `models.py` — Pydantic models (`Observation`, `Action`, `StepResult`) | |
| - `tasks.py` — deterministic tasks (`easy|medium|hard`) | |
| - `graders.py` — deterministic reward shaping and episode grading | |
| - `inference.py` — baseline inference runner (`--agent llm|rules`) | |
| - `app.py` — manual interactive REPL | |
| - `openenv.yaml` — OpenEnv specification | |