--- title: EHRGym emoji: πŸ₯ colorFrom: blue colorTo: green sdk: docker app_port: 7860 tags: - openenv - rl-environment - ehr - grpo - trl - clinical - computer-use pinned: false license: apache-2.0 --- # EHRGym

EHRGym Logo

OpenEnv TRL GRPO Open In Colab License

**EHRGym** is an [OpenEnv](https://huggingface.co/openenv-community)-compatible environment for training and evaluating computer-use agents in an Epic-like electronic health record (EHR) workflow. It integrates natively with [TRL](https://github.com/huggingface/trl)'s `GRPOTrainer` for GRPO fine-tuning.

EHRGym UI Demo

πŸ€— Try the environment out on Hugging Face Spaces

It combines: - A web-based EHR built with **Next.js + TypeScript** - An **OpenEnv-compliant environment server** built with **FastAPI + Playwright** The environment exposes `reset()`, `step(action)`, and a `state` object so an agent can interact with the EHR through a real browser. > **Note:** This project uses **synthetic data only** (no PHI). > Not affiliated with or endorsed by Epic Systems. --- ## Table of contents - [Clinical focus (initial)](#clinical-focus-initial) - [What you get](#what-you-get) - [Goals](#goals) - [Non-goals (initial)](#non-goals-initial) - [Architecture (one environment instance)](#architecture-one-environment-instance) - [EHR UI layout (Epic-like)](#ehr-ui-layout-epic-like) - [OpenEnv interface](#openenv-interface) - [Tasks (provider-focused)](#tasks-provider-focused) - [Synthetic patients](#synthetic-patients) - [Performance & training approach](#performance--training-approach) - [Logging & evaluation](#logging--evaluation) - [Repository layout (proposed)](#repository-layout-proposed) - [Quickstart (placeholder)](#quickstart-placeholder) - [GRPO Training with TRL](#grpo-training-with-trl) - [Contributing](#contributing) - [License](#license) --- ## Clinical focus (initial) Provider workflows: - Reviewing the chart (encounters, labs, prior notes) - Writing progress and encounter notes - Placing and signing orders --- ## What you get - **Epic-like charting UI** - Chart Review (Encounters / Labs / Clinical Notes) - Notes authoring - Orders with signing workflow - Encounter sign/close - **OpenEnv-compliant RL environment** - Typed `Action`, `Observation`, `State` - `reset()` / `step()` / `state()` - Real browser interaction (Playwright) - **Task library** - Chart review β†’ note β†’ orders β†’ sign/close - **Synthetic patient pipeline** - Baseline: **Synthea + FHIR-shaped ingest** --- ## Goals - OpenEnv compliance with typed `Action` / `Observation` / `State` models - Docker-first deployment and reproducible containers - Next.js EHR interface supporting: - chart review (encounters, labs, clinical notes) - order entry (labs / meds / imaging) with sign workflow - note authoring (progress & encounter notes) - Task-based RL episodes (patient + scenario + objective + scoring rubric) - Synthetic patients only (no PHI), with realistic longitudinal timelines and standard coding where feasible --- ## Out-of-Scope - Pixel-perfect Epic cloning (We emulate workflows & info layout) - Full enterprise EHR scope on day one (MAR, billing, scheduling, in-basket, prior auth, etc.) --- ## Architecture A single container runs two processes: 1. **Next.js EHR app (port 3000)** - Serves the UI and required API routes (patient data, notes, orders, signing) 2. **OpenEnv environment server (port 8000)** - FastAPI server exposing OpenEnv API - Launches and controls headless Chromium via Playwright - Implements `reset()`, `step()`, `state`, scenario sampling, and reward computation **Data layer** - SQLite via Prisma (portable and fast) - On `reset()`, the environment recreates/truncates the DB and reseeds patients, encounters, labs, notes, orders, and scenario ground truth. Optionally use a DB snapshot + copy-on-reset for speed. --- ## EHR UI layout (Epic-like) - **Entry view:** patient list / schedule-like page β†’ select patient β†’ open chart - **Chart shell** - Activity sidebar: Summary, Chart Review, Orders, Notes (optional), Encounter (close/sign) - Patient banner: synthetic demographics and key flags (synthetic ID, age/sex, allergies) - **Chart Review tabs** - Encounters: timeline, encounter detail, linked notes/orders - Labs: table + trend view, filtering, abnormal flags - Clinical Notes: list by type/date/author, open note - **Notes** - Create Progress Note tied to current encounter - Structured sections (SOAP) - Problem-oriented A/P that links naturally to orders - **Orders** - Search/select from constrained preference list - Configure parameters (dose/frequency, lab timing) - Statuses: Draft β†’ Pending Signature β†’ Signed **RL instrumentation** - Stable selectors (`data-testid` / `data-qa`) for tabs, lab rows, order rows, note controls - Accessible labels (`aria-label`) so agents can use the accessibility tree --- ## OpenEnv Interface **Actions** - Low-level computer-use actions (mouse clicks, drag, scroll, keypress, type, wait) - Optional high-level actions for curriculum/debug (e.g., `click(selector)`, `fill(selector,text)`, `goto(path)`, `select_patient(patient_id)`) **Observations** - Goal/instruction text - Downscaled screenshot (base64 PNG) - Current route/URL and active activity context - Optional DOM snapshot and/or accessibility tree - Metadata (timing, action success, structured errors) **State** - `episode_id`, `step_count` + environment fields: - `patient_id`, `encounter_id`, `scenario_id` - `rubric_progress` - `cumulative_reward` **Rewarding** - Terminal success when objective is satisfied (e.g., correct note signed + correct orders signed) - Shaping rewards for meaningful substeps (navigate, find target lab, place required order, sign) - Penalties for invalid actions, navigation errors, unsafe/irrelevant orders, excessive steps --- ## Tasks Scenarios are packaged as specs and optionally generated at reset. Example task families: - **Chart Review β†’ Labs** - Find most recent creatinine; evaluate AKI criteria - Trend hemoglobin over last 3 values; document in progress note - **Chart Review β†’ Encounters** - Locate discharge summary; extract follow-up plan - Identify prior antibiotic exposure from previous encounter orders - **Clinical Notes** - Open most recent consult; summarize recommendations - **Progress note authoring** - Complete SOAP note with required elements and grounded facts - **Orders** - Place specific orders with correct parameters; sign - **Close/finish encounter** - Signed note + signed orders + required fields **Curriculum** - Phase 0: unit skills (navigate, open tabs, filter labs, open note) - Phase 1: single objective (place one order, sign one note) - Phase 2: multi-step (review β†’ note β†’ orders β†’ sign/close) --- ## Synthetic patients Baseline approach: - Use Synthea to generate longitudinal synthetic records (encounters, conditions, meds, labs/vitals, procedures, etc.), exportable as FHIR - Treat FHIR R4 concepts as the internal β€œshape” even if stored relationally - Use standard coding when feasible: - LOINC for labs - SNOMED CT for problems/findings/procedures - RxNorm for meds **Notes gap (free-text)** - Template-based notes from structured facts (easy to score, less diverse) - Constrained LLM-generated notes grounded strictly in chart facts (more realistic, needs guardrails) - Hybrid: deterministic skeleton + constrained paraphrase **Scenarios** layer on top of base patients as teaching cases (e.g., DKA, CHF, pneumonia, AKI, GI bleed) with explicit ground truth objectives: - required orders - required note elements - critical facts that must appear in the note --- ## Performance and Training Approach - Browser simulation throughput is usually the bottleneck, not GPU - Start with demonstrations (scripted Playwright expert) β†’ supervised behavioral cloning - Move to RL after BC reliably solves simpler tasks - Run a modest number of env containers concurrently (e.g., 4–16) - Keep observations efficient (downscale screenshots; optionally omit DOM/a11y on β€œeasy mode”) --- ## Logging and Evaluation **Logging per step** - Action, success/failure, reward components, UI errors **Episode artifacts** - Final note text - Orders placed/signed - Optional screenshots for debugging **Evaluation** - Deterministic test suites with fixed seeds - Metrics: task success rate, steps-to-completion, unsafe/irrelevant order rate, note completeness/grounding **Safety** - Synthetic data only (no PHI) - Constrained formulary and order catalog - If LLM-generated notes are used, enforce grounding checks (facts must be supported by chart) --- ## Repository layout ``` apps/ehr/ Next.js EHR UI (TypeScript) ehrgym/ OpenEnv Python client + TRL reward functions notebooks/ Starter notebook for GRPO training env_server/ FastAPI OpenEnv server + Playwright control tasks/ scenario specs, rubrics, fixtures (25 tasks) configs/ GRPO training configs (YAML + DeepSpeed) scripts/ TRL training script, agents, trajectory tools prisma/ schema + migrations docker/ Dockerfiles + entrypoints shared/ synthetic seed definitions + reset helpers synthetic/ Synthea generation + FHIR ingest + seed tooling ``` --- ## Quickstart The initial scaffold is now wired end-to-end. ### What is included - **Next.js EHR UI** in [apps/ehr](apps/ehr) - patient list / chart entry - chart review with encounters, labs, notes - progress note authoring - order drafting and signing - encounter sign workflow - **FastAPI environment server** in [env_server](env_server) - `POST /reset` - `POST /step` - `GET /state` - `GET /healthz` - **Prisma + SQLite** schema and seed data in [prisma](prisma) and [shared](shared) - **Docker** single-container startup files in [docker](docker) and [docker-compose.yml](docker-compose.yml) ### Local development Prerequisites: - Node.js 20+ - Python 3.9+ 1. Install Node dependencies: `npm install` 2. Install the Python environment server package: `python3 -m pip install .` If you use a virtual environment or conda environment, activate it before running the remaining commands. 3. Install the browser runtime for Playwright: `python3 -m playwright install chromium` 4. Copy environment variables if needed: `cp .env.example .env` 5. Initialize the SQLite database: `npx prisma generate && npx prisma db push && npx prisma db seed` 6. Start both processes: `npm run dev` Available endpoints: - EHR UI: http://127.0.0.1:3000 - Env server: http://127.0.0.1:8000 ### Docker Build and run the combined container: `docker compose up --build` This launches: - the Next.js EHR app on port `3000` - the FastAPI environment server on port `8000` ### Minimal API flow 1. `POST /reset` 2. Read `observation` and `state` 3. Send browser-style actions to `POST /step` 4. Inspect `GET /state` for episode progress A starter agent loop is included in [scripts/example_agent.py](scripts/example_agent.py). ### Demo tooling For offline trajectory replay and remote VLM rollouts over SSH, see [docs/remote-vlm-demo.md](docs/remote-vlm-demo.md). For offline dataset creation and SFT preparation, see [docs/offline-training.md](docs/offline-training.md). --- ## GRPO Training with TRL EHRGym integrates with TRL's `GRPOTrainer` using the [OpenEnv](https://huggingface.co/docs/trl/openenv) `rollout_func` pattern for agent training. The model learns to navigate the EHR, place orders, write notes, and sign encounters through multi-turn browser interaction. ### Starter notebook (recommended) The fastest way to get started is the end-to-end training notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adtserapio/EHRGym/blob/main/notebooks/ehrgym_grpo_training.ipynb) The notebook covers: - Connecting to the hosted EHRGym Space (zero setup) - Defining a `rollout_func` with `generate_rollout_completions` for multi-turn EHR interaction - Three reward signals: clinical rubric, action format, and step efficiency - Training with vLLM-accelerated GRPO on Qwen3-1.7B - Evaluating the fine-tuned model on clinical tasks See [`notebooks/ehrgym_grpo_training.ipynb`](notebooks/ehrgym_grpo_training.ipynb) for the full walkthrough. ### Quick start (CLI) ```bash # 1. Start the EHRGym environment npm run dev # 2. Install training dependencies pip install "trl[vllm]" git+https://github.com/adtserapio/EHRGym.git # 3. Run GRPO training (single GPU, smoke test) python scripts/train_grpo_trl.py \ --model_name_or_path Qwen/Qwen3-0.6B \ --output_dir runs/checkpoints/ehrgym-grpo-trl \ --max_steps 50 \ --num_generations 2 \ --max_completion_length 512 ``` ### With vLLM acceleration ```bash accelerate launch \ --config_file configs/deepspeed_zero2.yaml \ scripts/train_grpo_trl.py \ --model_name_or_path Qwen/Qwen3-1.7B \ --output_dir runs/checkpoints/ehrgym-grpo-trl \ --use_vllm True \ --vllm_mode colocate \ --max_steps 500 \ --num_generations 4 \ --max_completion_length 1024 \ --report_to wandb ``` ### Using the config file ```bash python scripts/train_grpo_trl.py --config configs/grpo_ehrgym.yaml ``` ### Python API (rollout_func pattern) ```python from ehrgym import EHRGymEnv from trl import GRPOTrainer, GRPOConfig from trl.experimental.openenv import generate_rollout_completions def rollout_func(prompts, trainer): # For each prompt, run a full EHR episode # Parse model outputs into browser actions (navigate, click, type, press) # Step through the environment and collect rewards # Return prompt_ids, completion_ids, logprobs, env_mask, and reward fields ... trainer = GRPOTrainer( model="Qwen/Qwen3-1.7B", reward_funcs=[reward_task, reward_format, reward_efficiency], train_dataset=dataset, args=GRPOConfig( max_completion_length=4096, use_vllm=True, vllm_mode="colocate", ), rollout_func=rollout_func, ) trainer.train() ``` For the complete `rollout_func` implementation with `env_mask` and multi-turn interaction, see the [starter notebook](notebooks/ehrgym_grpo_training.ipynb). ### Architecture ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Training (GPU Machine) β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ TRL GRPOTrainer β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ β”‚ β”‚ Model β”‚β†’ β”‚ Tool Calls β”‚β†’ β”‚ EHRGymEnv β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ (Qwen3) │← β”‚ (navigate, │← β”‚ (HTTP β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ click, β”‚ β”‚ client) β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ type_text,β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ press_key)β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ HTTP β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ EHRGym Server (Docker / HF Space) β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ FastAPI env server (:8000) β–Ό β”‚ β”‚ β”‚ β”‚ /reset /step /state β”‚ β”‚ β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚ β”‚ β”‚ β”‚ Playwright (headless Chromium) β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β†’ Next.js EHR app (:3000) β”‚ β”‚ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ### 25 Clinical Tasks The environment ships with 25 clinical tasks across three difficulty levels: | Difficulty | Tasks | Notes | Rubric Items | |------------|-------|-------|--------------| | Basic | 8 | 3 | ~5 | | Medium | 9 | 4-5 | ~10 | | Hard | 8 | 6-7 | ~10 | Tasks include AKI, DKA, pneumonia, CHF, COPD, stroke, GI bleed, PE, sepsis, and more. --- ## Contributing - Keep all data synthetic - Add `data-testid` / `aria-label` for any new interactive UI element - New tasks should include: - objective text - ground truth artifacts (required orders/note fields) - rubric scoring rules - deterministic seed behavior --- ## License Apache License Version 2.0, January 2004 This project is licensed under the Apache License, Version 2.0. You should include the full license text in a file named `LICENSE` at the repository root. Copyright [2026] [Adrian Serapio] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0