title: EHRGym
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
- openenv
- rl-environment
- ehr
- grpo
- trl
- clinical
- computer-use
pinned: false
license: apache-2.0
EHRGym
EHRGym is an OpenEnv-compatible environment for training and evaluating computer-use agents in an Epic-like electronic health record (EHR) workflow. It integrates natively with TRL's GRPOTrainer for GRPO fine-tuning.
🤗 Try the environment out on Hugging Face Spaces
It combines:
- A web-based EHR built with Next.js + TypeScript
- An OpenEnv-compliant environment server built with FastAPI + Playwright
The environment exposes reset(), step(action), and a state object so an agent can interact with the EHR through a real browser.
Note: This project uses synthetic data only (no PHI).
Not affiliated with or endorsed by Epic Systems.
Table of contents
- Clinical focus (initial)
- What you get
- Goals
- Out-of-Scope
- Architecture
- EHR UI layout (Epic-like)
- OpenEnv Interface
- Tasks
- Synthetic patients
- Performance and Training Approach
- Logging and Evaluation
- Repository layout
- Quickstart
- GRPO Training with TRL
- Contributing
- License
Clinical focus (initial)
Provider workflows:
- Reviewing the chart (encounters, labs, prior notes)
- Writing progress and encounter notes
- Placing and signing orders
What you get
Epic-like charting UI
- Chart Review (Encounters / Labs / Clinical Notes)
- Notes authoring
- Orders with signing workflow
- Encounter sign/close
OpenEnv-compliant RL environment
- Typed Action, Observation, State models
- reset() / step() / state()
- Real browser interaction (Playwright)
Task library
- Chart review → note → orders → sign/close
Synthetic patient pipeline
- Baseline: Synthea + FHIR-shaped ingest
Goals
- OpenEnv compliance with typed Action / Observation / State models
- Docker-first deployment and reproducible containers
- Next.js EHR interface supporting:
- chart review (encounters, labs, clinical notes)
- order entry (labs / meds / imaging) with sign workflow
- note authoring (progress & encounter notes)
- Task-based RL episodes (patient + scenario + objective + scoring rubric)
- Synthetic patients only (no PHI), with realistic longitudinal timelines and standard coding where feasible
Out-of-Scope
- Pixel-perfect Epic cloning (we emulate workflows and information layout)
- Full enterprise EHR scope on day one (MAR, billing, scheduling, in-basket, prior auth, etc.)
Architecture
A single container runs two processes:
- Next.js EHR app (port 3000)
- Serves the UI and required API routes (patient data, notes, orders, signing)
- OpenEnv environment server (port 8000)
- FastAPI server exposing OpenEnv API
- Launches and controls headless Chromium via Playwright
- Implements reset(), step(), state, scenario sampling, and reward computation
Data layer
- SQLite via Prisma (portable and fast)
- On reset(), the environment recreates or truncates the DB and reseeds patients, encounters, labs, notes, orders, and scenario ground truth. Optionally, use a DB snapshot with copy-on-reset for speed.
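The snapshot option mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the SQLite database is a single file and that no connection holds it open while it is replaced. The file names, the `orders` table, and both helper functions are hypothetical, not part of EHRGym.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def make_snapshot(live_db: Path, snapshot: Path) -> None:
    """Save a freshly seeded database as the pristine snapshot (run once)."""
    shutil.copyfile(live_db, snapshot)


def reset_from_snapshot(snapshot: Path, live_db: Path) -> None:
    """Restore the live database by overwriting it with the snapshot."""
    # Assumes every connection to the live file has been closed first.
    shutil.copyfile(snapshot, live_db)


# Demonstration with a throwaway database (table name is illustrative).
workdir = Path(tempfile.mkdtemp())
live, snap = workdir / "dev.db", workdir / "seed.db"

conn = sqlite3.connect(live)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()
conn.close()
make_snapshot(live, snap)

# Simulate an episode that writes to the database.
conn = sqlite3.connect(live)
conn.execute("INSERT INTO orders (name) VALUES ('BMP')")
conn.commit()
conn.close()

# Reset: the episode's write disappears.
reset_from_snapshot(snap, live)
conn = sqlite3.connect(live)
rows_after_reset = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

Copying a file avoids re-running the seed script on every episode, which is likely the dominant cost of a truncate-and-reseed reset.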
EHR UI layout (Epic-like)
- Entry view: patient list / schedule-like page → select patient → open chart
- Chart shell
- Activity sidebar: Summary, Chart Review, Orders, Notes (optional), Encounter (close/sign)
- Patient banner: synthetic demographics and key flags (synthetic ID, age/sex, allergies)
- Chart Review tabs
- Encounters: timeline, encounter detail, linked notes/orders
- Labs: table + trend view, filtering, abnormal flags
- Clinical Notes: list by type/date/author, open note
- Notes
- Create Progress Note tied to current encounter
- Structured sections (SOAP)
- Problem-oriented A/P that links naturally to orders
- Orders
- Search/select from constrained preference list
- Configure parameters (dose/frequency, lab timing)
- Statuses: Draft → Pending Signature → Signed
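The order status progression can be modeled as a small state machine. The sketch below is written in Python for illustration only; the EHR app itself is TypeScript, and the `OrderStatus` enum, `ALLOWED` table, and `advance` function are hypothetical names. It also assumes transitions only move forward, which the status list suggests but does not state.

```python
from enum import Enum


class OrderStatus(Enum):
    DRAFT = "Draft"
    PENDING_SIGNATURE = "Pending Signature"
    SIGNED = "Signed"


# Allowed forward transitions (assumed); anything else is rejected.
ALLOWED = {
    OrderStatus.DRAFT: {OrderStatus.PENDING_SIGNATURE},
    OrderStatus.PENDING_SIGNATURE: {OrderStatus.SIGNED},
    OrderStatus.SIGNED: set(),
}


def advance(current: OrderStatus, target: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move order from {current.value} to {target.value}")
    return target


status = advance(OrderStatus.DRAFT, OrderStatus.PENDING_SIGNATURE)
status = advance(status, OrderStatus.SIGNED)
```

Rejecting skipped transitions (for example Draft straight to Signed) gives the environment a clear signal for the "invalid action" penalty.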
RL instrumentation
- Stable selectors (data-testid / data-qa) for tabs, lab rows, order rows, note controls
- Accessible labels (aria-label) so agents can use the accessibility tree
OpenEnv Interface
Actions
- Low-level computer-use actions (mouse clicks, drag, scroll, keypress, type, wait)
- Optional high-level actions for curriculum/debug (e.g., click(selector), fill(selector, text), goto(path), select_patient(patient_id))
Observations
- Goal/instruction text
- Downscaled screenshot (base64 PNG)
- Current route/URL and active activity context
- Optional DOM snapshot and/or accessibility tree
- Metadata (timing, action success, structured errors)
State
- episode_id, step_count
- Environment fields: patient_id, encounter_id, scenario_id, rubric_progress, cumulative_reward
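To make the three model types concrete, here is a minimal sketch using plain dataclasses. The `State` fields come from the list above. The `Action` and `Observation` field names and all field types are assumptions for illustration, and the real models may instead extend OpenEnv base classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Action:
    kind: str                       # e.g. "click", "type", "keypress", "scroll", "wait"
    x: Optional[int] = None         # screen coordinates for mouse actions
    y: Optional[int] = None
    text: Optional[str] = None      # payload for typing actions
    selector: Optional[str] = None  # only for the optional high-level actions


@dataclass
class Observation:
    goal: str                       # goal/instruction text
    screenshot_b64: str             # downscaled screenshot, base64-encoded PNG
    url: str                        # current route/URL
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class State:
    episode_id: str
    step_count: int = 0
    patient_id: Optional[str] = None
    encounter_id: Optional[str] = None
    scenario_id: Optional[str] = None
    rubric_progress: Dict[str, bool] = field(default_factory=dict)
    cumulative_reward: float = 0.0


# Hypothetical identifiers, shown only to exercise the types.
state = State(episode_id="ep-001", patient_id="pt-demo")
state.step_count += 1
state.rubric_progress["opened_labs_tab"] = True
```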
Rewarding
- Terminal success when objective is satisfied (e.g., correct note signed + correct orders signed)
- Shaping rewards for meaningful substeps (navigate, find target lab, place required order, sign)
- Penalties for invalid actions, navigation errors, unsafe/irrelevant orders, excessive steps
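One way to combine the three reward ingredients above into a per-step scalar is sketched below. Every weight and the `StepOutcome` structure are illustrative assumptions, not EHRGym's actual values.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    new_rubric_items: int = 0    # rubric substeps newly satisfied this step
    invalid_action: bool = False
    unsafe_order: bool = False
    objective_met: bool = False


# All weights are assumed for illustration.
STEP_COST = 0.01        # discourages excessive steps
SHAPING = 0.1           # per newly satisfied substep
INVALID_PENALTY = 0.05
UNSAFE_PENALTY = 0.5
TERMINAL = 1.0          # paid once when the objective is satisfied


def step_reward(outcome: StepOutcome) -> float:
    """Terminal bonus plus shaping, minus penalties and a small step cost."""
    reward = -STEP_COST
    reward += SHAPING * outcome.new_rubric_items
    if outcome.invalid_action:
        reward -= INVALID_PENALTY
    if outcome.unsafe_order:
        reward -= UNSAFE_PENALTY
    if outcome.objective_met:
        reward += TERMINAL
    return reward


progress = step_reward(StepOutcome(new_rubric_items=2))
unsafe = step_reward(StepOutcome(unsafe_order=True))
```

Keeping the shaping terms small relative to the terminal bonus is a common way to reduce the risk that an agent collects substep rewards without finishing the task.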
Tasks
Scenarios are packaged as specs and optionally generated at reset. Example task families:
- Chart Review → Labs
- Find most recent creatinine; evaluate AKI criteria
- Trend hemoglobin over last 3 values; document in progress note
- Chart Review → Encounters
- Locate discharge summary; extract follow-up plan
- Identify prior antibiotic exposure from previous encounter orders
- Clinical Notes
- Open most recent consult; summarize recommendations
- Progress note authoring
- Complete SOAP note with required elements and grounded facts
- Orders
- Place specific orders with correct parameters; sign
- Close/finish encounter
- Signed note + signed orders + required fields
Curriculum
- Phase 0: unit skills (navigate, open tabs, filter labs, open note)
- Phase 1: single objective (place one order, sign one note)
- Phase 2: multi-step (review → note → orders → sign/close)
Synthetic patients
Baseline approach:
- Use Synthea to generate longitudinal synthetic records (encounters, conditions, meds, labs/vitals, procedures, etc.), exportable as FHIR
- Treat FHIR R4 concepts as the internal “shape” even if stored relationally
- Use standard coding when feasible:
- LOINC for labs
- SNOMED CT for problems/findings/procedures
- RxNorm for meds
Notes gap (free-text)
- Template-based notes from structured facts (easy to score, less diverse)
- Constrained LLM-generated notes grounded strictly in chart facts (more realistic, needs guardrails)
- Hybrid: deterministic skeleton + constrained paraphrase
Scenarios layer on top of base patients as teaching cases (e.g., DKA, CHF, pneumonia, AKI, GI bleed) with explicit ground truth objectives:
- required orders
- required note elements
- critical facts that must appear in the note
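A scenario's ground truth and a simple scoring pass over the episode artifacts could look like the sketch below. The `ScenarioSpec` fields mirror the three objective types listed above. The substring-matching scorer, the scenario ID, and the clinical values are made up for illustration, and a real rubric would likely need more robust matching.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ScenarioSpec:
    scenario_id: str
    required_orders: Set[str]
    required_note_elements: List[str]   # e.g. section headers
    critical_facts: List[str]           # strings that must appear in the note


def score_episode(spec: ScenarioSpec, note_text: str, signed_orders: Set[str]) -> float:
    """Fraction of rubric items satisfied, using naive case-insensitive substring checks."""
    note = note_text.lower()
    checks = [order in signed_orders for order in sorted(spec.required_orders)]
    checks += [element.lower() in note for element in spec.required_note_elements]
    checks += [fact.lower() in note for fact in spec.critical_facts]
    return sum(checks) / len(checks) if checks else 0.0


# Hypothetical teaching case; all values are synthetic.
spec = ScenarioSpec(
    scenario_id="aki-demo",
    required_orders={"basic metabolic panel"},
    required_note_elements=["Assessment", "Plan"],
    critical_facts=["creatinine 2.1"],
)
note = "Assessment: AKI, creatinine 2.1 mg/dL. Plan: IV fluids, repeat labs."
score = score_episode(spec, note, signed_orders={"basic metabolic panel"})
```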
Performance and Training Approach
- Browser simulation throughput is usually the bottleneck, not GPU
- Start with demonstrations (scripted Playwright expert) → supervised behavioral cloning
- Move to RL after BC reliably solves simpler tasks
- Run a modest number of env containers concurrently (e.g., 4–16)
- Keep observations efficient (downscale screenshots; optionally omit DOM/a11y on “easy mode”)
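Because browser throughput is the bottleneck, rollouts are typically collected from several environment servers in parallel. The sketch below assumes one worker thread per server and sequential episodes within each server. `collect_rollouts`, the port numbering, and the stand-in episode function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def collect_rollouts(
    env_urls: List[str],
    run_one: Callable[[str], float],
    episodes_per_env: int,
) -> List[float]:
    """Run episodes against several env servers in parallel, one worker per server."""
    def worker(url: str) -> List[float]:
        # Episodes on one server stay sequential (assumes one browser per server).
        return [run_one(url) for _ in range(episodes_per_env)]

    with ThreadPoolExecutor(max_workers=len(env_urls)) as pool:
        per_env = list(pool.map(worker, env_urls))
    return [ret for returns in per_env for ret in returns]


# Stand-in episode function; a real one would call /reset and /step on `url`.
def fake_episode(url: str) -> float:
    return float(len(url))


urls = [f"http://127.0.0.1:{8000 + i}" for i in range(4)]
returns = collect_rollouts(urls, fake_episode, episodes_per_env=2)
```

Threads are sufficient here because each worker spends its time waiting on HTTP responses rather than computing.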
Logging and Evaluation
Logging per step
- Action, success/failure, reward components, UI errors
Episode artifacts
- Final note text
- Orders placed/signed
- Optional screenshots for debugging
Evaluation
- Deterministic test suites with fixed seeds
- Metrics: task success rate, steps-to-completion, unsafe/irrelevant order rate, note completeness/grounding
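Three of the metrics above can be aggregated from per-episode records as in the sketch below. The `EpisodeResult` fields and the sample numbers are illustrative assumptions. Note completeness and grounding are omitted because they depend on the rubric.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EpisodeResult:
    success: bool
    steps: int
    orders_signed: int
    unsafe_orders: int


def summarize(results: List[EpisodeResult]) -> Dict[str, float]:
    """Aggregate success rate, steps-to-completion, and unsafe order rate."""
    successes = [r for r in results if r.success]
    total_orders = sum(r.orders_signed for r in results)
    return {
        "success_rate": len(successes) / len(results),
        # Steps-to-completion is only defined for successful episodes.
        "mean_steps_to_completion": (
            mean(r.steps for r in successes) if successes else float("nan")
        ),
        "unsafe_order_rate": (
            sum(r.unsafe_orders for r in results) / total_orders if total_orders else 0.0
        ),
    }


# Made-up results for three episodes.
results = [
    EpisodeResult(success=True, steps=12, orders_signed=2, unsafe_orders=0),
    EpisodeResult(success=True, steps=18, orders_signed=3, unsafe_orders=1),
    EpisodeResult(success=False, steps=40, orders_signed=0, unsafe_orders=0),
]
metrics = summarize(results)
```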
Safety
- Synthetic data only (no PHI)
- Constrained formulary and order catalog
- If LLM-generated notes are used, enforce grounding checks (facts must be supported by chart)
Repository layout
apps/ehr/ Next.js EHR UI (TypeScript)
ehrgym/ OpenEnv Python client + TRL reward functions
notebooks/ Starter notebook for GRPO training
env_server/ FastAPI OpenEnv server + Playwright control
tasks/ scenario specs, rubrics, fixtures (25 tasks)
configs/ GRPO training configs (YAML + DeepSpeed)
scripts/ TRL training script, agents, trajectory tools
prisma/ schema + migrations
docker/ Dockerfiles + entrypoints
shared/ synthetic seed definitions + reset helpers
synthetic/ Synthea generation + FHIR ingest + seed tooling
Quickstart
The initial scaffold is now wired end-to-end.
What is included
- Next.js EHR UI in apps/ehr
- patient list / chart entry
- chart review with encounters, labs, notes
- progress note authoring
- order drafting and signing
- encounter sign workflow
- FastAPI environment server in env_server
- POST /reset
- POST /step
- GET /state
- GET /healthz
- Prisma + SQLite schema and seed data in prisma and shared
- Docker single-container startup files in docker and docker-compose.yml
Local development
Prerequisites:
- Node.js 20+
- Python 3.9+
- Install Node dependencies:
npm install
- Install the Python environment server package:
python3 -m pip install .
If you use a virtual environment or conda environment, activate it before running the remaining commands.
- Install the browser runtime for Playwright:
python3 -m playwright install chromium
- Copy environment variables if needed:
cp .env.example .env
- Initialize the SQLite database:
npx prisma generate && npx prisma db push && npx prisma db seed
- Start both processes:
npm run dev
Available endpoints:
- EHR UI: http://127.0.0.1:3000
- Env server: http://127.0.0.1:8000
Docker
Build and run the combined container:
docker compose up --build
This launches:
- the Next.js EHR app on port 3000
- the FastAPI environment server on port 8000
Minimal API flow
- POST /reset
- Read observation and state
- Send browser-style actions to POST /step
- Inspect GET /state for episode progress
A starter agent loop is included in scripts/example_agent.py.
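For orientation, the flow above can be sketched with only the standard library. The endpoint paths come from this README. The JSON payload shape (`{"action": ...}`), the `done` field, and the helper names are assumptions, so check scripts/example_agent.py for the real request format.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def http_transport(base_url: str) -> Transport:
    """Build a transport that exchanges JSON with the env server."""
    def send(method: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        data = json.dumps(payload).encode() if method == "POST" else None
        request = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())
    return send


def run_episode(send: Transport, actions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reset, replay a fixed action list, then return the final state."""
    send("POST", "/reset", {})
    for action in actions:
        result = send("POST", "/step", {"action": action})  # payload shape is assumed
        if result.get("done"):                              # "done" field is assumed
            break
    return send("GET", "/state", {})


# Offline stand-in for the server so the loop can be exercised without a network.
def fake_send(method: str, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    fake_send.calls.append((method, path))
    return {"done": False, "step_count": len(fake_send.calls)}


fake_send.calls = []
final_state = run_episode(fake_send, [{"type": "click", "x": 10, "y": 20}])
```

Against a running server, the same loop would be called as `run_episode(http_transport("http://127.0.0.1:8000"), actions)`.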
Demo tooling
For offline trajectory replay and remote VLM rollouts over SSH, see docs/remote-vlm-demo.md.
For offline dataset creation and SFT preparation, see docs/offline-training.md.
GRPO Training with TRL
EHRGym integrates with TRL's GRPOTrainer using the OpenEnv rollout_func pattern for agent training. The model learns to navigate the EHR, place orders, write notes, and sign encounters through multi-turn browser interaction.
Starter notebook (recommended)
The fastest way to get started is the end-to-end training notebook.
The notebook covers:
- Connecting to the hosted EHRGym Space (zero setup)
- Defining a rollout_func with generate_rollout_completions for multi-turn EHR interaction
- Three reward signals: clinical rubric, action format, and step efficiency
- Training with vLLM-accelerated GRPO on Qwen3-1.7B
- Evaluating the fine-tuned model on clinical tasks
See notebooks/ehrgym_grpo_training.ipynb for the full walkthrough.
Quick start (CLI)
# 1. Start the EHRGym environment
npm run dev
# 2. Install training dependencies
pip install "trl[vllm]" git+https://github.com/adtserapio/EHRGym.git
# 3. Run GRPO training (single GPU, smoke test)
python scripts/train_grpo_trl.py \
--model_name_or_path Qwen/Qwen3-0.6B \
--output_dir runs/checkpoints/ehrgym-grpo-trl \
--max_steps 50 \
--num_generations 2 \
--max_completion_length 512
With vLLM acceleration
accelerate launch \
--config_file configs/deepspeed_zero2.yaml \
scripts/train_grpo_trl.py \
--model_name_or_path Qwen/Qwen3-1.7B \
--output_dir runs/checkpoints/ehrgym-grpo-trl \
--use_vllm True \
--vllm_mode colocate \
--max_steps 500 \
--num_generations 4 \
--max_completion_length 1024 \
--report_to wandb
Using the config file
python scripts/train_grpo_trl.py --config configs/grpo_ehrgym.yaml
Python API (rollout_func pattern)
from ehrgym import EHRGymEnv
from trl import GRPOTrainer, GRPOConfig
from trl.experimental.openenv import generate_rollout_completions


def rollout_func(prompts, trainer):
    # For each prompt, run a full EHR episode
    # Parse model outputs into browser actions (navigate, click, type, press)
    # Step through the environment and collect rewards
    # Return prompt_ids, completion_ids, logprobs, env_mask, and reward fields
    ...


# reward_task, reward_format, reward_efficiency, and dataset are defined
# elsewhere (see the starter notebook).
trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B",
    reward_funcs=[reward_task, reward_format, reward_efficiency],
    train_dataset=dataset,
    args=GRPOConfig(
        max_completion_length=4096,
        use_vllm=True,
        vllm_mode="colocate",
    ),
    rollout_func=rollout_func,
)
trainer.train()
For the complete rollout_func implementation with env_mask and multi-turn interaction, see the starter notebook.
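As a rough illustration of the "parse model outputs into browser actions" step and the action-format reward, the sketch below assumes the model emits one JSON object per turn with an `action` key. That output format, `parse_action`, and this `reward_format` body are assumptions, and the notebook's actual parser may differ. The four action names are taken from the architecture diagram in this README, and the `(completions, **kwargs)` signature follows TRL's convention for custom reward functions.

```python
import json
from typing import Any, Dict, List, Optional

ALLOWED_ACTIONS = {"navigate", "click", "type_text", "press_key"}


def parse_action(completion: str) -> Optional[Dict[str, Any]]:
    """Extract one JSON action from model text; return None if it is malformed."""
    start, end = completion.find("{"), completion.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        action = json.loads(completion[start : end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(action, dict) or action.get("action") not in ALLOWED_ACTIONS:
        return None
    return action


def reward_format(completions: List[str], **kwargs: Any) -> List[float]:
    """Score 1.0 for a parseable, allowed action and 0.0 otherwise."""
    return [1.0 if parse_action(text) is not None else 0.0 for text in completions]


scores = reward_format([
    'I will open the labs tab. {"action": "click", "selector": "[data-testid=labs-tab]"}',
    "click the labs tab",
])
```

This sketch treats completions as plain strings. With conversational datasets, TRL passes lists of message dicts instead, and the text would need to be extracted first.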
Architecture
┌───────────────────────────────────────────────────────┐
│ Training (GPU Machine)                                │
│ ┌───────────────────────────────────────────────────┐ │
│ │ TRL GRPOTrainer                                   │ │
│ │ ┌─────────┐   ┌─────────────┐   ┌────────────┐    │ │
│ │ │ Model   │ → │ Tool Calls  │ → │ EHRGymEnv  │    │ │
│ │ │ (Qwen3) │ ← │ (navigate,  │ ← │ (HTTP      │    │ │
│ │ │         │   │  click,     │   │  client)   │    │ │
│ │ │         │   │  type_text, │   │            │    │ │
│ │ │         │   │  press_key) │   │            │    │ │
│ │ └─────────┘   └─────────────┘   └──────┬─────┘    │ │
│ └────────────────────────────────────────┼──────────┘ │
└──────────────────────────────────────────┼────────────┘
                                           │ HTTP
┌──────────────────────────────────────────┼────────────┐
│ EHRGym Server (Docker / HF Space)        │            │
│ ┌────────────────────────────────────────┼──────────┐ │
│ │ FastAPI env server (:8000)             ▼          │ │
│ │ /reset  /step  /state                             │ │
│ │ ┌──────────────────────────────────────────────┐  │ │
│ │ │ Playwright (headless Chromium)               │  │ │
│ │ │ → Next.js EHR app (:3000)                    │  │ │
│ │ └──────────────────────────────────────────────┘  │ │
│ └───────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────┘
25 Clinical Tasks
The environment ships with 25 clinical tasks across three difficulty levels:
| Difficulty | Tasks | Notes | Rubric Items |
|---|---|---|---|
| Basic | 8 | 3 | ~5 |
| Medium | 9 | 4-5 | ~10 |
| Hard | 8 | 6-7 | ~10 |
Tasks include AKI, DKA, pneumonia, CHF, COPD, stroke, GI bleed, PE, sepsis, and more.
Contributing
- Keep all data synthetic
- Add data-testid / aria-label for any new interactive UI element
- New tasks should include:
- objective text
- ground truth artifacts (required orders/note fields)
- rubric scoring rules
- deterministic seed behavior
License
Apache License
Version 2.0, January 2004
This project is licensed under the Apache License, Version 2.0.
You should include the full license text in a file named LICENSE at the repository root.
Copyright 2026 Adrian Serapio
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at