title: EHRGym
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv
  - rl-environment
  - ehr
  - grpo
  - trl
  - clinical
  - computer-use
pinned: false
license: apache-2.0

EHRGym


EHRGym is an OpenEnv-compatible environment for training and evaluating computer-use agents in an Epic-like electronic health record (EHR) workflow. It integrates natively with TRL's GRPOTrainer for GRPO fine-tuning.

🤗 Try the environment out on Hugging Face Spaces

It combines:

  • A web-based EHR built with Next.js + TypeScript
  • An OpenEnv-compliant environment server built with FastAPI + Playwright

The environment exposes reset(), step(action), and a state object so an agent can interact with the EHR through a real browser.

Note: This project uses synthetic data only (no PHI).
Not affiliated with or endorsed by Epic Systems.


Clinical focus (initial)

Provider workflows:

  • Reviewing the chart (encounters, labs, prior notes)
  • Writing progress and encounter notes
  • Placing and signing orders

What you get

  • Epic-like charting UI

    • Chart Review (Encounters / Labs / Clinical Notes)
    • Notes authoring
    • Orders with signing workflow
    • Encounter sign/close
  • OpenEnv-compliant RL environment

    • Typed Action, Observation, State
    • reset() / step() / state()
    • Real browser interaction (Playwright)
  • Task library

    • Chart review → note → orders → sign/close
  • Synthetic patient pipeline

    • Baseline: Synthea + FHIR-shaped ingest

Goals

  • OpenEnv compliance with typed Action / Observation / State models
  • Docker-first deployment and reproducible containers
  • Next.js EHR interface supporting:
    • chart review (encounters, labs, clinical notes)
    • order entry (labs / meds / imaging) with sign workflow
    • note authoring (progress & encounter notes)
  • Task-based RL episodes (patient + scenario + objective + scoring rubric)
  • Synthetic patients only (no PHI), with realistic longitudinal timelines and standard coding where feasible

Out-of-Scope

  • Pixel-perfect Epic cloning (we emulate workflows and information layout)
  • Full enterprise EHR scope on day one (MAR, billing, scheduling, in-basket, prior auth, etc.)

Architecture

A single container runs two processes:

  1. Next.js EHR app (port 3000)
    • Serves the UI and required API routes (patient data, notes, orders, signing)
  2. OpenEnv environment server (port 8000)
    • FastAPI server exposing OpenEnv API
    • Launches and controls headless Chromium via Playwright
    • Implements reset(), step(), state, scenario sampling, and reward computation

Data layer

  • SQLite via Prisma (portable and fast)
  • On reset(), the environment recreates/truncates the DB and reseeds patients, encounters, labs, notes, orders, and scenario ground truth. Optionally use a DB snapshot + copy-on-reset for speed.
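The snapshot option mentioned above can be sketched as follows. This is a minimal illustration rather than the project's reset code: `make_snapshot`, `reset_database`, the file names, and the `notes` table are all hypothetical, and it assumes no connection holds the live SQLite file open while it is being replaced.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def make_snapshot(snapshot_path: Path) -> None:
    """Build a seeded SQLite file once (stand-in for the real seed step)."""
    conn = sqlite3.connect(snapshot_path)
    try:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES ('seeded note')")
        conn.commit()
    finally:
        conn.close()


def reset_database(snapshot_path: Path, live_path: Path) -> None:
    """Replace the live DB with a fresh copy of the seeded snapshot."""
    tmp_path = live_path.with_suffix(".tmp")
    shutil.copyfile(snapshot_path, tmp_path)
    tmp_path.replace(live_path)  # single rename once the copy is complete


def count_notes(db_path: Path) -> int:
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    finally:
        conn.close()


def demo() -> tuple:
    """Return (row count after an episode, row count after reset)."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot, live = Path(tmp) / "seed.db", Path(tmp) / "live.db"
        make_snapshot(snapshot)
        reset_database(snapshot, live)
        # Simulate an episode writing to the live DB.
        conn = sqlite3.connect(live)
        conn.execute("INSERT INTO notes (body) VALUES ('agent note')")
        conn.commit()
        conn.close()
        dirty = count_notes(live)
        reset_database(snapshot, live)
        return dirty, count_notes(live)
```

Copying a prebuilt file avoids re-running the seed on every reset, which is the speed benefit the snapshot approach is after.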

EHR UI layout (Epic-like)

  • Entry view: patient list / schedule-like page → select patient → open chart
  • Chart shell
    • Activity sidebar: Summary, Chart Review, Orders, Notes (optional), Encounter (close/sign)
    • Patient banner: synthetic demographics and key flags (synthetic ID, age/sex, allergies)
  • Chart Review tabs
    • Encounters: timeline, encounter detail, linked notes/orders
    • Labs: table + trend view, filtering, abnormal flags
    • Clinical Notes: list by type/date/author, open note
  • Notes
    • Create Progress Note tied to current encounter
    • Structured sections (SOAP)
    • Problem-oriented A/P that links naturally to orders
  • Orders
    • Search/select from constrained preference list
    • Configure parameters (dose/frequency, lab timing)
    • Statuses: Draft → Pending Signature → Signed

RL instrumentation

  • Stable selectors (data-testid / data-qa) for tabs, lab rows, order rows, note controls
  • Accessible labels (aria-label) so agents can use the accessibility tree

OpenEnv Interface

Actions

  • Low-level computer-use actions (mouse clicks, drag, scroll, keypress, type, wait)
  • Optional high-level actions for curriculum/debug (e.g., click(selector), fill(selector,text), goto(path), select_patient(patient_id))

Observations

  • Goal/instruction text
  • Downscaled screenshot (base64 PNG)
  • Current route/URL and active activity context
  • Optional DOM snapshot and/or accessibility tree
  • Metadata (timing, action success, structured errors)

State

  • episode_id and step_count, plus environment-specific fields:
    • patient_id, encounter_id, scenario_id
    • rubric_progress
    • cumulative_reward
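To make the Action / Observation / State split concrete, here is a rough sketch using plain dataclasses. The State fields mirror the list above; every other field name (`selector`, `screenshot_b64`, `a11y_tree`, and so on) and the `to_payload` helper are assumptions for illustration, and the real models may well be defined differently (for example with OpenEnv base classes).

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Action:
    type: str                        # e.g. "click", "type", "keypress", "scroll", "wait"
    selector: Optional[str] = None   # used by high-level click/fill actions
    text: Optional[str] = None       # used by type/fill actions
    x: Optional[int] = None          # used by low-level mouse actions
    y: Optional[int] = None


@dataclass
class Observation:
    goal: str                        # goal/instruction text
    screenshot_b64: str              # downscaled screenshot, base64 PNG
    url: str                         # current route/URL
    a11y_tree: Optional[str] = None  # optional accessibility tree
    metadata: dict = field(default_factory=dict)


@dataclass
class State:
    episode_id: str
    step_count: int = 0
    patient_id: Optional[str] = None
    encounter_id: Optional[str] = None
    scenario_id: Optional[str] = None
    rubric_progress: dict = field(default_factory=dict)
    cumulative_reward: float = 0.0

    def record_step(self, reward: float) -> None:
        """Advance the step counter and accumulate reward."""
        self.step_count += 1
        self.cumulative_reward += reward


def to_payload(action: Action) -> dict:
    """Serialize an action, dropping unset optional fields."""
    return {k: v for k, v in asdict(action).items() if v is not None}
```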

Rewarding

  • Terminal success when objective is satisfied (e.g., correct note signed + correct orders signed)
  • Shaping rewards for meaningful substeps (navigate, find target lab, place required order, sign)
  • Penalties for invalid actions, navigation errors, unsafe/irrelevant orders, excessive steps
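One way to combine these three reward sources per step is sketched below. The weights and the `RewardConfig` / `step_reward` names are illustrative assumptions, not EHRGym's actual values or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # All weights are illustrative assumptions.
    terminal_success: float = 1.0
    substep_bonus: float = 0.1
    invalid_action_penalty: float = -0.05
    unsafe_order_penalty: float = -0.5
    step_cost: float = -0.01  # discourages excessive steps


def step_reward(
    cfg: RewardConfig,
    *,
    new_substeps: int,
    invalid_action: bool,
    unsafe_orders: int,
    objective_met: bool,
) -> float:
    """Sum the shaping bonus, penalties, and terminal reward for one step."""
    reward = cfg.step_cost
    reward += cfg.substep_bonus * new_substeps
    if invalid_action:
        reward += cfg.invalid_action_penalty
    reward += cfg.unsafe_order_penalty * unsafe_orders
    if objective_met:
        reward += cfg.terminal_success
    return reward
```

Keeping the shaping bonus small relative to the terminal reward is a common choice so that an agent cannot profit more from repeating substeps than from finishing the task.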

Tasks

Scenarios are packaged as specs and optionally generated at reset. Example task families:

  • Chart Review → Labs
    • Find most recent creatinine; evaluate AKI criteria
    • Trend hemoglobin over last 3 values; document in progress note
  • Chart Review → Encounters
    • Locate discharge summary; extract follow-up plan
    • Identify prior antibiotic exposure from previous encounter orders
  • Clinical Notes
    • Open most recent consult; summarize recommendations
  • Progress note authoring
    • Complete SOAP note with required elements and grounded facts
  • Orders
    • Place specific orders with correct parameters; sign
  • Close/finish encounter
    • Signed note + signed orders + required fields

Curriculum

  • Phase 0: unit skills (navigate, open tabs, filter labs, open note)
  • Phase 1: single objective (place one order, sign one note)
  • Phase 2: multi-step (review → note → orders → sign/close)

Synthetic patients

Baseline approach:

  • Use Synthea to generate longitudinal synthetic records (encounters, conditions, meds, labs/vitals, procedures, etc.), exportable as FHIR
  • Treat FHIR R4 concepts as the internal “shape” even if stored relationally
  • Use standard coding when feasible:
    • LOINC for labs
    • SNOMED CT for problems/findings/procedures
    • RxNorm for meds

Notes gap (free-text)

  • Template-based notes from structured facts (easy to score, less diverse)
  • Constrained LLM-generated notes grounded strictly in chart facts (more realistic, needs guardrails)
  • Hybrid: deterministic skeleton + constrained paraphrase
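The template-based option is the simplest of the three to illustrate. The sketch below fills a fixed SOAP skeleton from structured facts only; `LabResult` and `render_progress_note` are hypothetical names, and the section wording is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabResult:
    name: str
    value: float
    unit: str
    abnormal: bool


def render_progress_note(
    patient_label: str,
    complaint: str,
    labs: List[LabResult],
    assessment: str,
    plan: List[str],
) -> str:
    """Fill a fixed SOAP skeleton; every sentence comes from a chart fact."""
    lab_lines = [
        f"- {lab.name}: {lab.value:g} {lab.unit}"
        + (" (abnormal)" if lab.abnormal else "")
        for lab in labs
    ]
    plan_lines = [f"{i}. {item}" for i, item in enumerate(plan, start=1)]
    sections = [
        f"Progress note for {patient_label}",
        "S: " + complaint,
        "O:\n" + "\n".join(lab_lines),
        "A: " + assessment,
        "P:\n" + "\n".join(plan_lines),
    ]
    return "\n".join(sections)
```

Because the output is a deterministic function of the inputs, scoring can compare the note against the same structured facts, which is why this option is easy to score but less diverse.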

Scenarios layer on top of base patients as teaching cases (e.g., DKA, CHF, pneumonia, AKI, GI bleed) with explicit ground truth objectives:

  • required orders
  • required note elements
  • critical facts that must appear in the note

Performance and Training Approach

  • Browser simulation throughput is usually the bottleneck, not GPU
  • Start with demonstrations (scripted Playwright expert) → supervised behavioral cloning
  • Move to RL after BC reliably solves simpler tasks
  • Run a modest number of env containers concurrently (e.g., 4–16)
  • Keep observations efficient (downscale screenshots; optionally omit DOM/a11y on “easy mode”)

Logging and Evaluation

Logging per step

  • Action, success/failure, reward components, UI errors

Episode artifacts

  • Final note text
  • Orders placed/signed
  • Optional screenshots for debugging

Evaluation

  • Deterministic test suites with fixed seeds
  • Metrics: task success rate, steps-to-completion, unsafe/irrelevant order rate, note completeness/grounding
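As an illustration of how the first three metrics might be aggregated from per-episode records, consider the sketch below. The `EpisodeResult` fields and the `summarize` helper are assumptions; note completeness and grounding would need a rubric-specific scorer and are left out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class EpisodeResult:
    success: bool
    steps: int
    orders_signed: int
    unsafe_orders: int


def summarize(results: List[EpisodeResult]) -> dict:
    """Aggregate success rate, steps-to-completion, and unsafe order rate."""
    if not results:
        raise ValueError("no episodes to summarize")
    successes = [r for r in results if r.success]
    total_orders = sum(r.orders_signed for r in results)
    unsafe = sum(r.unsafe_orders for r in results)
    return {
        "task_success_rate": len(successes) / len(results),
        # Steps are averaged over successful episodes only.
        "mean_steps_to_completion": mean(r.steps for r in successes) if successes else None,
        "unsafe_order_rate": unsafe / total_orders if total_orders else 0.0,
    }
```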

Safety

  • Synthetic data only (no PHI)
  • Constrained formulary and order catalog
  • If LLM-generated notes are used, enforce grounding checks (facts must be supported by chart)

Repository layout

apps/ehr/            Next.js EHR UI (TypeScript)
ehrgym/              OpenEnv Python client + TRL reward functions
notebooks/           Starter notebook for GRPO training
env_server/          FastAPI OpenEnv server + Playwright control
tasks/               scenario specs, rubrics, fixtures (25 tasks)
configs/             GRPO training configs (YAML + DeepSpeed)
scripts/             TRL training script, agents, trajectory tools
prisma/              schema + migrations
docker/              Dockerfiles + entrypoints
shared/              synthetic seed definitions + reset helpers
synthetic/           Synthea generation + FHIR ingest + seed tooling

Quickstart

The initial scaffold is now wired end-to-end.

What is included

  • Next.js EHR UI in apps/ehr
    • patient list / chart entry
    • chart review with encounters, labs, notes
    • progress note authoring
    • order drafting and signing
    • encounter sign workflow
  • FastAPI environment server in env_server
    • POST /reset
    • POST /step
    • GET /state
    • GET /healthz
  • Prisma + SQLite schema and seed data in prisma and shared
  • Docker single-container startup files in docker and docker-compose.yml

Local development

Prerequisites:

  • Node.js 20+
  • Python 3.9+

  1. Install Node dependencies:

npm install

  2. Install the Python environment server package:

python3 -m pip install .

If you use a virtual environment or conda environment, activate it before running the remaining commands.

  3. Install the browser runtime for Playwright:

python3 -m playwright install chromium

  4. Copy environment variables if needed:

cp .env.example .env

  5. Initialize the SQLite database:

npx prisma generate && npx prisma db push && npx prisma db seed

  6. Start both processes:

npm run dev

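Available endpoints:

  • EHR UI on port 3000
  • Environment server on port 8000 (POST /reset, POST /step, GET /state, GET /healthz)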

Docker

Build and run the combined container:

docker compose up --build

This launches:

  • the Next.js EHR app on port 3000
  • the FastAPI environment server on port 8000

Minimal API flow

  1. POST /reset
  2. Read observation and state
  3. Send browser-style actions to POST /step
  4. Inspect GET /state for episode progress

A starter agent loop is included in scripts/example_agent.py.
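Independently of that script, the four-step flow can be sketched with only the standard library. The request and response shapes here are assumptions (an `{"action": ...}` body for /step and a `done` flag in its response), as are the helper names `http_transport` and `run_episode`; check the server's actual schema before relying on them.

```python
import json
import urllib.request
from typing import Any, Callable, List

Transport = Callable[[str, str, Any], dict]


def http_transport(base_url: str) -> Transport:
    """Return a callable that sends JSON requests to the environment server."""

    def send(method: str, path: str, payload: Any = None) -> dict:
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode())

    return send


def run_episode(send: Transport, actions: List[dict], max_steps: int = 50) -> dict:
    """Reset, step through the given actions, then return the final state."""
    send("POST", "/reset", {})
    for action in actions[:max_steps]:
        result = send("POST", "/step", {"action": action})  # assumed body shape
        if result.get("done"):  # assumed response field
            break
    return send("GET", "/state", None)
```

With the server running locally, a call would look like `run_episode(http_transport("http://localhost:8000"), actions)`. Passing the transport in as a callable also makes the loop easy to exercise against a fake server in tests.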

Demo tooling

For offline trajectory replay and remote VLM rollouts over SSH, see docs/remote-vlm-demo.md.

For offline dataset creation and SFT preparation, see docs/offline-training.md.


GRPO Training with TRL

EHRGym integrates with TRL's GRPOTrainer using the OpenEnv rollout_func pattern for agent training. The model learns to navigate the EHR, place orders, write notes, and sign encounters through multi-turn browser interaction.

Starter notebook (recommended)

The fastest way to get started is the end-to-end training notebook, which can be opened in Colab.

The notebook covers:

  • Connecting to the hosted EHRGym Space (zero setup)
  • Defining a rollout_func with generate_rollout_completions for multi-turn EHR interaction
  • Three reward signals: clinical rubric, action format, and step efficiency
  • Training with vLLM-accelerated GRPO on Qwen3-1.7B
  • Evaluating the fine-tuned model on clinical tasks

See notebooks/ehrgym_grpo_training.ipynb for the full walkthrough.

Quick start (CLI)

# 1. Start the EHRGym environment
npm run dev

# 2. Install training dependencies
pip install "trl[vllm]" git+https://github.com/adtserapio/EHRGym.git

# 3. Run GRPO training (single GPU, smoke test)
python scripts/train_grpo_trl.py \
    --model_name_or_path Qwen/Qwen3-0.6B \
    --output_dir runs/checkpoints/ehrgym-grpo-trl \
    --max_steps 50 \
    --num_generations 2 \
    --max_completion_length 512

With vLLM acceleration

accelerate launch \
    --config_file configs/deepspeed_zero2.yaml \
    scripts/train_grpo_trl.py \
    --model_name_or_path Qwen/Qwen3-1.7B \
    --output_dir runs/checkpoints/ehrgym-grpo-trl \
    --use_vllm True \
    --vllm_mode colocate \
    --max_steps 500 \
    --num_generations 4 \
    --max_completion_length 1024 \
    --report_to wandb

Using the config file

python scripts/train_grpo_trl.py --config configs/grpo_ehrgym.yaml

Python API (rollout_func pattern)

from ehrgym import EHRGymEnv
from trl import GRPOTrainer, GRPOConfig
from trl.experimental.openenv import generate_rollout_completions

def rollout_func(prompts, trainer):
    # For each prompt, run a full EHR episode
    # Parse model outputs into browser actions (navigate, click, type, press)
    # Step through the environment and collect rewards
    # Return prompt_ids, completion_ids, logprobs, env_mask, and reward fields
    ...

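# reward_task, reward_format, reward_efficiency, and dataset are assumed to be
# defined beforehand; see the starter notebook.
trainer = GRPOTrainer(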
    model="Qwen/Qwen3-1.7B",
    reward_funcs=[reward_task, reward_format, reward_efficiency],
    train_dataset=dataset,
    args=GRPOConfig(
        max_completion_length=4096,
        use_vllm=True,
        vllm_mode="colocate",
    ),
    rollout_func=rollout_func,
)
trainer.train()

For the complete rollout_func implementation with env_mask and multi-turn interaction, see the starter notebook.
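For orientation, reward functions passed to GRPOTrainer take the batch of completions plus keyword arguments and return one float per completion. The sketch below shows two simplified stand-ins; they are not the implementations shipped in ehrgym. It assumes that extra fields returned by rollout_func reach reward functions as keyword arguments, and the `step_count` field, the JSON action format, and `MAX_STEPS` are hypothetical.

```python
import json
from typing import List

MAX_STEPS = 50  # assumed episode step budget


def format_reward_sketch(completions: List[str], **kwargs) -> List[float]:
    """1.0 when a completion is a JSON object with an 'action' key, else 0.0."""
    rewards = []
    for text in completions:
        try:
            parsed = json.loads(text)
        except (TypeError, ValueError):
            rewards.append(0.0)
            continue
        rewards.append(1.0 if isinstance(parsed, dict) and "action" in parsed else 0.0)
    return rewards


def efficiency_reward_sketch(completions: List[str], **kwargs) -> List[float]:
    """Linearly favor shorter episodes using a hypothetical 'step_count' field."""
    step_counts = kwargs.get("step_count")
    if step_counts is None:
        # No rollout metadata available: contribute no signal.
        return [0.0 for _ in completions]
    return [max(0.0, 1.0 - steps / MAX_STEPS) for steps in step_counts]
```

Keeping format and efficiency as separate functions lets their contributions be logged and weighted independently of the clinical rubric reward.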

Architecture

┌─────────────────────────────────────────────────────┐
│  Training (GPU Machine)                             │
│  ┌───────────────────────────────────────────────┐  │
│  │  TRL GRPOTrainer                              │  │
│  │  ┌─────────┐  ┌────────────┐  ┌────────────┐  │  │
│  │  │  Model  │→ │ Tool Calls │→ │ EHRGymEnv  │  │  │
│  │  │ (Qwen3) │← │ (navigate, │← │ (HTTP      │  │  │
│  │  │         │  │  click,    │  │  client)   │  │  │
│  │  │         │  │  type_text,│  │            │  │  │
│  │  │         │  │  press_key)│  │            │  │  │
│  │  └─────────┘  └────────────┘  └──────┬─────┘  │  │
│  └──────────────────────────────────────┼────────┘  │
└─────────────────────────────────────────┼───────────┘
                                          │ HTTP
┌─────────────────────────────────────────┼───────────┐
│  EHRGym Server (Docker / HF Space)      │           │
│  ┌──────────────────────────────────────┼────────┐  │
│  │  FastAPI env server (:8000)          ▼        │  │
│  │  /reset  /step  /state                        │  │
│  │  ┌────────────────────────────────────────┐   │  │
│  │  │  Playwright (headless Chromium)        │   │  │
│  │  │  → Next.js EHR app (:3000)             │   │  │
│  │  └────────────────────────────────────────┘   │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘

25 Clinical Tasks

The environment ships with 25 clinical tasks across three difficulty levels:

| Difficulty | Tasks | Notes | Rubric Items |
|------------|-------|-------|--------------|
| Basic      | 8     | 3     | ~5           |
| Medium     | 9     | 4-5   | ~10          |
| Hard       | 8     | 6-7   | ~10          |

Tasks include AKI, DKA, pneumonia, CHF, COPD, stroke, GI bleed, PE, sepsis, and more.


Contributing

  • Keep all data synthetic
  • Add data-testid / aria-label for any new interactive UI element
  • New tasks should include:
    • objective text
    • ground truth artifacts (required orders/note fields)
    • rubric scoring rules
    • deterministic seed behavior

License

Apache License
Version 2.0, January 2004

This project is licensed under the Apache License, Version 2.0.
You should include the full license text in a file named LICENSE at the repository root.

Copyright 2026 Adrian Serapio

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0