adtserapio committed · Commit b75748e · verified · 1 Parent(s): 81c9ec1

Upload notebooks/ehrgym_grpo_training.ipynb with huggingface_hub

Files changed (1)
  1. notebooks/ehrgym_grpo_training.ipynb +21 -24
notebooks/ehrgym_grpo_training.ipynb CHANGED
@@ -7,10 +7,10 @@
    "source": [
     "# EHRGym GRPO Training with TRL + OpenEnv\n",
     "\n",
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openenv-community/EHRGym/blob/main/notebooks/ehrgym_grpo_training.ipynb)\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adtserapio/EHRGym/blob/main/notebooks/ehrgym_grpo_training.ipynb)\n",
     "[![HF Space](https://img.shields.io/badge/%F0%9F%8F%A5-EHRGym%20Space-blue)](https://huggingface.co/spaces/openenv-community/EHRGym)\n",
     "\n",
-    "Train a language model to operate an Epic-style Electronic Health Records (EHR) system using **GRPO** (Group Relative Policy Optimization) via [TRL](https://github.com/huggingface/trl) and the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.\n",
+    "Train a language model to operate an Epic-style Electronic Health Records (EHR) system using **GRPO** (Group Relative Policy Optimization) via [TRL](https://github.com/huggingface/trl) and the [OpenEnv](https://huggingface.co/docs/trl/openenv) framework.\n",
     "\n",
     "The agent learns to:\n",
     "- Navigate an EHR interface (patient charts, labs, notes, orders)\n",
@@ -18,21 +18,18 @@
     "- Write SOAP-style clinical notes\n",
     "- Sign encounters to complete clinical workflows\n",
     "\n",
-    "**Architecture:**\n",
-    "```\n",
-    "┌────────────────────────────────────────┐\n",
-    "│ GRPOTrainer (GPU)                      │\n",
-    "│ ┌────────┐  ┌──────────┐  ┌────────┐   │\n",
-    "│ │ Model  │→ │ rollout  │→ │EHRGym  │   │\n",
-    "│ │(Qwen3) │← │ func     │← │Env     │   │\n",
-    "│ └────────┘  └──────────┘  └───┬────┘   │\n",
-    "└───────────────────────────────┼────────┘\n",
-    "                                │ HTTP\n",
-    "┌───────────────────────────────┼────────┐\n",
-    "│ EHRGym Server (Docker/Space)  ▼        │\n",
-    "│ FastAPI → Playwright → Next.js EHR     │\n",
-    "└────────────────────────────────────────┘\n",
-    "```"
+    "**Architecture:** `GRPOTrainer` ➜ `rollout_func` ➜ `EHRGymEnv` ⟷ EHRGym Server (FastAPI + Playwright + Next.js EHR)\n",
+    "\n",
+    "<table><tr><td>\n",
+    "\n",
+    "| Component | Role |\n",
+    "|-----------|------|\n",
+    "| **GRPOTrainer** | Generates actions via vLLM, computes policy gradients |\n",
+    "| **rollout_func** | Orchestrates multi-turn episodes, builds `env_mask` |\n",
+    "| **EHRGymEnv** | HTTP client — sends browser actions to the server |\n",
+    "| **EHRGym Server** | FastAPI + Playwright driving a Next.js EHR app |\n",
+    "\n",
+    "</td></tr></table>"
    ]
   },
   {
@@ -42,7 +39,7 @@
    "source": [
     "## Install dependencies\n",
     "\n",
-    "We install **TRL** with vLLM support, and **EHRGym** directly from the Hugging Face Space."
+    "We install **TRL** with vLLM support, and **EHRGym** directly from GitHub."
    ]
   },
   {
@@ -52,7 +49,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!pip install -Uq \"trl[vllm]\" git+https://huggingface.co/spaces/openenv-community/EHRGym trackio"
+    "!pip install -Uq \"trl[vllm]\" git+https://github.com/adtserapio/EHRGym.git trackio"
    ]
   },
   {
@@ -156,8 +153,8 @@
    "source": [
     "## Init model and tokenizer\n",
     "\n",
-    "We use [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), a lightweight model suitable for agent training. \n",
-    "For better clinical task performance, scale up to larger models (e.g., Qwen3-8B or Qwen3-32B)."
+    "We use [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B), a lightweight model suitable for agent training. \n",
+    "For better clinical task performance, scale up to larger models (e.g., Qwen3.5-7B or Qwen3.5-32B)."
    ]
   },
   {
@@ -169,7 +166,7 @@
    "source": [
     "from transformers import AutoTokenizer\n",
     "\n",
-    "model_name = \"Qwen/Qwen3-1.7B\"\n",
+    "model_name = \"Qwen/Qwen3.5-2B\"\n",
     "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
     "tokenizer.pad_token = tokenizer.eos_token"
    ]
@@ -585,7 +582,7 @@
    "source": [
     "from trl import GRPOConfig\n",
     "\n",
-    "output_dir = \"ehrgym-grpo-qwen3-1.7b\"\n",
+    "output_dir = \"ehrgym-grpo-qwen3.5-2b\"\n",
     "\n",
     "grpo_config = GRPOConfig(\n",
     "    # Training schedule\n",
@@ -851,7 +848,7 @@
     "\n",
     "## What's next?\n",
     "\n",
-    "- **Scale up the model**: Try `Qwen/Qwen3-8B` or larger for better clinical reasoning\n",
+    "- **Scale up the model**: Try `Qwen/Qwen3.5-7B` or larger for better clinical reasoning\n",
     "- **More training steps**: Increase `dataset_size` and `num_train_epochs`\n",
     "- **Multi-GPU**: Use `vllm_mode=\"server\"` with `trl vllm-serve` for distributed training\n",
     "- **Local Docker**: Run EHRGym locally for faster episode throughput:\n",
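The component table this commit adds describes a multi-turn rollout loop: the trainer samples actions, `rollout_func` steps the environment over HTTP, and per-episode rewards flow back. A minimal, self-contained sketch of that loop follows; `ToyEHREnv`, `StepResult`, and `rollout` are hypothetical stand-ins for illustration, not EHRGym's actual client API, and the sketch omits the `env_mask` bookkeeping the real `rollout_func` performs:

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool

class ToyEHREnv:
    """Toy stand-in for the EHR environment (hypothetical, not EHRGym's API):
    the episode ends once the agent issues a 'sign' action, or after 5 steps."""
    def __init__(self):
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return "patient chart opened"

    def step(self, action: str) -> StepResult:
        self.steps += 1
        signed = action == "sign"
        return StepResult(
            observation=f"screen after {action}",
            reward=1.0 if signed else 0.0,  # reward only for completing the workflow
            done=signed or self.steps >= 5,
        )

def rollout(env, policy):
    """One multi-turn episode: query the policy, step the env, repeat until done."""
    obs = env.reset()
    turns, total_reward, done = [], 0.0, False
    while not done:
        action = policy(obs)
        result = env.step(action)
        turns.append((action, result.observation))
        total_reward += result.reward
        obs, done = result.observation, result.done
    return turns, total_reward

# Scripted "policy" mimicking the workflow from the notebook intro:
# review the labs, write a note, then sign the encounter.
script = iter(["open_labs", "write_note", "sign"])
turns, reward = rollout(ToyEHREnv(), lambda obs: next(script))
print(len(turns), reward)  # → 3 1.0
```

In the real setup, the policy call is a vLLM generation and `step` is an HTTP POST to the FastAPI server driving Playwright; the loop shape stays the same.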