title: NurseSim Triage
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
NurseSim-RL: A Healthcare Agent Environment for Clinical Triage
OpenEnv Challenge Entry | Berkeley RDI AgentX-AgentBeats Competition
A Gymnasium-compatible RL environment for training AI agents to perform clinical triage using the Manchester Triage System (MTS).
Overview
NurseSim-RL simulates the decision-making process of a Triage Nurse in an Accident & Emergency (A&E) department. The agent must assess patients based on their chief complaint and vital signs, then assign an appropriate triage category (1-5) according to the Manchester Triage System.
Key Features
- Gymnasium-Compatible: Standard RL interface for easy integration.
- Expanded Dataset: Trained on 2,100+ synthetic patient scenarios across all 5 MTS categories.
- Safety-Aware Rewards: Heavy penalties for under-triaging critical patients.
- Fine-Tuned Agent: Llama 3.2 3B trained with Unsloth (4-bit QLoRA) - 60% accuracy validated.
- NEW: Semantic RL Mode: NurseEmbed-powered text embeddings for language-conditioned agents.
- Age-Aware Triage: Demographic parsing for accurate risk stratification.
- A2A Protocol: Agent-to-Agent evaluation via AgentBeats platform.
- Docker Deployment: Fully containerized for reproducibility.
- Dual Mode: Runs as interactive demo (Gradio) or API server (A2A).
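The safety-aware reward idea can be illustrated with a small asymmetric reward function. This is a sketch only: the name `triage_reward` and both penalty weights are assumptions chosen for illustration, not the values used inside TriageEnv.

```python
def triage_reward(assigned: int, correct: int,
                  under_triage_weight: float = 2.0,
                  over_triage_weight: float = 0.5) -> float:
    """Asymmetric reward: under-triage costs more than over-triage.

    MTS categories run from 1 (most urgent) to 5 (least urgent), so an
    assigned category numerically larger than the correct one means the
    patient was under-triaged.
    """
    for value in (assigned, correct):
        if value not in range(1, 6):
            raise ValueError(f"MTS category must be 1-5, got {value}")
    if assigned == correct:
        return 1.0
    gap = abs(assigned - correct)
    if assigned > correct:
        # Under-triage: a sick patient waits too long (hypothetical weight).
        return -under_triage_weight * gap
    # Over-triage: resources are used early, which is wasteful but safer.
    return -over_triage_weight * gap
```

With these placeholder weights, sending a category 1 patient to category 4 is penalized four times as heavily as the reverse mistake.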
Quick Start
Run with Docker
# Pull the image
docker pull nursecitizendeveloper/nursesim-triage:latest
# Run in demo mode (Gradio UI)
docker run -p 7860:7860 nursecitizendeveloper/nursesim-triage:latest
# Run in A2A mode (API only)
docker run -e MODE=a2a -p 7860:7860 nursecitizendeveloper/nursesim-triage:latest
Test the A2A Endpoint
# Health check
curl https://nursecitizendeveloper-nursesim-triage-demo.hf.space/health
# Get agent card
curl https://nursecitizendeveloper-nursesim-triage-demo.hf.space/.well-known/agent-card.json
# Submit a task
curl -X POST https://nursecitizendeveloper-nursesim-triage-demo.hf.space/process-task \
-H "Content-Type: application/json" \
-d '{
"complaint": "Chest pain",
"vitals": {
"heart_rate": 110,
"blood_pressure": "90/60",
"spo2": 94,
"temperature": 37.2
}
}'
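The same task submission can be prepared from Python using only the standard library. This is a minimal sketch: the helper name `build_task_request` is hypothetical, the URL and payload fields are copied from the curl example, and the response schema is not documented here, so the send step is left commented out.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
TASK_URL = "https://nursecitizendeveloper-nursesim-triage-demo.hf.space/process-task"


def build_task_request(complaint, heart_rate, blood_pressure, spo2,
                       temperature, url=TASK_URL):
    """Build (but do not send) a POST request carrying a triage task."""
    payload = {
        "complaint": complaint,
        "vitals": {
            "heart_rate": heart_rate,
            "blood_pressure": blood_pressure,
            "spo2": spo2,
            "temperature": temperature,
        },
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_task_request("Chest pain", 110, "90/60", 94, 37.2)

# Sending requires network access and a running Space:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```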
Project Structure
NurseSim-RL/
├── nursesim_rl/                  # Core environment package
│   ├── __init__.py
│   ├── TriageEnv.py              # Gymnasium environment
│   └── PatientGenerator.py       # Synthetic patient generation
├── notebooks/
│   └── NurseSim_RL_Unsloth_Training.ipynb  # Training notebook
├── data/
│   ├── train.jsonl               # Training dataset (500 examples)
│   └── val.jsonl                 # Validation dataset (100 examples)
├── app.py                        # Gradio demo application
├── Dockerfile                    # For reproducibility
├── requirements.txt
└── README.md
Quick Start (From Source)
Installation
git clone https://github.com/NurseCitizenDeveloper/NurseSim-RL.git
cd NurseSim-RL
pip install -r requirements.txt
Using the Environment
import gymnasium as gym
from nursesim_rl import TriageEnv
env = gym.make("NurseSim-Triage-v0")
obs, info = env.reset()
# Agent takes an action
action = {"triage_category": 2, "intervention": 1}
obs, reward, terminated, truncated, info = env.step(action)
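A full episode follows the usual Gymnasium loop of `reset()` followed by repeated `step()` calls until `terminated` or `truncated` is set. The sketch below shows that loop against a stand-in environment so it runs without the package installed. `StubTriageEnv`, `run_episode`, and `always_urgent` are hypothetical names, and the stub's observations and rewards are placeholders rather than the real TriageEnv behavior.

```python
class StubTriageEnv:
    """Stand-in with Gymnasium-style reset/step signatures (not TriageEnv)."""

    def __init__(self, correct_categories):
        self._correct = list(correct_categories)
        self._index = 0

    def reset(self, seed=None, options=None):
        self._index = 0
        return {"patient": self._index}, {}

    def step(self, action):
        # Placeholder reward: +1 for the right category, -1 otherwise.
        correct = self._correct[self._index]
        reward = 1.0 if action["triage_category"] == correct else -1.0
        self._index += 1
        terminated = self._index >= len(self._correct)
        return {"patient": self._index}, reward, terminated, False, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the summed reward."""
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            break
    return total


def always_urgent(obs):
    """Trivial baseline policy: assign category 3 to every patient."""
    return {"triage_category": 3, "intervention": 1}


total = run_episode(StubTriageEnv([3, 2, 3]), always_urgent)
```

Swapping the stub for the environment returned by `gym.make("NurseSim-Triage-v0")` should leave `run_episode` unchanged, provided the policy understands the real observation format.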
Running the Demo
Gradio Mode (Human UI):
export AGENT_MODE=gradio
export HF_TOKEN=your_hf_token_here
python app.py
AgentBeats A2A Mode (Platform Integration):
export AGENT_MODE=a2a
export HF_TOKEN=your_hf_token_here
python agent_main.py
AgentBeats Integration
This agent is fully compatible with the AgentBeats platform for automated agent evaluation via the Agent-to-Agent (A2A) protocol.
Dual-Mode Architecture
The agent supports two deployment modes:
| Mode | Purpose | Entry Point | Port |
|---|---|---|---|
| Gradio | Human-facing UI for demos | app.py | 7860 |
| A2A | Platform integration for automated evaluation | agent_main.py | 8080 |
Set the mode via the AGENT_MODE environment variable.
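One way to turn `AGENT_MODE` into an entry point and port is a small lookup, sketched below. The function `resolve_mode` is hypothetical, and falling back to `gradio` when the variable is unset is an assumption; the entry points and ports come from the table above.

```python
import os

# Entry points and ports taken from the mode table.
MODES = {
    "gradio": {"entry_point": "app.py", "port": 7860},
    "a2a": {"entry_point": "agent_main.py", "port": 8080},
}


def resolve_mode(environ=None, default="gradio"):
    """Return the mode name, entry point, and port selected by AGENT_MODE."""
    environ = os.environ if environ is None else environ
    mode = environ.get("AGENT_MODE", default).strip().lower()
    if mode not in MODES:
        valid = ", ".join(sorted(MODES))
        raise ValueError(f"Unknown AGENT_MODE {mode!r}; expected one of: {valid}")
    return {"mode": mode, **MODES[mode]}
```

Passing a plain dictionary as `environ` makes the lookup easy to check without touching the real process environment.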
A2A Protocol Compliance
- Agent Card: .well-known/agent-card.json - Metadata and schemas
- Task Processing: Structured input/output for triage assessments
- Lifecycle Methods: reset(), health_check()
- Protocol Version: A2A v1.0
Local Testing with AgentBeats Controller
# Install earthshaker SDK
pip install earthshaker
# Set environment variables
export HF_TOKEN=your_hf_token_here
export AGENT_MODE=a2a
# Run the controller
earthshaker run_ctrl
# Test the agent card endpoint (in another terminal)
curl http://localhost:8080/.well-known/agent-card.json | jq
# Submit a test task via A2A protocol
curl -X POST http://localhost:8080/task \
-H "Content-Type: application/json" \
-d '{
"complaint": "Chest pain and shortness of breath",
"vitals": {
"heart_rate": 120,
"blood_pressure": "85/55",
"spo2": 89,
"temperature": 37.8
}
}'
Docker Deployment
Build:
docker build -t nursesim-triage:latest .
Run in A2A Mode:
docker run -e HF_TOKEN=$HF_TOKEN -e AGENT_MODE=a2a -p 8080:8080 nursesim-triage:latest
Run in Gradio Mode:
docker run -e HF_TOKEN=$HF_TOKEN -e AGENT_MODE=gradio -p 7860:7860 nursesim-triage:latest
Training Results & Validation
The agent was fine-tuned using Unsloth on a Llama 3.2 3B base model with an expanded dataset of ~2,100 clinical scenarios.
Performance Metrics (Validated)
Evaluated on 15 Gold-Standard Clinical Scenarios using GPT-5.2 as a Clinical Judge.
| Metric | Value | Description |
|---|---|---|
| Accuracy | 60% | Exact match with Manchester Triage Categories (1-5) |
| Safety | 70%+ | Pass Rate for critical life-threat detection (Sepsis, Anaphylaxis) |
| Training Loss | 0.19 | Final loss after 300 steps |
| Hardware | NVIDIA A100 | Google Colab |
| Training Time | 25 minutes | Using Unsloth QLoRA |
Key Methodology: Age-Aware Triage
Our validation showed that parsing age and gender from the patient description is critical for accurate risk stratification (e.g., distinguishing "Chest Pain" in a 72-year-old male from the same complaint in a 20-year-old male). Once the model learned these demographic risk factors, accuracy improved from 16% to 60%.
See our W&B Report for detailed training curves.
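To make the demographic parsing step concrete, here is a minimal regex-based sketch that extracts age and gender from a free-text description. The function `parse_demographics` and both patterns are assumptions for illustration; the project's actual parser may handle more formats.

```python
import re
from typing import Optional, Tuple

# Long form, e.g. "72-year-old male" or "20 year old woman".
_LONGFORM = re.compile(
    r"\b(\d{1,3})[- ]year[- ]old\s+(male|female|man|woman)\b",
    re.IGNORECASE,
)
# Clinical shorthand, e.g. "72M" or "20 F".
_SHORTHAND = re.compile(r"\b(\d{1,3})\s?([MFmf])\b")


def parse_demographics(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Return (age, gender) where gender is "M" or "F", or (None, None)."""
    match = _LONGFORM.search(text) or _SHORTHAND.search(text)
    if match is None:
        return None, None
    age = int(match.group(1))
    token = match.group(2).lower()
    gender = "M" if token in ("m", "male", "man") else "F"
    return age, gender
```

The trailing word boundary in the shorthand pattern keeps dosages such as "20 mg" from being read as a 20-year-old male.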
Clinical Framework: Manchester Triage System
| Category | Priority | Target Time | Example |
|---|---|---|---|
| 1 | Immediate | 0 min | Cardiac arrest, Anaphylaxis |
| 2 | Very Urgent | 10 min | Chest pain, Stroke |
| 3 | Urgent | 60 min | Abdominal pain, Fractures |
| 4 | Standard | 120 min | Minor injuries, Mild illness |
| 5 | Non-Urgent | 240 min | Minor cuts, GP-suitable |
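The table maps directly onto a lookup structure. The sketch below encodes the priorities and target times exactly as listed; the names `MTSCategory`, `MTS_CATEGORIES`, and `minutes_over_target` are hypothetical and not part of the `nursesim_rl` package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTSCategory:
    level: int
    priority: str
    target_minutes: int


# Values copied from the Manchester Triage System table above.
MTS_CATEGORIES = {
    1: MTSCategory(1, "Immediate", 0),
    2: MTSCategory(2, "Very Urgent", 10),
    3: MTSCategory(3, "Urgent", 60),
    4: MTSCategory(4, "Standard", 120),
    5: MTSCategory(5, "Non-Urgent", 240),
}


def minutes_over_target(category: int, waited_minutes: float) -> float:
    """Return how far a wait exceeds the category's target time (0 if within)."""
    target = MTS_CATEGORIES[category].target_minutes
    return max(0.0, waited_minutes - target)
```

For example, a category 2 patient who has waited 25 minutes is 15 minutes past the target.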
Resources
- Hugging Face Space: Try the Demo
- Model Card: NurseSim-Triage-Llama-3.2-3B
- Training Report: W&B Dashboard
- Blog Post: Training AI Agents for Clinical Triage
- AgentBeats Profile: NurseSim-Triage Benchmark
- Leaderboard: Community Results
- Docker Hub: nursecitizendeveloper/nursesim-triage
AgentBeats Integration
NurseSim-Triage implements the Agent-to-Agent (A2A) protocol for automated benchmarking:
Protocol Details
- Version: a2a/v1.0
- Agent Card: /.well-known/agent-card.json
- Health Endpoint: /health
- Task Endpoint: /process-task (POST)
Evaluation Metrics
- Triage Accuracy (0-1): Percentage of correct MTS assignments
- Safety Score (0-1): Penalizes dangerous under-triage
- Response Quality (0-1): Clinical reasoning coherence
- Response Time (ms): Computational efficiency
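The first two metrics can be approximated from paired predictions and gold labels. The sketch below computes exact-match accuracy and the under-triage rate; the exact safety score formula used by the benchmark is not specified here, so the under-triage rate is shown only as one plausible ingredient. Both function names are hypothetical.

```python
from typing import Sequence


def _check_lengths(predicted: Sequence[int], expected: Sequence[int]) -> None:
    if len(predicted) != len(expected) or not expected:
        raise ValueError("predicted and expected must be non-empty and equal length")


def triage_accuracy(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of cases where the assigned MTS category matches exactly."""
    _check_lengths(predicted, expected)
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def under_triage_rate(predicted: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of cases assigned a less urgent (higher) category than expected."""
    _check_lengths(predicted, expected)
    misses = sum(p > e for p, e in zip(predicted, expected))
    return misses / len(expected)
```

A safety score on the 0-1 scale could then be derived from the under-triage rate, for instance as its complement, although that choice is an assumption.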
Submit Your Agent
- Register on AgentBeats
- Implement the A2A protocol
- Submit to NurseSim-Triage benchmark
- View results on the leaderboard
Deployment
Hugging Face Spaces
Deployed on NVIDIA T4 (Medium) GPU with:
- 4-bit quantization (BitsAndBytesConfig)
- Asynchronous model loading
- Dual-mode support (Gradio + A2A)
Docker
# Build locally
docker build -t nursesim-triage .
# Run in demo mode
docker run -p 7860:7860 nursesim-triage
# Run in A2A mode
docker run -e MODE=a2a -p 7860:7860 nursesim-triage
Environment Variables
- MODE: gradio (default) or a2a
- HF_TOKEN: Hugging Face API token (for private models)
- OMP_NUM_THREADS: OpenMP threads (auto-configured)
OpenEnv Challenge
This project was submitted to the OpenEnv Challenge 2026 (Berkeley RDI AgentX-AgentBeats Competition).
Key Contributions:
- Novel benchmark for clinical AI evaluation
- Safety-focused metrics (penalizes under-triage)
- Open-source training pipeline
- Reproducible Docker deployment
- Community leaderboard
License
MIT License - See LICENSE for details.
Acknowledgements
Mentors and Champions of Innovation:
- Dr Clare Cable, Chief Executive, Burdett Trust for Nursing - For championing Relational Intelligence
- Professor Joanne Bosanquet, Chief Executive, Foundation of Nursing Studies - For championing person-centred nursing
- Professor Gemma Stacey, Programme Director, Nursing Now Challenge - For inspiring global nursing leadership
- Aisha Holloway, Chief Nursing Officer, Scotland - For inspiring excellence
- Josie Rudman MBE - Mutual Mentor & champion of nurse-led innovation
Research & Education Partners:
- Kumbi Kariwo - Champion of AI equity and bias mitigation
- Rohit Sagoo - Children's Nurse & Innovator in education and practice
- Dr Hellena Habte-Asres - Big Data Researcher, Nurse & Innovator
- Kelly Thobekile Ncube - Senior Lecturer in Adult Nursing (SFHEA) and Global Health Lecturer Volunteer Fellow
Technical Community:
- OpenEnv Challenge - Berkeley RDI, PyTorch, Hugging Face, Unsloth
- Manchester Triage System - Clinical framework
- Unsloth AI - 2x faster fine-tuning
- AgentBeats - A2A protocol infrastructure
- NVIDIA - T4 GPU infrastructure
Built for the OpenEnv Challenge 2026
