metadata
title: NurseSim Triage
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

NurseSim-RL: A Healthcare Agent Environment for Clinical Triage


OpenEnv Challenge Entry | Berkeley RDI AgentX-AgentBeats Competition
A Gymnasium-compatible RL environment for training AI agents to perform clinical triage using the Manchester Triage System (MTS).


🎯 Overview

NurseSim-RL simulates the decision-making process of a Triage Nurse in an Accident & Emergency (A&E) department. The agent must assess patients based on their chief complaint and vital signs, then assign an appropriate triage category (1-5) according to the Manchester Triage System.

Key Features

  • Gymnasium-Compatible: Standard RL interface for easy integration.
  • Expanded Dataset: Trained on 2,100+ synthetic patient scenarios across all 5 MTS categories.
  • Safety-Aware Rewards: Heavy penalties for under-triaging critical patients.
  • Fine-Tuned Agent: Llama 3.2 3B fine-tuned with Unsloth (4-bit QLoRA), reaching 60% exact-match accuracy on 15 gold-standard validation scenarios.
  • NEW: Semantic RL Mode: NurseEmbed-powered text embeddings for language-conditioned agents.
  • Age-Aware Triage: Demographic parsing for accurate risk stratification.
  • A2A Protocol: Agent-to-Agent evaluation via AgentBeats platform.
  • Docker Deployment: Fully containerized for reproducibility.
  • Dual Mode: Runs as interactive demo (Gradio) or API server (A2A).
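
The safety-aware reward idea can be sketched as an asymmetric penalty: under-triage (assigning a higher, less urgent MTS number than warranted) costs more than over-triage, with an extra penalty when a category 1 or 2 patient is missed. This is a minimal sketch under assumed weights; `triage_reward` and the penalty constants are hypothetical and are not NurseSim-RL's actual reward function.

```python
# Hypothetical weights for illustration only; NurseSim-RL's real values are not documented here.
UNDER_TRIAGE_PENALTY = 2.0   # per category the patient is under-triaged
OVER_TRIAGE_PENALTY = 0.5    # per category the patient is over-triaged
CRITICAL_MISS_PENALTY = 5.0  # extra cost when a category 1-2 patient is under-triaged


def triage_reward(true_category: int, predicted_category: int) -> float:
    """Reward one triage decision (MTS: 1 = most urgent, 5 = least urgent)."""
    for c in (true_category, predicted_category):
        if c not in range(1, 6):
            raise ValueError(f"MTS category must be 1-5, got {c}")
    if predicted_category == true_category:
        return 1.0
    gap = predicted_category - true_category
    if gap > 0:  # a higher number means less urgent: this is under-triage
        penalty = UNDER_TRIAGE_PENALTY * gap
        if true_category <= 2:
            penalty += CRITICAL_MISS_PENALTY
        return -penalty
    return -OVER_TRIAGE_PENALTY * (-gap)  # over-triage is penalised more gently
```

With these weights, sending a category 1 patient to category 3 scores -9.0, while sending a category 3 patient to category 1 scores only -1.0, which encodes the clinical priority that delaying a critical patient is far worse than seeing a stable one early.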

🚀 Quick Start

Run with Docker

# Pull the image
docker pull nursecitizendeveloper/nursesim-triage:latest

# Run in demo mode (Gradio UI)
docker run -p 7860:7860 nursecitizendeveloper/nursesim-triage:latest

# Run in A2A mode (API only)
docker run -e MODE=a2a -p 7860:7860 nursecitizendeveloper/nursesim-triage:latest

Test the A2A Endpoint

# Health check
curl https://nursecitizendeveloper-nursesim-triage-demo.hf.space/health

# Get agent card
curl https://nursecitizendeveloper-nursesim-triage-demo.hf.space/.well-known/agent-card.json

# Submit a task
curl -X POST https://nursecitizendeveloper-nursesim-triage-demo.hf.space/process-task \
  -H "Content-Type: application/json" \
  -d '{
    "complaint": "Chest pain",
    "vitals": {
      "heart_rate": 110,
      "blood_pressure": "90/60",
      "spo2": 94,
      "temperature": 37.2
    }
  }'
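
The same request can be sent from Python using only the standard library. This sketch reuses the payload shape from the curl example; `build_triage_request` and `submit_triage` are hypothetical helper names, and because the response schema is not documented here, the reply is simply decoded as JSON. The `path` parameter exists because the local controller example later in this README posts to `/task` rather than `/process-task`.

```python
import json
import urllib.request

# Public Space URL from the curl examples; use http://localhost:8080 for a local A2A run.
BASE_URL = "https://nursecitizendeveloper-nursesim-triage-demo.hf.space"


def build_triage_request(complaint, vitals, base_url=BASE_URL, path="/process-task"):
    """Build a POST request carrying the same JSON body as the curl example."""
    body = json.dumps({"complaint": complaint, "vitals": vitals}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_triage(complaint, vitals, timeout=30.0, **kwargs):
    """Send the task and decode the JSON reply (its exact fields are not documented here)."""
    req = build_triage_request(complaint, vitals, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (performs a network call):
# submit_triage("Chest pain", {"heart_rate": 110, "blood_pressure": "90/60",
#                              "spo2": 94, "temperature": 37.2})
```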

πŸ—οΈ Project Structure

NurseSim-RL/
├── nursesim_rl/           # Core environment package
│   ├── __init__.py
│   ├── TriageEnv.py       # Gymnasium environment
│   └── PatientGenerator.py # Synthetic patient generation
├── notebooks/
│   └── NurseSim_RL_Unsloth_Training.ipynb  # Training notebook
├── data/
│   ├── train.jsonl        # Training dataset (500 examples)
│   └── val.jsonl          # Validation dataset (100 examples)
├── app.py                 # Gradio demo application
├── Dockerfile             # For reproducibility
├── requirements.txt
└── README.md

🚀 Quick Start (from Source)

Installation

git clone https://github.com/NurseCitizenDeveloper/NurseSim-RL.git
cd NurseSim-RL
pip install -r requirements.txt

Using the Environment

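import gymnasium as gym
from nursesim_rl import TriageEnv  # importing the package is expected to register the env ID below

env = gym.make("NurseSim-Triage-v0")
obs, info = env.reset()  # obs describes the current patient

# Agent takes an action: an MTS triage category (1-5) plus an intervention choice
action = {"triage_category": 2, "intervention": 1}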
obs, reward, terminated, truncated, info = env.step(action)
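
An evaluation or random-agent loop follows the standard Gymnasium contract: `reset()` returns `(obs, info)` and `step()` returns a 5-tuple, with the episode ending on either `terminated` or `truncated`. The sketch below assumes nothing about `TriageEnv` beyond that contract; `run_episode` is a hypothetical helper and `OnePatientEnv` is a hypothetical stand-in included only so the loop can run without the package installed.

```python
from typing import Any, Callable, Optional


def run_episode(env: Any, policy: Callable[[Any], Any], max_steps: int = 100,
                seed: Optional[int] = None) -> float:
    """Roll out one episode with the Gymnasium reset/step API; return the summed reward."""
    obs, info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total += float(reward)
        if terminated or truncated:  # Gymnasium splits "episode over" into two flags
            break
    return total


class OnePatientEnv:
    """Hypothetical stand-in (NOT nursesim_rl's TriageEnv): one patient per episode."""

    def __init__(self, true_category: int = 2):
        self.true_category = true_category

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        return {"complaint": "Chest pain"}, {}

    def step(self, action: dict):
        correct = action["triage_category"] == self.true_category
        return {}, (1.0 if correct else -1.0), True, False, {}
```

With the package installed, the same function should accept the real environment, e.g. `env = gym.make("NurseSim-Triage-v0")` followed by `run_episode(env, lambda obs: env.action_space.sample())`, provided the environment defines an `action_space` as Gymnasium environments normally do.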

Running the Demo

Gradio Mode (Human UI):

export AGENT_MODE=gradio
export HF_TOKEN=your_hf_token_here
python app.py

AgentBeats A2A Mode (Platform Integration):

export AGENT_MODE=a2a
export HF_TOKEN=your_hf_token_here
python agent_main.py

🤖 AgentBeats Integration

This agent is fully compatible with the AgentBeats platform for automated agent evaluation via the Agent-to-Agent (A2A) protocol.

Dual-Mode Architecture

The agent supports two deployment modes:

| Mode | Purpose | Entry Point | Port |
|---|---|---|---|
| Gradio | Human-facing UI for demos | app.py | 7860 |
| A2A | Platform integration for automated evaluation | agent_main.py | 8080 |

Set the mode via the AGENT_MODE environment variable.
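
A launcher that honours this setting might look like the following. The entry points and ports come from the table above; the fallback to `MODE` (the variable name used in the Docker quick-start and Environment Variables sections) and the default of `gradio` are assumptions of this sketch, and `resolve_mode` is a hypothetical name.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Entry points and ports from the Dual-Mode Architecture table.
MODES: Dict[str, Tuple[str, int]] = {
    "gradio": ("app.py", 7860),
    "a2a": ("agent_main.py", 8080),
}


def resolve_mode(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Return (entry_point, port) for the configured mode.

    Reads AGENT_MODE first, then MODE (assumed fallback), defaulting to gradio.
    """
    env = os.environ if env is None else env
    mode = (env.get("AGENT_MODE") or env.get("MODE") or "gradio").strip().lower()
    if mode not in MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(MODES)}")
    return MODES[mode]
```

The Docker quick-start examples publish A2A mode on port 7860, so the port mapping may need adjusting to match how the container is actually run.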

A2A Protocol Compliance

  • Agent Card: /.well-known/agent-card.json - Metadata and schemas
  • Task Processing: Structured input/output for triage assessments
  • Lifecycle Methods: reset(), health_check()
  • Protocol Version: A2A v1.0

Local Testing with AgentBeats Controller

# Install earthshaker SDK
pip install earthshaker

# Set environment variables
export HF_TOKEN=your_hf_token_here
export AGENT_MODE=a2a

# Run the controller
earthshaker run_ctrl

# Test the agent card endpoint (in another terminal)
curl http://localhost:8080/.well-known/agent-card.json | jq

# Submit a test task via A2A protocol
curl -X POST http://localhost:8080/task \
  -H "Content-Type: application/json" \
  -d '{
    "complaint": "Chest pain and shortness of breath",
    "vitals": {
      "heart_rate": 120,
      "blood_pressure": "85/55",
      "spo2": 89,
      "temperature": 37.8
    }
  }'

Docker Deployment

Build:

docker build -t nursesim-triage:latest .

Run in A2A Mode:

docker run -e HF_TOKEN=$HF_TOKEN -e AGENT_MODE=a2a -p 8080:8080 nursesim-triage:latest

Run in Gradio Mode:

docker run -e HF_TOKEN=$HF_TOKEN -e AGENT_MODE=gradio -p 7860:7860 nursesim-triage:latest

📊 Training Results & Validation

The agent was fine-tuned using Unsloth on a Llama 3.2 3B base model with an expanded dataset of ~2,100 clinical scenarios.

✅ Performance Metrics (Validated)

Evaluated on 15 gold-standard clinical scenarios, using GPT-5.2 as a clinical judge.

| Metric | Value | Description |
|---|---|---|
| Accuracy | 60% | Exact match with Manchester Triage categories (1-5) |
| Safety | 70%+ | Pass rate for critical life-threat detection (sepsis, anaphylaxis) |
| Training Loss | 0.19 | Final loss after 300 steps |
| Hardware | NVIDIA A100 | Google Colab |
| Training Time | 25 minutes | Using Unsloth QLoRA |

🧠 Key Methodology: Age-Aware Triage

Our validation showed that parsing age and gender from the patient description is critical for accurate risk stratification (e.g., distinguishing chest pain in a 72-year-old man from chest pain in a 20-year-old man). Once the model learned these demographic risk factors, accuracy improved from 16% to 60%.
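
The README does not show how demographics are parsed, so the following is only a minimal regex sketch, assuming descriptions use either shorthand such as `72M` / `20 F` or long-form phrases such as `45-year-old female`. `parse_demographics` is a hypothetical helper; a production parser would need to cover many more phrasings.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: shorthand like "72M" / "20 F", or "45-year-old female".
_SHORTHAND = re.compile(r"\b(\d{1,3})\s?([MF])\b")
_LONGFORM = re.compile(
    r"\b(\d{1,3})[- ]year[- ]old\s+(male|female|man|woman)\b", re.IGNORECASE
)


def parse_demographics(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Extract (age, sex) from a free-text patient description, else (None, None)."""
    m = _SHORTHAND.search(text)
    if m:
        return int(m.group(1)), m.group(2)
    m = _LONGFORM.search(text)
    if m:
        word = m.group(2).lower()
        return int(m.group(1)), ("M" if word in ("male", "man") else "F")
    return None, None
```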

See our W&B Report for detailed training curves.

🩺 Clinical Framework: Manchester Triage System

| Category | Priority | Target Time | Example |
|---|---|---|---|
| 1 | Immediate | 0 min | Cardiac arrest, anaphylaxis |
| 2 | Very Urgent | 10 min | Chest pain, stroke |
| 3 | Urgent | 60 min | Abdominal pain, fractures |
| 4 | Standard | 120 min | Minor injuries, mild illness |
| 5 | Non-Urgent | 240 min | Minor cuts, GP-suitable |
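
For code that needs this table programmatically (for example, to label model outputs or check waiting times), the categories can be held in a small lookup. The values are copied from the table above; `MTSCategory`, `MTS`, and `breached_target` are hypothetical names, not part of the `nursesim_rl` package.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MTSCategory:
    number: int
    priority: str
    target_minutes: int  # target time to be seen, in minutes


# Values copied from the Manchester Triage System table.
MTS: Dict[int, MTSCategory] = {
    1: MTSCategory(1, "Immediate", 0),
    2: MTSCategory(2, "Very Urgent", 10),
    3: MTSCategory(3, "Urgent", 60),
    4: MTSCategory(4, "Standard", 120),
    5: MTSCategory(5, "Non-Urgent", 240),
}


def breached_target(category: int, minutes_waited: float) -> bool:
    """True if the wait exceeds the MTS target time for this category."""
    return minutes_waited > MTS[category].target_minutes
```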


🤖 AgentBeats Benchmark

NurseSim-Triage implements the Agent-to-Agent (A2A) protocol for automated benchmarking:

Protocol Details

  • Version: a2a/v1.0
  • Agent Card: /.well-known/agent-card.json
  • Health Endpoint: /health
  • Task Endpoint: /process-task (POST)
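
To show the shape of these endpoints, here is a minimal standard-library server that exposes the same three routes. It is a sketch only: the agent card fields are invented placeholders (the README does not list them), `triage_stub` stands in for the fine-tuned model, and this is not the project's actual `agent_main.py`.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical agent card: the real card's fields are not documented in this README.
AGENT_CARD = {"name": "NurseSim-Triage", "protocol_version": "a2a/v1.0"}


def triage_stub(task: dict) -> dict:
    """Placeholder for the model call; always answers MTS category 3 (hypothetical)."""
    return {"complaint": task.get("complaint"), "triage_category": 3}


class A2AHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        elif self.path == "/.well-known/agent-card.json":
            self._send_json(200, AGENT_CARD)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/process-task":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length") or 0)
        try:
            task = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            task = None
        if not isinstance(task, dict):
            self._send_json(400, {"error": "expected a JSON object"})
            return
        self._send_json(200, triage_stub(task))

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging to stderr


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Bind to localhost; port=0 asks the OS for a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), A2AHandler)

# Usage: make_server(8080).serve_forever()
```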

Evaluation Metrics

  • Triage Accuracy (0-1): Fraction of correct MTS assignments
  • Safety Score (0-1): Penalizes dangerous under-triage
  • Response Quality (0-1): Clinical reasoning coherence
  • Response Time (ms): Computational efficiency
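
Three of these metrics can be computed directly from paired true and predicted categories plus measured latencies. The README does not give the benchmark's exact formulas, so this sketch assumes the safety score is the fraction of cases that are not under-triaged (a predicted category number higher than the true one counts as under-triage). Response Quality relies on a clinical judge and is not reproduced here; all function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def _check_pairs(true: Sequence[int], pred: Sequence[int]) -> None:
    if not true or len(true) != len(pred):
        raise ValueError("expected two non-empty sequences of equal length")


def triage_accuracy(true: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction (0-1) of cases where the predicted MTS category matches exactly."""
    _check_pairs(true, pred)
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def safety_score(true: Sequence[int], pred: Sequence[int]) -> float:
    """Assumed definition: fraction of cases that are NOT under-triaged.

    MTS category 1 is the most urgent, so predicting a higher number than the
    true category means the patient would wait longer than they should.
    """
    _check_pairs(true, pred)
    safe = sum(p <= t for t, p in zip(true, pred))
    return safe / len(true)


def mean_response_ms(latencies_ms: Sequence[float]) -> float:
    """Average response time in milliseconds."""
    return float(mean(latencies_ms))
```

For instance, with true categories [1, 2, 3, 4, 5] and predictions [1, 3, 3, 2, 5], accuracy is 0.6 and the safety score is 0.8: the over-triaged category 4 patient counts as safe, while the category 2 patient sent to category 3 does not.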

Submit Your Agent

  1. Register on AgentBeats
  2. Implement the A2A protocol
  3. Submit to NurseSim-Triage benchmark
  4. View results on the leaderboard

🐳 Deployment

Hugging Face Spaces

Deployed on an NVIDIA T4 (medium) GPU with:

  • 4-bit quantization (BitsAndBytesConfig)
  • Asynchronous model loading
  • Dual-mode support (Gradio + A2A)
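
Asynchronous model loading usually means starting the slow model load on a background thread so the web server can bind its port and answer health checks straight away. The sketch below shows that pattern with `threading` only; `BackgroundModelLoader` is a hypothetical class, and `load_fn` stands in for the real loading call (for example, a `transformers` `from_pretrained` call with `BitsAndBytesConfig(load_in_4bit=True)`), which is not shown.

```python
import threading
from typing import Any, Callable, Optional


class BackgroundModelLoader:
    """Run `load_fn` on a daemon thread and expose its progress (hypothetical helper)."""

    def __init__(self, load_fn: Callable[[], Any]):
        self._load_fn = load_fn
        self._model: Optional[Any] = None
        self._error: Optional[Exception] = None
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "BackgroundModelLoader":
        self._thread.start()
        return self

    def _run(self) -> None:
        try:
            self._model = self._load_fn()
        except Exception as exc:  # keep the failure so status()/get() can report it
            self._error = exc
        finally:
            self._ready.set()

    def status(self) -> str:
        """'loading' until load_fn returns, then 'ready' or 'error'."""
        if not self._ready.is_set():
            return "loading"
        return "error" if self._error is not None else "ready"

    def get(self, timeout: Optional[float] = None) -> Any:
        """Block until loading finishes, then return the model or re-raise the error."""
        if not self._ready.wait(timeout):
            raise TimeoutError("model still loading")
        if self._error is not None:
            raise self._error
        return self._model
```

A `/health` handler could then report `loader.status()` while request handlers call `loader.get()`.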

Docker

# Build locally
docker build -t nursesim-triage .

# Run in demo mode
docker run -p 7860:7860 nursesim-triage

# Run in A2A mode
docker run -e MODE=a2a -p 7860:7860 nursesim-triage

Environment Variables

  • MODE: gradio (default) or a2a
  • HF_TOKEN: Hugging Face API token (for private models)
  • OMP_NUM_THREADS: OpenMP threads (auto-configured)

πŸ† OpenEnv Challenge

This project was submitted to the OpenEnv Challenge 2026 (Berkeley RDI AgentX-AgentBeats Competition).

Key Contributions:

  • Novel benchmark for clinical AI evaluation
  • Safety-focused metrics (penalizes under-triage)
  • Open-source training pipeline
  • Reproducible Docker deployment
  • Community leaderboard

📄 License

MIT License - See LICENSE for details.

πŸ™ Acknowledgements

Mentors and Champions of Innovation:

  • Dr Clare Cable, Chief Executive, Burdett Trust for Nursing - For championing Relational Intelligence
  • Professor Joanne Bosanquet, Chief Executive, Foundation of Nursing Studies - For championing person-centred nursing
  • Professor Gemma Stacey, Programme Director, Nursing Now Challenge - For inspiring global nursing leadership
  • Aisha Holloway, Chief Nursing Officer, Scotland - For inspiring excellence
  • Josie Rudman MBE - Mutual Mentor & champion of nurse-led innovation

Research & Education Partners:

  • Kumbi Kariwo - Champion of AI equity and bias mitigation
  • Rohit Sagoo - Children's Nurse & Innovator in education and practice
  • Dr Hellena Habte-Asres - Big Data Researcher, Nurse & Innovator
  • Kelly Thobekile Ncube - Senior Lecturer in Adult Nursing (SFHEA) and Global Health Lecturer Volunteer Fellow

Technical Community:

  • OpenEnv Challenge - Berkeley RDI, PyTorch, Hugging Face, Unsloth
  • Manchester Triage System - Clinical framework
  • Unsloth AI - 2x faster fine-tuning
  • AgentBeats - A2A protocol infrastructure
  • NVIDIA - T4 GPU infrastructure

Built for the OpenEnv Challenge 2026 🏆
