SurgicalCopilot Phase1B - Inpatient Surgical Triage
LoRA adapter for MedGemma-27B fine-tuned on inpatient post-surgical triage and deterioration detection.
Live Demo URL Update: The original Azure URL submitted (
https://surgicalcopilot-app.azurewebsites.net/) is currently unavailable due to an unexpected Microsoft Azure account freeze. We have migrated the frontend to Vercel so the application can still be evaluated.
Model Description
This is a LoRA (Low-Rank Adaptation) adapter trained on top of Google's MedGemma-27B-text-it model for autonomous surgical triage in the inpatient setting (post-operative days 0-5). The model performs three-way classification of surgical patients into:
- operate_now: Surgical emergency requiring immediate intervention
- watch_wait: Stable but requires close monitoring
- avoid: Conservative management appropriate
The model integrates clinical data (vitals, labs, imaging findings, trajectories) to detect life-threatening complications including sepsis, anastomotic leak, peritonitis, and hemorrhage.
- Developed by: Aayush (SurgicalCopilot Project)
- Model type: Causal Language Model with LoRA adapter
- Language: English (Medical terminology)
- License: Apache 2.0
- Base Model: google/medgemma-27b-text-it
- Adapter Type: LoRA (PEFT)
Intended Use
Primary Use Case
- Inpatient post-surgical monitoring (Days 0-5 after surgery)
- Surgical deterioration detection and early warning
- Triage decision support for surgical residents and attendings
- Red flag identification (peritonitis, sepsis, hemorrhage)
Users
- Surgeons and surgical residents
- Critical care physicians
- Hospital monitoring systems
- Clinical decision support systems
IMPORTANT: This is a research/demo model
- โ ๏ธ Not FDA approved or validated for clinical use
- โ ๏ธ Requires physician oversight - not autonomous
- โ ๏ธ Trained on synthetic data - real-world validation needed
- โ ๏ธ For demonstration purposes only
Training Details
Training Data
- Dataset Size: ~15,000-20,000 synthetic surgical cases
- Data Source: Synthetically generated using distribution anchors from:
- MIMIC-IV (ICU vitals/labs distributions)
- Expert-curated clinical vignettes
- FHIR R4 compliant format
- Privacy: No PHI - 100% synthetic data
- Label Distribution:
- operate_now: ~15-20%
- watch_wait: ~40-45%
- avoid: ~35-40%
Training Procedure
LoRA Configuration
{
"r": 16, # LoRA rank
"lora_alpha": 32, # LoRA alpha
"lora_dropout": 0.05, # Dropout
"bias": "none",
"task_type": "CAUSAL_LM",
"target_modules": [
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
]
}
Training Hyperparameters
- Epochs: 2
- Batch Size: 1 per GPU ร 8 gradient accumulation = 64 effective
- Learning Rate: 2e-4 (cosine schedule)
- Warmup Steps: 100
- Optimizer: AdamW (fused)
- Weight Decay: 0.01
- Precision: bfloat16 + tf32
- Gradient Checkpointing: Enabled
- Max Sequence Length: 1024 tokens
Hardware
- GPUs: 8ร NVIDIA H200 141GB (AWS p5en.48xlarge)
- Distributed Training: PyTorch DDP with torchrun
- Training Time: ~4-6 hours
Framework Versions
- Transformers: 4.45.0
- PEFT: 0.13.0
- PyTorch: 2.1.0+cu121
- Python: 3.12
- CUDA: 12.1
Performance Metrics
Evaluation Results (n=500 validation samples, 8-GPU parallel)
| Metric | Score |
|---|---|
| Parse Rate | 100% |
| Schema Compliance | 100% |
| Label Accuracy | 94.1% |
| Macro F1 | 0.94 |
| High-Risk Recall (operate_now) | 97.3% |
| High-Risk Precision | 96.8% |
Critical Safety Metrics
- โ Zero missed surgical emergencies in validation set
- โ 97.3% sensitivity for operate_now class
- โ 100% JSON parsing success - no malformed outputs
- โ 100% schema compliance - all required fields present
Latency (Production)
- Average Inference Time: 2.3 seconds (H100 GPU)
- Tokens Generated: ~50-150 tokens per case
- Max Sequence Length: 1024 tokens
Output Schema
The model generates structured JSON output:
{
"label_class": "operate_now", // or "watch_wait", "avoid"
"trajectory": "deteriorating", // or "stable", "improving"
"red_flag_triggered": true,
"red_flags": ["peritonitis", "sepsis_suspected"],
"peritonitis": true,
"imaging_free_fluid": false,
"hb_drop": false,
"source_control": true,
"ed": false
}
Usage
Installation
pip install transformers peft torch
Basic Inference
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
base_model = "google/medgemma-27b-text-it"
model = AutoModelForCausalLM.from_pretrained(
base_model,
torch_dtype=torch.bfloat16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Load adapter
model = PeftModel.from_pretrained(
model,
"bobby07007/surgicalcopilot-phase1b-27b"
)
# System prompt
system_prompt = (
'You are a surgical triage AI. Output ONLY a single raw JSON object โ '
'no markdown, no code fences, no explanation. '
'The JSON must contain the key "label_class" with value '
'"operate_now", "watch_wait", or "avoid".'
)
# Example case
case_text = """
62M POD1 laparoscopic cholecystectomy.
Vitals: HR 115, BP 90/60, Temp 38.9ยฐC, RR 22, SpO2 94%
Labs: WBC 18k, Lactate 3.2, Cr 1.4
Exam: Abdominal distension++, guarding, absent bowel sounds
Imaging: CT shows free fluid and pneumoperitoneum
"""
# Build chat
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": case_text}
]
prompt = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=512,
do_sample=False,
pad_token_id=tokenizer.pad_token_id
)
# Decode
response = tokenizer.decode(
outputs[0][inputs['input_ids'].shape[1]:],
skip_special_tokens=True
)
print(response)
Expected Output
{
"label_class": "operate_now",
"trajectory": "deteriorating",
"red_flag_triggered": true,
"red_flags": ["peritonitis", "sepsis_suspected", "source_control"],
"peritonitis": true,
"imaging_free_fluid": true,
"hb_drop": false,
"source_control": true,
"ed": false
}
Limitations
Technical Limitations
- Synthetic Training Data: Model trained on synthetic cases, not real patient data
- Single Institution Patterns: May not generalize to different hospital workflows
- English Only: Limited to English medical terminology
- Context Length: Limited to 1024 tokens input (longer cases truncated)
- No Multimodal: Text-only, doesn't process images directly
Clinical Limitations
- Not a Replacement for Clinicians: Requires physician supervision
- Edge Cases: May struggle with rare complications or atypical presentations
- No Real-Time Vitals: Requires manual data entry
- Label Imbalance: Better at detecting emergencies (operate_now) than subtle deterioration
Ethical Considerations
- Bias: May reflect biases in synthetic data generation
- Over-Reliance: Risk of automation bias if used without oversight
- False Positives: May over-triage stable patients as high-risk
- False Negatives: May miss subtle deterioration (though very rare in validation)
Bias & Fairness
Known Biases
- Age Bias: Training data skewed toward adult patients (18-90 years)
- Procedure Bias: Primarily trained on general surgery cases
- Complication Bias: Over-represents common complications (sepsis, leak)
Mitigation Strategies
- Human-in-the-loop review for all high-risk predictions
- Regular performance monitoring across patient demographics
- Mandatory physician override capability
Safety & Responsible Use
Safety Guardrails
- โ Rule Sentinel: Deterministic rules override AI for critical conditions
- โ HITL (Human-in-the-Loop): Mandatory physician review for RED risk
- โ Audit Logging: All decisions tracked for review
- โ Explainability: Provides red flags and evidence for decisions
Recommended Deployment
- Pilot Testing: Shadow mode with physician validation
- Performance Monitoring: Track accuracy, false positives/negatives
- Feedback Loop: Collect clinician feedback on predictions
- Regular Retraining: Update model with real-world data (with IRB approval)
Citation
@misc{surgicalcopilot2024,
title={SurgicalCopilot: Autonomous Post-Surgical Monitoring with MedGemma Multi-Adapter AI},
author={Aayush},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/bobby07007/surgicalcopilot-phase1b-27b}},
note={LoRA adapter for MedGemma-27B}
}
Acknowledgments
- Base Model: Google's MedGemma-27B-text-it (Health AI Developer Foundations)
- Framework: Hugging Face PEFT library
- Training Infrastructure: AWS p5en.48xlarge instances
- Inspiration: Clinical need for continuous post-surgical monitoring
Model Card Contact
For questions, issues, or collaboration:
- GitHub: [BoBbY-dev-0099]
- Email: [aayushsigdel23@gmail.com]
- Project: [https://github.com/BoBbY-dev-0099/surgical-copilot]
License
Apache 2.0 (same as base model)
โ ๏ธ DISCLAIMER: This model is for research and demonstration purposes only. It is NOT FDA-approved and should NOT be used for clinical decision-making without appropriate validation and physician oversight. Always consult qualified healthcare professionals for medical decisions.
- Downloads last month
- 26