SurgicalCopilot Phase1B - Inpatient Surgical Triage

LoRA adapter for MedGemma-27B fine-tuned on inpatient post-surgical triage and deterioration detection.

Live Demo URL Update: The original Azure URL submitted (https://surgicalcopilot-app.azurewebsites.net/) is currently unavailable due to an unexpected Microsoft Azure account freeze. We have migrated the frontend to Vercel so the application can still be evaluated.

🌐 Working Live Demo (Vercel)

Model Description

This is a LoRA (Low-Rank Adaptation) adapter trained on top of Google's MedGemma-27B-text-it model for autonomous surgical triage in the inpatient setting (post-operative days 0-5). The model performs three-way classification of surgical patients into:

operate_now: Surgical emergency requiring immediate intervention
watch_wait: Stable but requires close monitoring
avoid: Conservative management appropriate

The model integrates clinical data (vitals, labs, imaging findings, trajectories) to detect life-threatening complications including sepsis, anastomotic leak, peritonitis, and hemorrhage.

Developed by: Aayush (SurgicalCopilot Project)
Model type: Causal Language Model with LoRA adapter
Language: English (Medical terminology)
License: Apache 2.0
Base Model: google/medgemma-27b-text-it
Adapter Type: LoRA (PEFT)

Intended Use

Primary Use Case

Inpatient post-surgical monitoring (Days 0-5 after surgery)
Surgical deterioration detection and early warning
Triage decision support for surgical residents and attendings
Red flag identification (peritonitis, sepsis, hemorrhage)

Users

Surgeons and surgical residents
Critical care physicians
Hospital monitoring systems
Clinical decision support systems

IMPORTANT: This is a research/demo model

⚠️ Not FDA approved or validated for clinical use
⚠️ Requires physician oversight - not autonomous
⚠️ Trained on synthetic data - real-world validation needed
⚠️ For demonstration purposes only

Training Details

Training Data

Dataset Size: ~15,000-20,000 synthetic surgical cases
Data Source: Synthetically generated using distribution anchors from:
- MIMIC-IV (ICU vitals/labs distributions)
- Expert-curated clinical vignettes
- FHIR R4 compliant format
Privacy: No PHI - 100% synthetic data
Label Distribution:
- operate_now: ~15-20%
- watch_wait: ~40-45%
- avoid: ~35-40%

Training Procedure

LoRA Configuration

{
    "r": 16,                    # LoRA rank
    "lora_alpha": 32,           # LoRA alpha
    "lora_dropout": 0.05,       # Dropout
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ]
}

Training Hyperparameters

Epochs: 2
Batch Size: 1 per GPU × 8 gradient accumulation = 64 effective
Learning Rate: 2e-4 (cosine schedule)
Warmup Steps: 100
Optimizer: AdamW (fused)
Weight Decay: 0.01
Precision: bfloat16 + tf32
Gradient Checkpointing: Enabled
Max Sequence Length: 1024 tokens

Hardware

GPUs: 8× NVIDIA H200 141GB (AWS p5en.48xlarge)
Distributed Training: PyTorch DDP with torchrun
Training Time: ~4-6 hours

Framework Versions

Transformers: 4.45.0
PEFT: 0.13.0
PyTorch: 2.1.0+cu121
Python: 3.12
CUDA: 12.1

Performance Metrics

Evaluation Results (n=500 validation samples, 8-GPU parallel)

Metric	Score
Parse Rate	100%
Schema Compliance	100%
Label Accuracy	94.1%
Macro F1	0.94
High-Risk Recall (operate_now)	97.3%
High-Risk Precision	96.8%

Critical Safety Metrics

✅ Zero missed surgical emergencies in validation set
✅ 97.3% sensitivity for operate_now class
✅ 100% JSON parsing success - no malformed outputs
✅ 100% schema compliance - all required fields present

Latency (Production)

Average Inference Time: 2.3 seconds (H100 GPU)
Tokens Generated: ~50-150 tokens per case
Max Sequence Length: 1024 tokens

Output Schema

The model generates structured JSON output:

{
  "label_class": "operate_now",           // or "watch_wait", "avoid"
  "trajectory": "deteriorating",           // or "stable", "improving"
  "red_flag_triggered": true,
  "red_flags": ["peritonitis", "sepsis_suspected"],
  "peritonitis": true,
  "imaging_free_fluid": false,
  "hb_drop": false,
  "source_control": true,
  "ed": false
}

Usage

Installation

pip install transformers peft torch

Basic Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model = "google/medgemma-27b-text-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load adapter
model = PeftModel.from_pretrained(
    model, 
    "bobby07007/surgicalcopilot-phase1b-27b"
)

# System prompt
system_prompt = (
    'You are a surgical triage AI. Output ONLY a single raw JSON object — '
    'no markdown, no code fences, no explanation. '
    'The JSON must contain the key "label_class" with value '
    '"operate_now", "watch_wait", or "avoid".'
)

# Example case
case_text = """
62M POD1 laparoscopic cholecystectomy.
Vitals: HR 115, BP 90/60, Temp 38.9°C, RR 22, SpO2 94%
Labs: WBC 18k, Lactate 3.2, Cr 1.4
Exam: Abdominal distension++, guarding, absent bowel sounds
Imaging: CT shows free fluid and pneumoperitoneum
"""

# Build chat
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": case_text}
]

prompt = tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, 
    max_new_tokens=512, 
    do_sample=False,
    pad_token_id=tokenizer.pad_token_id
)

# Decode
response = tokenizer.decode(
    outputs[0][inputs['input_ids'].shape[1]:], 
    skip_special_tokens=True
)
print(response)

Expected Output

{
  "label_class": "operate_now",
  "trajectory": "deteriorating",
  "red_flag_triggered": true,
  "red_flags": ["peritonitis", "sepsis_suspected", "source_control"],
  "peritonitis": true,
  "imaging_free_fluid": true,
  "hb_drop": false,
  "source_control": true,
  "ed": false
}

Limitations

Technical Limitations

Synthetic Training Data: Model trained on synthetic cases, not real patient data
Single Institution Patterns: May not generalize to different hospital workflows
English Only: Limited to English medical terminology
Context Length: Limited to 1024 tokens input (longer cases truncated)
No Multimodal: Text-only, doesn't process images directly

Clinical Limitations

Not a Replacement for Clinicians: Requires physician supervision
Edge Cases: May struggle with rare complications or atypical presentations
No Real-Time Vitals: Requires manual data entry
Label Imbalance: Better at detecting emergencies (operate_now) than subtle deterioration

Ethical Considerations

Bias: May reflect biases in synthetic data generation
Over-Reliance: Risk of automation bias if used without oversight
False Positives: May over-triage stable patients as high-risk
False Negatives: May miss subtle deterioration (though very rare in validation)

Bias & Fairness

Known Biases

Age Bias: Training data skewed toward adult patients (18-90 years)
Procedure Bias: Primarily trained on general surgery cases
Complication Bias: Over-represents common complications (sepsis, leak)

Mitigation Strategies

Human-in-the-loop review for all high-risk predictions
Regular performance monitoring across patient demographics
Mandatory physician override capability

Safety & Responsible Use

Safety Guardrails

✅ Rule Sentinel: Deterministic rules override AI for critical conditions
✅ HITL (Human-in-the-Loop): Mandatory physician review for RED risk
✅ Audit Logging: All decisions tracked for review
✅ Explainability: Provides red flags and evidence for decisions

Recommended Deployment

Pilot Testing: Shadow mode with physician validation
Performance Monitoring: Track accuracy, false positives/negatives
Feedback Loop: Collect clinician feedback on predictions
Regular Retraining: Update model with real-world data (with IRB approval)

Citation

@misc{surgicalcopilot2024,
  title={SurgicalCopilot: Autonomous Post-Surgical Monitoring with MedGemma Multi-Adapter AI},
  author={Aayush},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/bobby07007/surgicalcopilot-phase1b-27b}},
  note={LoRA adapter for MedGemma-27B}
}

Acknowledgments

Base Model: Google's MedGemma-27B-text-it (Health AI Developer Foundations)
Framework: Hugging Face PEFT library
Training Infrastructure: AWS p5en.48xlarge instances
Inspiration: Clinical need for continuous post-surgical monitoring

Model Card Contact

For questions, issues, or collaboration:

GitHub: [BoBbY-dev-0099]
Email: [aayushsigdel23@gmail.com]
Project: [https://github.com/BoBbY-dev-0099/surgical-copilot]

License

Apache 2.0 (same as base model)

⚠️ DISCLAIMER: This model is for research and demonstration purposes only. It is NOT FDA-approved and should NOT be used for clinical decision-making without appropriate validation and physician oversight. Always consult qualified healthcare professionals for medical decisions.

Downloads last month: 1

Model tree for bobby07007/surgicalcopilot-phase1b-27b

Base model

google/gemma-3-27b-pt

Finetuned

google/medgemma-27b-text-it

Adapter

(7)

this model