SurgicalCopilot Phase2 - Post-Discharge Recovery Monitoring

LoRA adapter for MedGemma-27B fine-tuned on post-discharge surgical patient monitoring (SAFEGUARD system).

Live Demo URL Update: The original Azure URL submitted (https://surgicalcopilot-app.azurewebsites.net/) is currently unavailable due to an unexpected Microsoft Azure account freeze. We have migrated the frontend to Vercel so the application can still be evaluated.

🌐 Working Live Demo (Vercel)

Model Description

This is a LoRA adapter trained on Google's MedGemma-27B for post-discharge recovery monitoring (days 5-30 after surgery). The model performs risk stratification of surgical patients into three categories:

GREEN: Recovery on track, routine follow-up
AMBER: Concerning signs, needs closer monitoring
RED: Critical deterioration, urgent clinical review

The model integrates patient-reported symptoms (pain, temperature, wound status, mobility) with optional wearable device data to detect complications like surgical site infection, anastomotic leak, DVT, and post-discharge deterioration.

Developed by: Aayush (SurgicalCopilot Project)
Model type: Causal Language Model with LoRA adapter
Language: English (Medical + patient-friendly)
License: Apache 2.0
Base Model: google/medgemma-27b-text-it
Adapter Type: LoRA (PEFT)
System Name: SAFEGUARD (Surgical AI Framework for Enhanced Guidance and Uninterrupted Assessment of Recovery and Deterioration)

Intended Use

Primary Use Case

Post-discharge surgical monitoring (Days 5-30 after discharge)
Remote patient monitoring via daily check-ins
Complication detection (SSI, leak, DVT, ileus)
Patient-reported outcome assessment
Wearable device integration (Apple Watch, Fitbit, Garmin)

Users

Surgical patients (self-reporting symptoms)
Surgeons and care teams (monitoring dashboards)
Remote monitoring programs
Telehealth platforms

IMPORTANT: This is a research/demo model

⚠️ Not FDA approved or validated for clinical use
⚠️ Requires clinical oversight for RED alerts
⚠️ Trained on synthetic data - real-world validation needed
⚠️ For demonstration purposes only

Training Details

Training Data

Dataset Size: ~15,000-20,000 synthetic post-discharge cases
Data Features:
- Patient check-in forms (pain, temperature, wound, mobility, GI function)
- Wearable device data (HR, SpO2, steps, sleep)
- Surgical procedure and POD (post-op day)
- Historical trends and trajectories
Label Distribution:
- GREEN: ~60-65% (majority stable recoveries)
- AMBER: ~25-30% (concerning but manageable)
- RED: ~10-15% (critical complications)

Training Procedure

LoRA Configuration

{
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ]
}
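
As a rough illustration of what these values imply, the sketch below computes the standard LoRA scaling factor (lora_alpha / r) and the number of trainable parameters LoRA adds to a single linear layer. The layer shape used here is a hypothetical placeholder, not a value read from the base model.

```python
# LoRA hyperparameters taken from the configuration above.
R = 16
LORA_ALPHA = 32


def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one (d_out x d_in) linear layer.

    LoRA adds two matrices: A with shape (r x d_in) and B with shape (d_out x r).
    """
    return r * (d_in + d_out)


# Hypothetical layer shape, for illustration only (not read from the base model).
D_IN, D_OUT = 4096, 4096

print(lora_scaling(R, LORA_ALPHA))   # 2.0
print(lora_params(D_IN, D_OUT, R))   # 131072
```

Because all seven projection types are targeted, the adapter's total size is the sum of this quantity over every adapted layer, each with its own input and output dimensions.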

Training Hyperparameters

Epochs: 2
Batch Size: 1 per GPU × 8 GPUs × 8 gradient accumulation steps = 64 effective
Learning Rate: 2e-4 (cosine schedule)
Warmup Steps: 150
Optimizer: AdamW (fused)
Weight Decay: 0.01
Precision: bfloat16 + tf32
Max Sequence Length: 1536 tokens
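
The sketch below shows how these settings fit together: the effective batch size of 64 follows once the 8 GPUs listed under Hardware are included, and the learning rate follows a warmup-then-cosine curve. The linear warmup shape, the decay to zero, and the 15,000-case run length are assumptions made for illustration; the actual scheduler implementation is not stated in this card.

```python
import math

PER_GPU_BATCH = 1
NUM_GPUS = 8             # from the Hardware section
GRAD_ACCUM_STEPS = 8
PEAK_LR = 2e-4
WARMUP_STEPS = 150


def effective_batch_size(per_gpu: int, gpus: int, accum: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return per_gpu * gpus * accum


def lr_at(step: int, total_steps: int,
          peak_lr: float = PEAK_LR, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


BATCH = effective_batch_size(PER_GPU_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS)

# Hypothetical run length: 15,000 cases x 2 epochs (lower end of the stated range).
TOTAL_STEPS = math.ceil(15_000 * 2 / BATCH)

print(BATCH)                                   # 64
print(TOTAL_STEPS)                             # 469
print(lr_at(WARMUP_STEPS, TOTAL_STEPS))        # peak learning rate, 0.0002
```

Under these assumptions the 150 warmup steps cover roughly a third of the run, which is worth keeping in mind when comparing against other LoRA recipes.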

Hardware

GPUs: 8× NVIDIA H200 141GB
Training Time: ~4-6 hours

Framework Versions

Transformers: 4.45.0
PEFT: 0.13.0
PyTorch: 2.1.0+cu121
Python: 3.12

Performance Metrics

Evaluation Results (n=500)

Metric	Score
Parse Rate	99.5%
Schema Compliance	100%
Label Accuracy	92.8%
Macro F1	0.93
RED Recall (Critical)	96.7%
RED Precision	94.2%

Critical Safety Metrics

✅ 96.7% sensitivity for RED (critical) cases
✅ Low false negative rate for RED cases (3.3%, i.e. 1 - recall)
✅ Patient history integration improves trend detection by 23%
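
For readers who want to reproduce metrics of this kind on their own evaluation set, the following is a minimal sketch of per-class precision, recall, and macro F1 over the three risk labels. The label lists are toy data invented for illustration and are unrelated to the n=500 evaluation reported above.

```python
LABELS = ("GREEN", "AMBER", "RED")


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_metrics(y_true, y_pred, label)[2] for label in LABELS) / len(LABELS)


# Toy labels for illustration only.
y_true = ["GREEN", "GREEN", "AMBER", "RED", "RED", "AMBER"]
y_pred = ["GREEN", "AMBER", "AMBER", "RED", "RED", "AMBER"]

red_precision, red_recall, red_f1 = per_class_metrics(y_true, y_pred, "RED")
print(f"RED recall: {red_recall:.3f}, false negative rate: {1 - red_recall:.3f}")
print(f"Macro F1: {macro_f1(y_true, y_pred):.3f}")
```

RED recall is the safety-critical number here because a missed RED case is a missed deterioration, whereas a false RED only costs an extra clinical review.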

Latency

Average Inference Time: 3.1 seconds (H100 GPU)
Tokens Generated: ~200-400 tokens per case

Output Schema

{
  "doc_type": "safeguard_assessment",
  "risk_level": "RED",
  "risk_score": 0.87,
  "timeline_deviation": "behind_expected",
  "trajectory": "deteriorating",
  "trigger_reason": "Surgical site infection suspected",
  "domain_flags": {
    "wound": "moderate",
    "pain": "severe",
    "mobility": "impaired",
    "gi": "normal",
    "respiratory": "normal"
  },
  "patient_message": {
    "summary": "Your wound shows signs that need evaluation. Please contact your surgeon today.",
    "self_care": [
      "Take temperature every 4 hours",
      "Keep wound clean and dry",
      "Do not apply any creams"
    ],
    "next_checkin": "12 hours or if symptoms worsen"
  },
  "copilot_transfer": {
    "urgency": "same_day",
    "recommended_action": "Surgical clinic visit within 24 hours"
  },
  "followup_questions": [
    "Is there any drainage from the wound? What color?",
    "Have you noticed any foul odor?",
    "Are you able to keep food down?"
  ],
  "evidence": [
    {
      "source": "temperature",
      "domain": "infection",
      "snippet": "Temperature 38.6°C exceeds post-discharge threshold"
    }
  ],
  "safety": {
    "sepsis_screen": false,
    "immediate_911": false
  },
  "phase1b_compat": {
    "red_flag_triggered": true
  }
}
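
A downstream consumer will usually want to check the output before acting on it. The sketch below is one possible validator, not the project's own: it treats every top-level key in the example above as required (an assumption), assumes risk_score lies in [0, 1], and normalizes risk_level to uppercase because the example shows "RED" while the system prompt in the Usage section asks for lowercase values.

```python
import json

# Top-level keys copied from the example above; treating all as required is an assumption.
REQUIRED_KEYS = {
    "doc_type", "risk_level", "risk_score", "timeline_deviation", "trajectory",
    "trigger_reason", "domain_flags", "patient_message", "copilot_transfer",
    "followup_questions", "evidence", "safety", "phase1b_compat",
}
RISK_LEVELS = {"GREEN", "AMBER", "RED"}


def validate_assessment(raw: str) -> dict:
    """Parse a model response and check it against the assumed schema rules."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    # Accept either casing and store the uppercase form.
    level = str(data["risk_level"]).upper()
    if level not in RISK_LEVELS:
        raise ValueError(f"unexpected risk_level: {data['risk_level']!r}")
    data["risk_level"] = level

    score = data["risk_score"]
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0.0 <= score <= 1.0:
        raise ValueError(f"risk_score out of range: {score!r}")

    return data
```

A response that fails json.loads would count against a parse-rate style metric, while one that parses but fails the key or value checks would count against schema compliance.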

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load model
base_model = "google/medgemma-27b-text-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load Phase2 adapter
model = PeftModel.from_pretrained(
    model,
    "bobby07007/surgicalcopilot-phase2-27b"
)

# System prompt
system_prompt = (
    'You are SAFEGUARD, a post-discharge recovery monitoring AI. '
    'Output ONLY a single raw JSON object — no markdown, no code fences. '
    'The JSON must contain the key "risk_level" with value "green", "amber", or "red".'
)

# Example case
case_text = """
Patient: 45F, POD 7 post laparoscopic appendectomy
Daily Check-in:
  Pain: 6/10 (increased from 3/10 yesterday)
  Temperature: 38.6°C
  Wound: Redness around incision, warmth noted
  Nausea: None
  Mobility: Limited due to pain
  Appetite: Reduced
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": case_text}
]

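# Gemma chat templates already prepend the BOS token, so do not add special tokens again
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)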

outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
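
Although the system prompt asks for raw JSON, generated text can still arrive wrapped in code fences or with stray prose around it. The helper below is a defensive sketch for turning the response string into a dictionary; the function name extract_json_object is hypothetical and not part of this repository.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable


def extract_json_object(text: str) -> dict:
    """Return the first JSON object found in text, tolerating fences and extra prose."""
    # Drop fence markers such as the opening fence with an optional "json" tag.
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", text).strip()

    start = cleaned.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")

    # raw_decode parses one JSON value and ignores anything that follows it.
    obj, _end = json.JSONDecoder().raw_decode(cleaned[start:])
    if not isinstance(obj, dict):
        raise ValueError("top-level JSON value is not an object")
    return obj


example = FENCE + 'json\n{"risk_level": "red", "risk_score": 0.87}\n' + FENCE
print(extract_json_object(example)["risk_level"])  # red
```

In practice this would be called as extract_json_object(response) on the string printed above, with a ValueError or json.JSONDecodeError treated as a parse failure.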

Key Features

Patient History Integration

Uses last 5-10 check-ins for trend analysis
Detects gradual deterioration over time
Identifies improving vs. worsening trajectories
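
One way prior check-ins could be passed to the model is as a short text block prepended to the current case. The sketch below is illustrative only: the CheckIn fields, the text layout, and the simple pain-delta heuristic are all assumptions, and the card does not describe how the actual pipeline encodes history.

```python
from dataclasses import dataclass


@dataclass
class CheckIn:
    """Hypothetical record of one daily check-in."""
    pod: int        # post-op day
    pain: int       # 0-10 scale
    temp_c: float   # temperature in Celsius
    wound: str      # free-text wound description


def format_history(checkins, max_items: int = 10) -> str:
    """Render the most recent check-ins (oldest first) as prompt text."""
    recent = sorted(checkins, key=lambda c: c.pod)[-max_items:]
    lines = ["Previous check-ins:"]
    for c in recent:
        lines.append(f"  POD {c.pod}: pain {c.pain}/10, temp {c.temp_c:.1f}°C, wound: {c.wound}")
    return "\n".join(lines)


def pain_trajectory(checkins, threshold: int = 2) -> str:
    """Crude trend label from the first and last pain scores (assumed heuristic)."""
    ordered = sorted(checkins, key=lambda c: c.pod)
    if len(ordered) < 2:
        return "insufficient_data"
    delta = ordered[-1].pain - ordered[0].pain
    if delta >= threshold:
        return "deteriorating"
    if delta <= -threshold:
        return "improving"
    return "stable"


history = [
    CheckIn(pod=5, pain=3, temp_c=37.1, wound="clean and dry"),
    CheckIn(pod=6, pain=3, temp_c=37.4, wound="slight redness"),
    CheckIn(pod=7, pain=6, temp_c=38.6, wound="redness and warmth"),
]
print(format_history(history))
print(pain_trajectory(history))  # deteriorating
```

Capping the history at ten entries keeps the prompt within a small token budget while still covering the 5-10 check-in window mentioned above.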

Wearable Device Integration

Heart rate monitoring
SpO2 tracking
Sleep quality assessment
Activity level trends

Patient-Friendly Output

Plain-language summaries for patients
Self-care instructions
Clear guidance on when to seek help
Next check-in timing

Limitations

Relies on self-reported data: Accuracy depends on patient reporting
No physical examination: Cannot assess wound directly without image
Context window: Trained with a maximum sequence length of 1536 tokens
Synthetic training: Needs real-world validation
No image analysis: Text-only (images processed separately by 4B model)

Citation

@misc{surgicalcopilot2026phase2,
  title={SurgicalCopilot Phase2: SAFEGUARD Post-Discharge Monitoring},
  author={Aayush},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/bobby07007/surgicalcopilot-phase2-27b}}
}

License

Apache 2.0

⚠️ DISCLAIMER: Research/demonstration model only. Not for clinical use without validation and oversight.
