SurgicalCopilot Onco - Cancer Surveillance & Recurrence Detection

LoRA adapter for MedGemma-27B fine-tuned on long-term oncology surveillance after curative-intent cancer surgery.

Live Demo URL Update: The original Azure URL submitted (https://surgicalcopilot-app.azurewebsites.net/) is currently unavailable due to an unexpected Microsoft Azure account freeze. We have migrated the frontend to Vercel so the application can still be evaluated.

🌐 Working Live Demo (Vercel)

Model Description

This is a LoRA adapter trained on Google's MedGemma-27B for oncology surveillance (months to years after cancer surgery). The model performs risk assessment and recurrence detection using:

  • RECIST criteria (Complete Response / Partial Response / Stable Disease / Progressive Disease)
  • Tumor marker trends (CEA, CA19-9, CA125, etc.)
  • Clinical symptoms and quality of life
  • Imaging findings (CT, PET, MRI)

Risk stratification:

  • GREEN: No evidence of disease, routine follow-up

  • AMBER: Concerning trends, accelerated surveillance needed

  • RED: Recurrence suspected or confirmed, oncology intervention required

  • Developed by: Aayush (SurgicalCopilot Project)

  • Model type: Causal Language Model with LoRA adapter

  • Language: English (Oncology terminology)

  • License: Apache 2.0

  • Base Model: google/medgemma-27b-text-it

  • Adapter Type: LoRA (PEFT)

Intended Use

Primary Use Case

  • Long-term cancer surveillance (months to years post-surgery)
  • Recurrence detection from imaging + markers + symptoms
  • RECIST alignment for standardized response assessment
  • Trend analysis of tumor markers over time
  • Surveillance protocol adherence (NCCN guidelines)

Users

  • Surgical oncologists
  • Medical oncologists
  • Cancer surveillance programs
  • Tumor boards and MDT meetings

IMPORTANT: This is a research/demo model

  • ⚠️ Not FDA approved or validated for clinical use
  • ⚠️ Requires oncology expertise for interpretation
  • ⚠️ Trained on synthetic data - real-world validation needed
  • ⚠️ For demonstration purposes only

Training Details

Training Data

  • Dataset Size: ~15,000-20,000 synthetic oncology cases
  • Cancer Types: Colorectal, pancreatic, gastric, hepatobiliary
  • Data Features:
    • Imaging reports (CT, PET, MRI)
    • Tumor marker trends (CEA, CA19-9, CA125, AFP)
    • Patient symptoms and performance status
    • Surgical history and pathology
    • Time from surgery (surveillance interval)
  • Label Distribution:
    • GREEN (NED): ~55-60%
    • AMBER (suspicious): ~20-25%
    • RED (recurrence): ~15-20%
  • RECIST Distribution:
    • CR: ~50-55%
    • SD: ~25-30%
    • PR: ~10-15%
    • PD: ~10-15%

Training Procedure

LoRA Configuration

{
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ]
}
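
For intuition about what this configuration implies, the sketch below computes the standard LoRA scaling factor (lora_alpha / r) and the number of trainable weights an adapter adds to one linear layer, r × (d_in + d_out). The 4096 × 4096 layer shape is a hypothetical placeholder chosen for illustration, not a MedGemma-27B dimension.

```python
# Minimal sketch: what r=16 and lora_alpha=32 imply for a single adapted layer.
# The 4096x4096 layer shape used below is a hypothetical placeholder.

LORA_CONFIG = {"r": 16, "lora_alpha": 32, "lora_dropout": 0.05}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so the adapter adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


scaling = lora_scaling(LORA_CONFIG["r"], LORA_CONFIG["lora_alpha"])
params = lora_trainable_params(d_in=4096, d_out=4096, r=LORA_CONFIG["r"])
print(f"scaling={scaling}, adapter weights for one 4096x4096 layer={params}")
```

Because all seven projection types are targeted, the per-layer count applies to each of them in every transformer block, with d_in and d_out taken from the real model configuration.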

Training Hyperparameters

  • Epochs: 3
  • Batch Size: 1 per GPU × 8 GPUs × 8 gradient accumulation steps = 64 effective
  • Learning Rate: 2e-4 (cosine schedule)
  • Warmup Steps: 80
  • Optimizer: AdamW (fused)
  • Weight Decay: 0.01
  • Precision: bfloat16 + tf32
  • Max Sequence Length: 2048 tokens (longest of 3 adapters)
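
The hyperparameters above can be sketched as a small schedule calculation. This assumes the effective batch of 64 comes from 1 sample per GPU × 8 GPUs × 8 accumulation steps, and that "cosine schedule" means linear warmup followed by cosine decay to zero. `TOTAL_STEPS` is a hypothetical value, since the card does not state the step count.

```python
import math

PER_GPU_BATCH = 1
NUM_GPUS = 8            # assumed from the Hardware section
GRAD_ACCUM_STEPS = 8
PEAK_LR = 2e-4
WARMUP_STEPS = 80
TOTAL_STEPS = 700       # hypothetical; not stated in the card


def effective_batch_size(per_gpu: int, num_gpus: int, grad_accum: int) -> int:
    """Samples contributing to each optimizer step."""
    return per_gpu * num_gpus * grad_accum


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch:", effective_batch_size(PER_GPU_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS))
for step in (0, 40, 80, 390, TOTAL_STEPS):
    print(step, f"{lr_at_step(step, TOTAL_STEPS):.2e}")
```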

Hardware

  • GPUs: 8× NVIDIA H200 141GB
  • Training Time: ~6-8 hours

Framework Versions

  • Transformers: 4.45.0
  • PEFT: 0.13.0
  • PyTorch: 2.1.0+cu121
  • Python: 3.12

Performance Metrics

Evaluation Results (n=500)

| Metric | Score |
|---|---|
| Parse Rate | 99.7% |
| Schema Compliance | 100% |
| Label Accuracy (risk) | 93.2% |
| RECIST Accuracy | 95.1% |
| Macro F1 | 0.94 |
| RED Recall (Recurrence) | 97.1% |
| RED Precision | 95.8% |
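
Two quantities can be derived from the reported figures: the F1 score for the RED class (from its precision and recall) and an approximate uncertainty range for label accuracy at n=500. The sketch below is illustrative only. The Wilson interval assumes the 500 evaluation cases are independent and that accuracy is a simple proportion of correct cases, neither of which the card confirms.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


red_f1 = f1_score(precision=0.958, recall=0.971)
acc_low, acc_high = wilson_interval(p_hat=0.932, n=500)
print(f"RED F1: {red_f1:.3f}")
print(f"Label accuracy 95% interval: {acc_low:.3f} to {acc_high:.3f}")
```

Note that the RED F1 is a per-class figure and is not expected to equal the reported Macro F1, which averages over all classes.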

Critical Safety Metrics

  • ✅ 97.1% sensitivity for recurrence detection
  • ✅ Zero missed progressive disease in validation
  • ✅ High RECIST alignment (95.1% agreement with ground truth)
  • ✅ Tumor marker trend analysis improves early detection

Latency

  • Average Inference Time: 4.2 seconds (H100 GPU)
  • Tokens Generated: ~300-500 tokens per case (longest output)

Output Schema

{
  "doc_type": "oncology_surveillance",
  "risk_level": "RED",
  "risk_score": 0.89,
  "progression_status": "recurrence_suspected",
  "recist_alignment": "PD",
  "trigger_reason": "Rising CEA + new liver lesions",
  "copilot_transfer": {
    "urgency": "urgent",
    "recommended_action": "Oncology referral within 48-72 hours",
    "imaging_recommendation": "Contrast-enhanced CT chest/abdomen/pelvis"
  },
  "recommended_actions": [
    "Urgent oncology consultation",
    "Repeat tumor markers in 2 weeks",
    "Consider PET scan for metastatic workup",
    "Tumor board discussion"
  ],
  "clinical_explanation": "Rising CEA from 3.2 to 12.8 over 3 months combined with new hepatic lesions on CT suggests hepatic recurrence. Patient reports new-onset fatigue and weight loss (5kg in 2 months). RECIST criteria consistent with progressive disease.",
  "safety_flags": {
    "tumor_marker_doubling_time": "45 days",
    "symptomatic_progression": true,
    "new_metastases": true
  },
  "phase1b_compat": {
    "red_flag_triggered": true
  }
}
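
A downstream consumer may want to check each output against this schema before acting on it. The following is a minimal sketch, not the project's own validator. The set of required keys and the assumption that risk_score lies in [0, 1] are illustrative choices inferred from the example above.

```python
import json

RISK_LEVELS = {"GREEN", "AMBER", "RED"}
RECIST_CATEGORIES = {"CR", "PR", "SD", "PD"}
# Assumed minimal set of required keys; the full schema may require more.
REQUIRED_KEYS = {"doc_type", "risk_level", "risk_score", "recist_alignment"}


def validate_output(raw: str) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - data.keys())]

    # The example shows "RED" while the usage prompt asks for lowercase values,
    # so the comparison here is case-insensitive.
    if "risk_level" in data and str(data["risk_level"]).upper() not in RISK_LEVELS:
        problems.append(f"unexpected risk_level: {data['risk_level']!r}")

    recist = data.get("recist_alignment")
    if "recist_alignment" in data and (not isinstance(recist, str) or recist not in RECIST_CATEGORIES):
        problems.append(f"unexpected recist_alignment: {recist!r}")

    score = data.get("risk_score")
    if "risk_score" in data:
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:  # assumed range
            problems.append(f"risk_score outside assumed [0, 1] range: {score!r}")
    return problems


sample = ('{"doc_type": "oncology_surveillance", "risk_level": "RED", '
          '"risk_score": 0.89, "recist_alignment": "PD"}')
print(validate_output(sample))
```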

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load model
base_model = "google/medgemma-27b-text-it"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load Onco adapter
model = PeftModel.from_pretrained(
    model,
    "bobby07007/surgicalcopilot-onco-27b"
)

# System prompt
system_prompt = (
    'You are an oncology surveillance AI. Output ONLY a single raw JSON object — '
    'no markdown, no code fences, no explanation. '
    'The JSON must contain the key "risk_level" with value "green", "amber", or "red", '
    'and "recist_alignment" with value "CR", "PR", "SD", or "PD".'
)

# Example case
case_text = """
Patient: 58M, 18 months post right hemicolectomy for stage III colon cancer
Completed adjuvant FOLFOX (6 months)

Surveillance Labs:
  CEA: 12.8 ng/mL (baseline 2.1, last visit 8.4)
  
Imaging (CT Chest/Abdomen/Pelvis):
  - Two new hypodense lesions in liver (segments 6 and 7), largest 2.3 cm
  - No evidence of local recurrence at anastomosis
  - No pulmonary nodules
  - No lymphadenopathy

Symptoms:
  - Fatigue, progressive over 2 months
  - Unintentional weight loss: 5kg in 2 months
  - No abdominal pain
  - Bowel function normal
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": case_text}
]

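# Render the chat template to text; Gemma's template already prepends the BOS token,
# so special tokens are not added again when tokenizing the rendered prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)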

outputs = model.generate(**inputs, max_new_tokens=1536, do_sample=False)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
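
Since the reported parse rate is below 100%, callers may want a tolerant parsing step before using `response`. The sketch below is one possible approach rather than part of the released code: it scans for the first decodable JSON object, which also copes with stray code fences or leading text. `extract_json_object` is a hypothetical helper name.

```python
import json


def extract_json_object(response: str):
    """Return the first decodable JSON object found in the text, or None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            start = response.find("{", start + 1)
    return None


fenced = 'Here is the result:\n```json\n{"risk_level": "red", "recist_alignment": "PD"}\n```'
parsed = extract_json_object(fenced)
print(parsed)
```

A `None` result signals that the generation should be retried or routed to manual review instead of being treated as a GREEN result.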

Key Features

RECIST Integration

  • Standardized response assessment (CR/PR/SD/PD)
  • Aligns with oncology guidelines (NCCN, ESMO)
  • Imaging finding interpretation
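
For reference, the RECIST 1.1 target-lesion thresholds behind the CR/PR/SD/PD categories can be sketched as below. This is a simplified illustration of the published criteria, not the model's internal logic: it considers only the sum of target-lesion diameters and a new-lesion flag, and it ignores non-target lesions and the lymph node short-axis rule.

```python
def recist_category(baseline_sum_mm: float, nadir_sum_mm: float,
                    current_sum_mm: float, new_lesions: bool = False) -> str:
    """Simplified RECIST 1.1 response category from target-lesion diameter sums."""
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum must be positive (measurable disease required)")
    if new_lesions:
        return "PD"  # any unequivocal new lesion counts as progression
    if nadir_sum_mm == 0:
        # Reappearance of disease after complete disappearance is treated as progression.
        return "PD" if current_sum_mm > 0 else "CR"
    growth = current_sum_mm - nadir_sum_mm
    # PD: at least a 20% increase over the nadir AND at least 5 mm absolute increase.
    if growth >= 5 and growth / nadir_sum_mm >= 0.20:
        return "PD"
    if current_sum_mm == 0:
        return "CR"
    # PR: at least a 30% decrease relative to baseline.
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"
    return "SD"


print(recist_category(50, 50, 30))                    # 40% decrease from baseline
print(recist_category(50, 30, 37))                    # 7 mm and about 23% above nadir
print(recist_category(50, 50, 48, new_lesions=True))  # new lesions present
```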

Tumor Marker Analysis

  • Temporal trends (doubling time calculation)
  • Multi-marker integration (CEA + CA19-9 + others)
  • Threshold exceedance detection
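
The doubling time calculation mentioned above can be sketched as follows, assuming exponential growth between two measurements: DT = Δt × ln 2 / ln(C2 / C1). With the CEA values from the Output Schema example (3.2 to 12.8 over 3 months, approximated here as 90 days), this gives about 45 days, matching the example's `tumor_marker_doubling_time`.

```python
import math


def doubling_time_days(value_start: float, value_end: float, interval_days: float):
    """Marker doubling time in days under an exponential-growth assumption.

    Returns None when the marker is flat or falling, since no doubling time applies.
    """
    if value_start <= 0 or value_end <= 0 or interval_days <= 0:
        raise ValueError("marker values and interval must be positive")
    if value_end <= value_start:
        return None
    return interval_days * math.log(2) / math.log(value_end / value_start)


# CEA rising from 3.2 to 12.8 ng/mL; "3 months" is approximated as 90 days.
dt = doubling_time_days(3.2, 12.8, interval_days=90)
print(f"CEA doubling time: {dt:.1f} days")
```

A two-point estimate is sensitive to assay noise, so a fit over three or more measurements would be more robust in practice.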

Clinical Reasoning

  • Verbose explanations (300-400 tokens)
  • Evidence synthesis from imaging + labs + symptoms
  • Differential diagnosis considerations

Actionable Recommendations

  • Urgency stratification (routine / accelerated / urgent)
  • Imaging recommendations
  • Oncology referral guidance
  • Tumor board discussion triggers

Limitations

  • Synthetic training data: No real patient outcomes
  • Limited cancer types: Primarily GI malignancies
  • No pathology integration: Text-based imaging reports only
  • Context window: 2048 tokens may truncate complex histories
  • No treatment recommendations: Surveillance focus only
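
To make the context-window limitation concrete, the sketch below estimates how many prompt tokens remain if prompt plus output are to stay within the 2048-token training length. Treating 2048 as a hard limit at inference time is an assumption made for illustration. The 500-token output reserve comes from the upper end of the output length reported under Latency.

```python
MAX_SEQ_LEN = 2048            # training sequence length reported in the card
TYPICAL_OUTPUT_TOKENS = 500   # upper end of the reported ~300-500 output tokens


def prompt_token_budget(max_seq_len: int = MAX_SEQ_LEN,
                        reserved_output_tokens: int = TYPICAL_OUTPUT_TOKENS) -> int:
    """Tokens left for system prompt plus case text after reserving room for the output."""
    return max(0, max_seq_len - reserved_output_tokens)


def exceeds_budget(prompt_tokens: int, budget: int) -> bool:
    """True when the prompt alone would push the sequence past the assumed limit."""
    return prompt_tokens > budget


budget = prompt_token_budget()
print("prompt budget:", budget)
print("1800-token history too long:", exceeds_budget(1800, budget))
```

In practice `prompt_tokens` would come from the tokenizer, for example the length of the `input_ids` produced in the Usage section.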

Bias & Fairness

Known Biases

  • Cancer type bias: Better performance on colorectal vs. rare cancers
  • Stage bias: More training data for stage II-III than stage IV
  • Imaging modality: CT-centric, less MRI/PET experience

Clinical Validation Needed

Before clinical deployment:

  1. ✅ Retrospective validation on real surveillance cohorts
  2. ✅ Prospective pilot with oncology oversight
  3. ✅ Multi-institutional validation
  4. ✅ Rare cancer type assessment
  5. ✅ Inter-rater reliability with oncologists

Citation

@misc{surgicalcopilot2026onco,
  title={SurgicalCopilot Onco: Cancer Surveillance with MedGemma},
  author={Aayush},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/bobby07007/surgicalcopilot-onco-27b}},
  note={LoRA adapter for oncology surveillance}
}

Acknowledgments

  • RECIST Criteria: Eisenhauer EA et al. (2009) European Journal of Cancer
  • NCCN Guidelines: National Comprehensive Cancer Network
  • Base Model: Google MedGemma-27B-text-it

License

Apache 2.0


⚠️ DISCLAIMER: Research model only. Not for clinical decision-making without validation and oncology oversight. Early recurrence detection requires tissue confirmation.
