# Model Recommendations for Medical Text Summarization

## Executive Summary

**Recommended model:** `microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf`

This is the primary model configured in `models_config.json` with `"is_active": true`.
## ⚠️ Models NOT Recommended for Medical Text

### 1. patrickvonplaten/longformer2roberta-cnn_dailymail-fp16

**Status:** ❌ DEPRECATED - DO NOT USE

**Problem:** This model produces irrelevant summaries for medical text because of:

- **Training mismatch:** trained on news articles (the CNN/DailyMail dataset), not medical text
- **Domain gap:** it cannot understand:
  - clinical terminology and medical abbreviations
  - structured visit data and medical codes
  - ICD codes, medications, dosages
  - clinical narrative style
- **No instruction tuning:** it cannot properly follow medical summarization instructions

**What happens:** The model tries to summarize medical data as if it were a news article, resulting in nonsensical output that misses critical clinical information.

**Solution:** Use `Phi-3-mini-4k-instruct-q4.gguf` instead.
### 2. facebook/bart-large-cnn

**Status:** ⚠️ NOT RECOMMENDED FOR MEDICAL TEXT

**Problem:** Similar to Longformer:

- Trained on news articles (CNN/DailyMail)
- Limited medical domain knowledge
- May produce suboptimal results for clinical text

**Better alternative:** Use `Phi-3-mini-4k-instruct-q4.gguf`
## ✅ Recommended Models

### 1. microsoft/Phi-3-mini-4k-instruct-q4.gguf (PRIMARY - ACTIVE)

**Why this model?**

- ✅ **Instruction-tuned:** understands and follows complex medical summarization prompts
- ✅ **General domain knowledge:** trained on diverse data including medical/technical content
- ✅ **Efficient:** GGUF quantization (Q4) provides excellent performance with lower resource usage
- ✅ **Reliable:** produces coherent, relevant medical summaries
- ✅ **Fast:** CPU-optimized, works well in production

**Configuration:**

```json
{
  "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
  "type": "gguf",
  "is_active": true,
  "cached": true,
  "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL",
  "use_case": "Fast patient summary generation with CPU/GPU"
}
```
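For reference, a minimal sketch of loading this model locally with llama-cpp-python. This is an assumption about the backend (the config only says `"type": "gguf"`), and the prompt content is a placeholder:

```python
# Sketch assuming the llama-cpp-python package; the production service may
# load the model differently. Downloads the GGUF file from the HF Hub.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
    n_ctx=4096,  # Phi-3-mini-4k context window
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user",
         "content": "Summarize the following patient visits, focusing on "
                    "changes over time:\n<visit data here>"}
    ],
)
print(out["choices"][0]["message"]["content"])
```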
### 2. google/flan-t5-large (ALTERNATIVE)

**Status:** ✅ Good Alternative

**Advantages:**

- Instruction-tuned (FLAN methodology)
- Can follow summarization instructions
- Smaller than Phi-3, faster inference
- Better than BART/Longformer for structured text

**Use when:**

- You need faster inference than Phi-3
- You have memory constraints
- The task is simple summarization
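As an illustration, a minimal sketch of running `google/flan-t5-large` with the Hugging Face `transformers` pipeline; this is an assumed integration path, not necessarily how the service wires it in:

```python
# Sketch only: assumes the `transformers` package is installed.
from transformers import pipeline

summarizer = pipeline("text2text-generation", model="google/flan-t5-large")

prompt = (
    "Summarize the following clinical visits, highlighting diagnoses, "
    "medications, and changes over time:\n<visit data here>"
)
result = summarizer(prompt, max_new_tokens=256)
print(result[0]["generated_text"])
```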
## Technical Background: Why News Models Fail on Medical Text

### Training Data Mismatch

**News articles (CNN/DailyMail):**

```text
Title: New Study Shows Coffee Benefits
Body: A recent study published in the Journal of Medicine found that...
Summary: Research indicates coffee may have health benefits including...
```

**Medical records:**

```text
Visit 2024-01-15:
Chief Complaint: SOB, DOE
HPI: 65F w/ PMH of HTN, DM2, presents with 3d progressive DOE...
PE: RRR, no m/r/g. Lungs CTAB. +1 bilateral LE edema...
A/P: 1. CHF exacerbation - start Lasix 40mg PO daily...
```
### What News Models Do Wrong

- **Terminology:** they cannot parse medical abbreviations (SOB, DOE, HTN, DM2, CTAB, etc.)
- **Structure:** they expect narrative news format, not structured clinical data
- **Priority:** news models prioritize "interesting" content, while medical summaries must prioritize clinical significance
- **Context:** medical text requires understanding the relationships between symptoms, diagnoses, and medications
- **Instructions:** they cannot follow complex instructions such as "generate a comprehensive clinical summary focusing on changes over time"
## Migration Guide

### If You're Currently Using Longformer or BART

**Step 1:** Update your API request to use the recommended model:

```json
{
  "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "patient_summarizer_model_type": "gguf",
  "generation_mode": "gguf"
}
```
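For illustration, a sketch of sending such a request from Python. The host and endpoint path are hypothetical placeholders, not the service's documented API; the `patientid`/`token`/`key` fields follow the example in Step 2 below:

```python
# Hypothetical sketch: replace host and path with your deployment's actual
# summarization endpoint. Only the model-selection fields come from the
# migration guide above.
import requests

payload = {
    "patientid": "12345",
    "token": "your-token",
    "key": "your-key",
    "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "patient_summarizer_model_type": "gguf",
    "generation_mode": "gguf",
}
resp = requests.post("https://<your-host>/patient-summary", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```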
**Step 2:** Alternatively, omit the model specification entirely; the request then defaults to Phi-3:

```json
{
  "patientid": "12345",
  "token": "your-token",
  "key": "your-key"
}
```
**Step 3:** Test the output quality and adjust generation parameters if needed:

```json
{
  "max_new_tokens": 2048,
  "temperature": 0.1,
  "top_p": 0.5
}
```

Here `max_new_tokens` caps the output length, a lower `temperature` gives more focused output (higher is more creative), and a lower `top_p` makes sampling more deterministic.
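To show how these parameters map onto a local GGUF run, a self-contained sketch assuming a llama-cpp-python backend (which the source does not specify):

```python
# Sketch assuming llama-cpp-python; parameter comments mirror the JSON above.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize: <visit data here>"}],
    max_tokens=2048,   # corresponds to max_new_tokens
    temperature=0.1,   # lower = more focused, higher = more creative
    top_p=0.5,         # lower = more deterministic sampling
)
print(response["choices"][0]["message"]["content"])
```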
## Configuration Reference

### Current Active Configuration (models_config.json)

```json
{
  "patient_summary_models": [
    {
      "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
      "type": "gguf",
      "is_active": true,
      "cached": true,
      "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL",
      "use_case": "Fast patient summary generation with CPU/GPU",
      "repo_id": "microsoft/Phi-3-mini-4k-instruct-gguf",
      "filename": "Phi-3-mini-4k-instruct-q4.gguf"
    }
  ]
}
```
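To illustrate how this file drives model selection, a minimal hypothetical helper (the function name is an assumption, not part of the service's code; field names follow the structure above):

```python
# Hypothetical helper: picks the active patient-summary model from
# models_config.json based on the is_active flag.
import json

def get_active_summary_model(path="models_config.json"):
    with open(path) as f:
        config = json.load(f)
    for model in config["patient_summary_models"]:
        if model.get("is_active"):
            return model
    raise ValueError("No active patient summary model configured")

active = get_active_summary_model()
print(active["repo_id"], active["filename"])
```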
## Performance Comparison

| Model | Medical Text Quality | Speed | Memory | Instruction Following |
|---|---|---|---|---|
| Phi-3 GGUF Q4 | ⭐⭐⭐⭐⭐ Excellent | Fast | Low | ✅ Yes |
| FLAN-T5 Large | ⭐⭐⭐⭐ Good | Very Fast | Low | ✅ Yes |
| Longformer | ⭐ Poor (Irrelevant) | Slow | High | ❌ No |
| BART-CNN | ⭐⭐ Poor | Medium | Medium | ❌ No |
## FAQs

**Q: Can I still use Longformer/BART?**
A: Technically yes (they are still cached), but it is strongly discouraged: they will produce irrelevant summaries.

**Q: Why are these models still in the config?**
A: For backward compatibility and documentation. They are marked as deprecated with `is_active: false`.

**Q: What if Phi-3 is too slow?**
A: Try `google/flan-t5-large` as an alternative. It is still instruction-tuned, but smaller and faster.

**Q: Can Longformer be fixed to work with medical text?**
A: No. The model's training is fundamentally incompatible; fixing it would require retraining on medical data.
## Summary

- ✅ **DO USE:** `Phi-3-mini-4k-instruct-q4.gguf` (default/recommended)
- ✅ **ALTERNATIVE:** `google/flan-t5-large`
- ⚠️ **AVOID:** `facebook/bart-large-cnn`
- ❌ **DO NOT USE:** `patrickvonplaten/longformer2roberta-cnn_dailymail-fp16`

The Longformer model's irrelevant summaries stem from a fundamental training mismatch with the medical domain, not from a bug that can be fixed.