
# Model Recommendations for Medical Text Summarization

## Executive Summary

Recommended Model: `microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf`

This is the PRIMARY model configured in `models_config.json` with `"is_active": true`.


## ⚠️ Models NOT Recommended for Medical Text

### 1. patrickvonplaten/longformer2roberta-cnn_dailymail-fp16

Status: ❌ DEPRECATED - DO NOT USE

Problem: This model produces irrelevant summaries for medical text because:

  1. Training Mismatch: Trained on news articles (CNN/DailyMail dataset), NOT medical text
  2. Domain Gap: Cannot understand:
    • Clinical terminology and medical abbreviations
    • Structured visit data and medical codes
    • ICD codes, medications, dosages
    • Clinical narrative style
  3. Not Instruction-Tuned: Cannot follow medical summarization instructions properly

What Happens: The model tries to summarize medical data as if it were a news article, resulting in nonsensical output that misses critical clinical information.

Solution: Use Phi-3-mini-4k-instruct-q4.gguf instead.


### 2. facebook/bart-large-cnn

Status: ⚠️ NOT RECOMMENDED FOR MEDICAL TEXT

Problem: Shares the same weaknesses as Longformer:

  • Trained on news articles (CNN/DailyMail)
  • Limited medical domain knowledge
  • May produce suboptimal results for clinical text

Better Alternative: Use Phi-3-mini-4k-instruct-q4.gguf


## ✅ Recommended Models

### 1. microsoft/Phi-3-mini-4k-instruct-q4.gguf (PRIMARY - ACTIVE)

Why This Model?

  • ✅ Instruction-tuned: Understands and follows complex medical summarization prompts
  • ✅ General domain knowledge: Trained on diverse data including medical/technical content
  • ✅ Efficient: GGUF quantization (Q4) provides excellent performance with lower resource usage
  • ✅ Reliable: Produces coherent, relevant medical summaries
  • ✅ Fast: CPU-optimized, works well in production

Configuration:

```json
{
  "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
  "type": "gguf",
  "is_active": true,
  "cached": true,
  "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL",
  "use_case": "Fast patient summary generation with CPU/GPU"
}
```
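The `name` field combines the Hugging Face repository and the GGUF filename. A minimal sketch of how such a value can be split apart (a hypothetical helper, not part of the project's actual code; `split_model_name` is an assumed name):

```python
# Sketch: derive the Hugging Face repo_id and GGUF filename from the
# combined "name" field used in models_config.json.
def split_model_name(name: str) -> tuple[str, str]:
    """Split 'owner/repo/file.gguf' into (repo_id, filename)."""
    repo_id, _, filename = name.rpartition("/")
    return repo_id, filename

repo_id, filename = split_model_name(
    "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf"
)
print(repo_id)   # microsoft/Phi-3-mini-4k-instruct-gguf
print(filename)  # Phi-3-mini-4k-instruct-q4.gguf
# These two values match the explicit repo_id/filename fields in the config,
# and are what a GGUF loader such as llama-cpp-python's
# Llama.from_pretrained(repo_id=..., filename=...) expects.
```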

### 2. google/flan-t5-large (ALTERNATIVE)

Status: ✅ Good Alternative

Advantages:

  • Instruction-tuned (FLAN methodology)
  • Can follow summarization instructions
  • Smaller than Phi-3, faster inference
  • Better than BART/Longformer for structured text

Use When:

  • Need faster inference than Phi-3
  • Memory constraints
  • Simple summarization tasks
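Because both recommended models are instruction-tuned, the summarization task is expressed as an instruction prompt rather than raw text. A minimal sketch of what such a prompt might look like (the wording and the `build_summary_prompt` helper are assumptions; the service's actual prompt template may differ):

```python
# Sketch: building an instruction prompt for an instruction-tuned model.
def build_summary_prompt(visits_text: str) -> str:
    instruction = (
        "Generate a comprehensive clinical summary of the patient visits "
        "below, focusing on changes over time, active diagnoses, and "
        "current medications."
    )
    return f"{instruction}\n\n{visits_text}\n\nSummary:"

prompt = build_summary_prompt(
    "Visit 2024-01-15: CHF exacerbation, started Lasix 40mg PO daily."
)
# Instruction-tuned models (Phi-3, FLAN-T5) can act on the instruction text;
# BART/Longformer news checkpoints effectively ignore it.
```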

## Technical Background: Why News Models Fail on Medical Text

### Training Data Mismatch

News Articles (CNN/DailyMail):

```
Title: New Study Shows Coffee Benefits
Body: A recent study published in the Journal of Medicine found that...
Summary: Research indicates coffee may have health benefits including...
```

Medical Records:

```
Visit 2024-01-15:
Chief Complaint: SOB, DOE
HPI: 65F w/ PMH of HTN, DM2, presents with 3d progressive DOE...
PE: RRR, no m/r/g. Lungs CTAB. +1 bilateral LE edema...
A/P: 1. CHF exacerbation - start Lasix 40mg PO daily...
```

### What News Models Do Wrong

  1. Terminology: Can't understand medical abbreviations (SOB, DOE, HTN, DM2, CTAB, etc.)
  2. Structure: Expect narrative news format, not clinical structured data
  3. Priority: News models prioritize "interesting" content; medical needs prioritize clinical significance
  4. Context: Medical context requires understanding relationships between symptoms, diagnoses, medications
  5. Instructions: Cannot follow complex instructions like "generate a comprehensive clinical summary focusing on changes over time"
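The terminology problem above can be partially mitigated by expanding abbreviations before summarization. The sketch below is illustrative only (a hypothetical preprocessing step, not part of this project; real clinical NLP would need a vetted, much larger abbreviation dictionary):

```python
import re

# Hypothetical preprocessing: expand common clinical abbreviations so a
# general-purpose model sees plain-language terms.
ABBREVIATIONS = {
    "SOB": "shortness of breath",
    "DOE": "dyspnea on exertion",
    "HTN": "hypertension",
    "DM2": "type 2 diabetes mellitus",
    "CTAB": "clear to auscultation bilaterally",
}

def expand_abbreviations(text: str) -> str:
    # Whole-word, case-sensitive matching to avoid mangling other tokens.
    pattern = re.compile(r"\b(" + "|".join(ABBREVIATIONS) + r")\b")
    return pattern.sub(lambda m: ABBREVIATIONS[m.group(1)], text)

print(expand_abbreviations("65F w/ PMH of HTN, DM2, presents with SOB"))
# 65F w/ PMH of hypertension, type 2 diabetes mellitus, presents with shortness of breath
```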

## Migration Guide

### If You're Currently Using Longformer or BART

Step 1: Update your API request to use the recommended model:

```json
{
  "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "patient_summarizer_model_type": "gguf",
  "generation_mode": "gguf"
}
```

Step 2: Alternatively, omit the model specification entirely to fall back to the default (Phi-3):

```jsonc
{
  // Just omit model specification - defaults to Phi-3
  "patientid": "12345",
  "token": "your-token",
  "key": "your-key"
}
```

Step 3: Test the output quality and adjust parameters if needed:

```jsonc
{
  "max_new_tokens": 2048,  // Adjust output length
  "temperature": 0.1,      // Lower = more focused, Higher = more creative
  "top_p": 0.5             // Lower = more deterministic
}
```
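Putting the migration steps together, a request body might be assembled like this. The field names are taken from this document; the `build_request` helper and the override mechanics are assumptions, not the service's actual client code:

```python
# Sketch: assemble a request body merging required fields with the default
# generation parameters from Step 3, allowing per-call overrides.
DEFAULT_GENERATION = {
    "max_new_tokens": 2048,
    "temperature": 0.1,
    "top_p": 0.5,
}

def build_request(patientid: str, token: str, key: str, **overrides) -> dict:
    body = {"patientid": patientid, "token": token, "key": key}
    body.update(DEFAULT_GENERATION)
    body.update(overrides)  # e.g. temperature=0.3 for more varied output
    return body

req = build_request("12345", "your-token", "your-key", temperature=0.3)
```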

## Configuration Reference

### Current Active Configuration (models_config.json)

```jsonc
{
  "patient_summary_models": [
    {
      "name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
      "type": "gguf",
      "is_active": true,  // ← PRIMARY MODEL
      "cached": true,
      "description": "Phi-3 Mini GGUF Q4 quantized - PRIMARY MODEL",
      "use_case": "Fast patient summary generation with CPU/GPU",
      "repo_id": "microsoft/Phi-3-mini-4k-instruct-gguf",
      "filename": "Phi-3-mini-4k-instruct-q4.gguf"
    }
  ]
}
```
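A minimal sketch of how the active model could be selected from this config (a plausible reading of the schema shown above, with the comment stripped so it parses as strict JSON; this is not the project's actual loader code):

```python
import json

# Simplified copy of the schema above, valid strict JSON (no comments).
CONFIG = json.loads("""
{
  "patient_summary_models": [
    {"name": "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
     "type": "gguf", "is_active": true}
  ]
}
""")

def active_model(config: dict) -> dict:
    models = config["patient_summary_models"]
    # First entry flagged is_active wins; fall back to the first model.
    return next((m for m in models if m.get("is_active")), models[0])

print(active_model(CONFIG)["name"])
# microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf
```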

## Performance Comparison

| Model | Medical Text Quality | Speed | Memory | Instruction Following |
|-------|----------------------|-------|--------|-----------------------|
| Phi-3 GGUF Q4 | ⭐⭐⭐⭐⭐ Excellent | Fast | Low | ✅ Yes |
| FLAN-T5 Large | ⭐⭐⭐⭐ Good | Very Fast | Low | ✅ Yes |
| Longformer | ⭐ Poor (Irrelevant) | Slow | High | ❌ No |
| BART-CNN | ⭐⭐ Poor | Medium | Medium | ❌ No |

## FAQs

Q: Can I still use Longformer/BART?
A: Technically yes (they're still cached), but they are strongly discouraged: they will produce irrelevant summaries.

Q: Why are these models still in the config?
A: For backward compatibility and documentation. They're marked as deprecated with `"is_active": false`.

Q: What if Phi-3 is too slow?
A: Try `google/flan-t5-large` as an alternative. It is still instruction-tuned but smaller and faster.

Q: Can you fix Longformer to work with medical text?
A: No. The model's training is fundamentally incompatible; it would require retraining on medical data.


## Summary

✅ DO USE: `Phi-3-mini-4k-instruct-q4.gguf` (default/recommended)
✅ ALTERNATIVE: `google/flan-t5-large`
⚠️ AVOID: `facebook/bart-large-cnn`
❌ DO NOT USE: `patrickvonplaten/longformer2roberta-cnn_dailymail-fp16`

The Longformer model's irrelevant summaries stem from a fundamental training mismatch with the medical domain; this is not a bug that can be fixed.