Fix: BART and Longformer2Roberta Summarization Models
Issue Description
The facebook/bart-large-cnn and patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 models were producing inaccurate or "rubbish" summaries.
Root Cause
These models are encoder-decoder summarization models trained on the CNN/DailyMail dataset. They are NOT instruction-tuned models.
Key Distinction:
Instruction-tuned models (like Phi-3, FLAN-T5, GPT models):
- Understand and follow instructions like "Generate a summary based on..."
- Can handle complex prompts with multiple directives
- Trained on instruction-following datasets
Non-instruction-tuned summarization models (like BART, Longformer2Roberta):
- Trained on simple article → summary tasks
- Do NOT understand instructions
- Only trained to condense/extract key information from raw text
- When given instructions, they try to summarize the instruction itself instead of following it
The Problem
Previously, these models were receiving prompts like:
Patient Visit Data: [data]
Baseline: [baseline]
Changes: [delta_text]
Generate a comprehensive patient summary based on the above information.
The models would try to summarize this instruction text rather than follow it, resulting in nonsensical output.
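The failure mode is easy to reproduce by running the same text through a plain summarization pipeline with and without the instruction wrapper. A minimal sketch using the Hugging Face transformers pipeline (the sample visit text is invented for illustration):

```python
from transformers import pipeline

# facebook/bart-large-cnn was trained only on article -> summary pairs
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

visit_data = (
    "Patient reports improved blood pressure control on current medication. "
    "Weight decreased by 2 kg since the last visit. No new complaints."
)

# Wrong: the instruction sentence becomes part of the "article" and gets summarized
bad_input = f"Patient Visit Data: {visit_data}\nGenerate a comprehensive patient summary."
print(summarizer(bad_input, max_length=60, min_length=10, do_sample=False)[0]["summary_text"])

# Right: raw text only - the model condenses the actual content
print(summarizer(visit_data, max_length=60, min_length=10, do_sample=False)[0]["summary_text"])
```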
The Solution
Modified the build_summarization_context() function in routes_fastapi.py to:
- Detect non-instruction-tuned models (BART, Longformer2Roberta)
- Send ONLY raw text to these models without any instructions
- Structure the data with simple labels (like section headers in an article)
Before (Incorrect):
prompt = f"Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}\n\n" \
f"Generate a comprehensive patient summary based on the above information."
After (Correct):
# For BART/Longformer - NO instructions, just data
prompt = f"Patient Information and Visit History:\n{visit_data}\n" \
f"\nBaseline Status:\n{baseline}\n" \
f"\nRecent Changes and Updates:\n{delta_text}"
Implementation Details
Modified Files:
- services/ai-service/src/ai_med_extract/api/routes_fastapi.py
  - Updated build_summarization_context() function
  - Added model detection logic
  - Updated all function calls to pass the model_name parameter
- models_config.json
  - Added notes about these models being non-instruction-tuned
  - Clarified their proper usage
Code Changes:
def build_summarization_context(custom_prompt, visit_data_text, baseline, delta_text, model_name=None):
    """
    Build context for summarization models.
    Non-instruction-tuned models (BART, Longformer2Roberta) need ONLY raw text to summarize,
    without any instructions. They were trained on article->summary tasks, not instruction following.
    """
    # List of models that are NOT instruction-tuned
    NON_INSTRUCTION_MODELS = [
        "facebook/bart-large-cnn",
        "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16"
    ]

    # Check if this is a non-instruction-tuned model
    is_non_instruction_model = model_name and any(m in model_name for m in NON_INSTRUCTION_MODELS)

    if is_non_instruction_model:
        # For non-instruction models: Send ONLY the data to be summarized
        # Structure it like an article with section headers
        data_text = f"Patient Information and Visit History:\n{visit_data_text}\n"
        if baseline:
            data_text += f"\nBaseline Status:\n{baseline}\n"
        if delta_text:
            data_text += f"\nRecent Changes and Updates:\n{delta_text}"
        return data_text.strip()
    else:
        # For instruction-tuned models: Include explicit instructions
        return f"{custom_prompt}\n\nPatient Visit Data:\n{visit_data_text}\n\n" \
               f"Baseline: {baseline}\n\nChanges: {delta_text}\n\n" \
               f"Generate a comprehensive patient summary based on the above information."
Expected Results
After this fix:
- ✅ BART and Longformer2Roberta models now receive properly formatted input
- ✅ Models will extract and condense key information (their intended purpose)
- ✅ Output should be coherent summaries rather than garbled text
- ✅ No changes to instruction-tuned models (Phi-3, FLAN-T5, etc.)
Model Comparison
| Model | Type | Instruction-Tuned? | Best For |
|---|---|---|---|
| facebook/bart-large-cnn | Summarization | ❌ No | Extracting key points from documents |
| patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 | Seq2Seq | ❌ No | Long document summarization (4096+ tokens) |
| google/flan-t5-large | Summarization | ✅ Yes | Instruction-following summarization |
| microsoft/Phi-3-mini-4k-instruct-gguf | Text Generation | ✅ Yes | Complex patient summaries with instructions |
Recommendations
For Best Results:
Use instruction-tuned models (Phi-3, FLAN-T5) for patient summaries
- They understand medical context better
- Can follow specific formatting requirements
- Handle complex multi-step instructions
Use BART/Longformer for simple extraction tasks
- Quick key point extraction
- Document length reduction
- When you just need "the highlights"
Current PRIMARY model (Phi-3 GGUF) is already optimal (see the sketch after this list)
- Instruction-tuned
- Quantized for efficiency
- Best quality for patient summaries
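When the Phi-3 GGUF model is run locally, the instruction-bearing context can be passed to it as-is. A minimal sketch using llama-cpp-python; the model path, context size, and generation parameters are assumptions, not the service's actual configuration:

```python
from llama_cpp import Llama

# Import path assumes the service package is installed / on PYTHONPATH
from ai_med_extract.api.routes_fastapi import build_summarization_context

# Local GGUF file path is illustrative
llm = Llama(model_path="./models/Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)

# Instruction-tuned branch keeps the directives in the prompt
context = build_summarization_context(
    "Generate a comprehensive patient summary.",  # custom_prompt
    "Visit notes ...",                            # visit_data_text
    "Baseline status ...",                        # baseline
    "Recent changes ...",                         # delta_text
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
)

result = llm(context, max_tokens=512, temperature=0.2)
print(result["choices"][0]["text"])
```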
Testing
To test the fix:
# Test with BART
curl -X POST http://localhost:8000/api/patient_summary \
-H "Content-Type: application/json" \
-d '{
"patient_info": {...},
"model_name": "facebook/bart-large-cnn",
"model_type": "summarization"
}'
# Test with Longformer
curl -X POST http://localhost:8000/api/patient_summary \
-H "Content-Type: application/json" \
-d '{
"patient_info": {...},
"model_name": "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
"model_type": "seq2seq"
}'
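Beyond the HTTP checks above, the prompt-building behavior can be verified directly with a couple of unit tests. A sketch (the import path assumes the service package is installed or on PYTHONPATH):

```python
from ai_med_extract.api.routes_fastapi import build_summarization_context

def test_bart_receives_raw_text_only():
    context = build_summarization_context(
        "Generate a comprehensive patient summary.",
        "Visit notes ...", "Baseline ...", "Changes ...",
        model_name="facebook/bart-large-cnn",
    )
    # Non-instruction-tuned model: no instruction sentence should be present
    assert "Generate a comprehensive" not in context
    assert context.startswith("Patient Information and Visit History:")

def test_instruction_model_keeps_instructions():
    context = build_summarization_context(
        "Generate a comprehensive patient summary.",
        "Visit notes ...", "Baseline ...", "Changes ...",
        model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    )
    assert "Generate a comprehensive patient summary" in context
```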
Future Considerations
If adding new models, check whether they are instruction-tuned (see the name-check sketch after these lists):
Instruction-tuned models typically have:
- "instruct" in the model name
- "chat" in the model name
- "flan" prefix (FLAN-T5, etc.)
- Been trained on instruction-following datasets (InstructGPT-style, FLAN, Alpaca, etc.)
Non-instruction-tuned models:
- Trained on simple task datasets (CNN/DailyMail, XSum, etc.)
- Base models without fine-tuning
- Should receive raw text only
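A rough name-based check along these lines could gate new models. This is a sketch only: is_probably_instruction_tuned is a hypothetical helper, not part of the current codebase, and model cards should still be checked manually.

```python
def is_probably_instruction_tuned(model_name: str) -> bool:
    """Heuristic: guess from the model name whether it follows instructions.

    Covers common naming conventions only; always confirm against the
    model card before deciding how to build the prompt.
    """
    name = model_name.lower()
    return any(marker in name for marker in ("instruct", "chat", "flan"))

# Examples
assert is_probably_instruction_tuned("microsoft/Phi-3-mini-4k-instruct-gguf")
assert is_probably_instruction_tuned("google/flan-t5-large")
assert not is_probably_instruction_tuned("facebook/bart-large-cnn")
```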
References
- BART Paper: https://arxiv.org/abs/1910.13461
- CNN/DailyMail Dataset: https://arxiv.org/abs/1506.03340
- Longformer Paper: https://arxiv.org/abs/2004.05150
- HuggingFace Model Cards:
Date: 2025-11-07
Status: ✅ Fixed
Impact: Medium - Affects BART and Longformer model quality
Backward Compatibility: ✅ Yes - No breaking changes to API