Fix: BART and Longformer2Roberta Summarization Models

Issue Description

The facebook/bart-large-cnn and patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 models were producing inaccurate or "rubbish" summaries.

Root Cause

These models are encoder-decoder summarization models trained on the CNN/DailyMail dataset. They are NOT instruction-tuned models.

Key Distinction:

Instruction-tuned models (like Phi-3, FLAN-T5, GPT models):

  • Understand and follow instructions like "Generate a summary based on..."
  • Can handle complex prompts with multiple directives
  • Trained on instruction-following datasets

Non-instruction-tuned summarization models (like BART, Longformer2Roberta):

  • Trained on simple article → summary tasks
  • Do NOT understand instructions
  • Only trained to condense/extract key information from raw text (see the usage sketch after this list)
  • When given instructions, they try to summarize the instruction itself instead of following it
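
For illustration, here is a minimal sketch of how a non-instruction-tuned summarizer is normally driven, assuming the Hugging Face transformers library is installed. The sample text is placeholder data, not real patient data; the model simply condenses whatever text it receives.

# Minimal sketch: BART condenses the raw text it is given; it does not follow instructions.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Placeholder example text (not real patient data)
article = (
    "Patient was seen for a routine follow-up. Blood pressure improved from "
    "150/95 to 128/82 after starting lisinopril. Patient reports better sleep "
    "and no medication side effects."
)

result = summarizer(article, max_length=60, min_length=10, do_sample=False)
print(result[0]["summary_text"])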

The Problem

Previously, these models were receiving prompts like:

Patient Visit Data: [data]

Baseline: [baseline]

Changes: [delta_text]

Generate a comprehensive patient summary based on the above information.

The models would try to summarize this instruction text rather than follow it, resulting in nonsensical output.

The Solution

Modified the build_summarization_context() function in routes_fastapi.py to:

  1. Detect non-instruction-tuned models (BART, Longformer2Roberta)
  2. Send ONLY raw text to these models without any instructions
  3. Structure the data with simple labels (like section headers in an article)

Before (Incorrect):

prompt = f"Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}\n\n" \
         f"Generate a comprehensive patient summary based on the above information."

After (Correct):

# For BART/Longformer - NO instructions, just data
prompt = f"Patient Information and Visit History:\n{visit_data}\n" \
         f"\nBaseline Status:\n{baseline}\n" \
         f"\nRecent Changes and Updates:\n{delta_text}"

Implementation Details

Modified Files:

  1. services/ai-service/src/ai_med_extract/api/routes_fastapi.py

    • Updated build_summarization_context() function
    • Added model detection logic
    • Updated all function calls to pass the model_name parameter (call-site sketch after this list)
  2. models_config.json

    • Added notes about these models being non-instruction-tuned
    • Clarified their proper usage
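
As a hypothetical illustration of what "pass the model_name parameter" means at the call sites (the actual route code in routes_fastapi.py is not reproduced here; request and the surrounding variables are placeholders for the parsed request payload, and build_summarization_context() is shown in full under Code Changes below):

# Hypothetical call-site sketch: the relevant change is that model_name
# is now forwarded so build_summarization_context() can detect BART/Longformer.
prompt = build_summarization_context(
    custom_prompt=request.custom_prompt,
    visit_data_text=visit_data_text,
    baseline=baseline,
    delta_text=delta_text,
    model_name=request.model_name,  # e.g. "facebook/bart-large-cnn"
)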

Code Changes:

def build_summarization_context(custom_prompt, visit_data_text, baseline, delta_text, model_name=None):
    """
    Build context for summarization models.
    
    Non-instruction-tuned models (BART, Longformer2Roberta) need ONLY raw text to summarize,
    without any instructions. They were trained on article->summary tasks, not instruction following.
    """
    # List of models that are NOT instruction-tuned
    NON_INSTRUCTION_MODELS = [
        "facebook/bart-large-cnn",
        "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16"
    ]
    
    # Check if this is a non-instruction-tuned model
    is_non_instruction_model = model_name and any(m in model_name for m in NON_INSTRUCTION_MODELS)
    
    if is_non_instruction_model:
        # For non-instruction models: Send ONLY the data to be summarized
        # Structure it like an article with section headers
        data_text = f"Patient Information and Visit History:\n{visit_data_text}\n"
        if baseline:
            data_text += f"\nBaseline Status:\n{baseline}\n"
        if delta_text:
            data_text += f"\nRecent Changes and Updates:\n{delta_text}"
        return data_text.strip()
    else:
        # For instruction-tuned models: Include explicit instructions
        return f"{custom_prompt}\n\nPatient Visit Data:\n{visit_data_text}\n\n" \
               f"Baseline: {baseline}\n\nChanges: {delta_text}\n\n" \
               f"Generate a comprehensive patient summary based on the above information."

Expected Results

After this fix:

✅ BART and Longformer2Roberta models now receive properly formatted input
✅ Models will extract and condense key information (their intended purpose)
✅ Output should be coherent summaries rather than garbled text
✅ No changes to instruction-tuned models (Phi-3, FLAN-T5, etc.)

Model Comparison

| Model | Type | Instruction-Tuned? | Best For |
|-------|------|--------------------|----------|
| facebook/bart-large-cnn | Summarization | ❌ No | Extracting key points from documents |
| patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 | Seq2Seq | ❌ No | Long document summarization (4096+ tokens) |
| google/flan-t5-large | Summarization | ✅ Yes | Instruction-following summarization |
| microsoft/Phi-3-mini-4k-instruct-gguf | Text Generation | ✅ Yes | Complex patient summaries with instructions |

Recommendations

For Best Results:

  1. Use instruction-tuned models (Phi-3, FLAN-T5) for patient summaries

    • They understand medical context better
    • Can follow specific formatting requirements
    • Handle complex multi-step instructions
  2. Use BART/Longformer for simple extraction tasks

    • Quick key point extraction
    • Document length reduction
    • When you just need "the highlights"
  3. Current PRIMARY model (Phi-3 GGUF) is already optimal

    • Instruction-tuned
    • Quantized for efficiency
    • Best quality for patient summaries

Testing

To test the fix:

# Test with BART
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "facebook/bart-large-cnn",
    "model_type": "summarization"
  }'

# Test with Longformer
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
    "model_type": "seq2seq"
  }'
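
The same checks can be scripted. Below is a minimal sketch assuming the service is running locally on port 8000, as in the curl examples above; the patient_info contents are placeholders and should be replaced with the real schema.

# Sketch only: adjust the payload to match the real patient_info schema.
import requests

MODELS = [
    ("facebook/bart-large-cnn", "summarization"),
    ("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16", "seq2seq"),
]

for model_name, model_type in MODELS:
    body = {
        "patient_info": {"name": "Test Patient"},  # placeholder payload
        "model_name": model_name,
        "model_type": model_type,
    }
    resp = requests.post("http://localhost:8000/api/patient_summary", json=body)
    print(model_name, resp.status_code)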

Future Considerations

If adding new models, check whether they are instruction-tuned; a simple name-based check is sketched after the lists below:

Instruction-tuned models typically have:

  • "instruct" in the model name
  • "chat" in the model name
  • "flan" prefix (FLAN-T5, etc.)
  • Training on instruction-following datasets such as InstructGPT, Flan, or Alpaca

Non-instruction-tuned models:

  • Trained on simple task datasets (CNN/DailyMail, XSum, etc.)
  • Base models without fine-tuning
  • Should receive raw text only
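
A hypothetical name-based check along these lines (it is only a heuristic; the model card remains the authoritative source):

# Hypothetical helper: rough name-based heuristic for instruction tuning.
# Known non-instruction-tuned summarizers are listed explicitly.
NON_INSTRUCTION_MODELS = [
    "facebook/bart-large-cnn",
    "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
]

def is_instruction_tuned(model_name: str) -> bool:
    name = model_name.lower()
    if any(m in name for m in NON_INSTRUCTION_MODELS):
        return False
    return any(marker in name for marker in ("instruct", "chat", "flan"))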

Date: 2025-11-07
Status: ✅ Fixed
Impact: Medium - Affects BART and Longformer model quality
Backward Compatibility: ✅ Yes - No breaking changes to API