Fix: BART and Longformer2Roberta Summarization Models

Issue Description

The facebook/bart-large-cnn and patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 models were producing inaccurate or "rubbish" summaries.

Root Cause

These models are encoder-decoder summarization models trained on the CNN/DailyMail dataset. They are NOT instruction-tuned models.

Key Distinction:

Instruction-tuned models (like Phi-3, FLAN-T5, GPT models):

  • Understand and follow instructions like "Generate a summary based on..."
  • Can handle complex prompts with multiple directives
  • Trained on instruction-following datasets

Non-instruction-tuned summarization models (like BART, Longformer2Roberta):

  • Trained on simple article → summary tasks
  • Do NOT understand instructions
  • Only trained to condense/extract key information from raw text (see the usage sketch after this list)
  • When given instructions, they try to summarize the instruction itself instead of following it
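
For illustration, here is a minimal sketch of how a non-instruction-tuned summarizer is normally driven, assuming the Hugging Face transformers library is installed. The sample text is placeholder data, not real patient data; the model simply condenses whatever text it receives.

# Minimal sketch: BART condenses the raw text it is given; it does not follow instructions.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Placeholder example text (not real patient data)
article = (
    "Patient was seen for a routine follow-up. Blood pressure improved from "
    "150/95 to 128/82 after starting lisinopril. Patient reports better sleep "
    "and no medication side effects."
)

result = summarizer(article, max_length=60, min_length=10, do_sample=False)
print(result[0]["summary_text"])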

The Problem

Previously, these models were receiving prompts like:

Patient Visit Data: [data]

Baseline: [baseline]

Changes: [delta_text]

Generate a comprehensive patient summary based on the above information.

The models would try to summarize this instruction text rather than follow it, resulting in nonsensical output.

The Solution

Modified the build_summarization_context() function in routes_fastapi.py to:

  1. Detect non-instruction-tuned models (BART, Longformer2Roberta)
  2. Send ONLY raw text to these models without any instructions
  3. Structure the data with simple labels (like section headers in an article)

Before (Incorrect):

prompt = f"Patient Data:\nBaseline: {baseline}\nChanges: {delta_text}\n\n" \
         f"Generate a comprehensive patient summary based on the above information."

After (Correct):

# For BART/Longformer - NO instructions, just data
prompt = f"Patient Information and Visit History:\n{visit_data}\n" \
         f"\nBaseline Status:\n{baseline}\n" \
         f"\nRecent Changes and Updates:\n{delta_text}"

Implementation Details

Modified Files:

  1. services/ai-service/src/ai_med_extract/api/routes_fastapi.py

    • Updated build_summarization_context() function
    • Added model detection logic
    • Updated all function calls to pass the model_name parameter (call-site sketch after this list)
  2. models_config.json

    • Added notes about these models being non-instruction-tuned
    • Clarified their proper usage
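
As a hypothetical illustration of what "pass the model_name parameter" means at the call sites (the actual route code in routes_fastapi.py is not reproduced here; request and the surrounding variables are placeholders for the parsed request payload, and build_summarization_context() is shown in full under Code Changes below):

# Hypothetical call-site sketch: the relevant change is that model_name
# is now forwarded so build_summarization_context() can detect BART/Longformer.
prompt = build_summarization_context(
    custom_prompt=request.custom_prompt,
    visit_data_text=visit_data_text,
    baseline=baseline,
    delta_text=delta_text,
    model_name=request.model_name,  # e.g. "facebook/bart-large-cnn"
)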

Code Changes:

def build_summarization_context(custom_prompt, visit_data_text, baseline, delta_text, model_name=None):
    """
    Build context for summarization models.
    
    Non-instruction-tuned models (BART, Longformer2Roberta) need ONLY raw text to summarize,
    without any instructions. They were trained on article->summary tasks, not instruction following.
    """
    # List of models that are NOT instruction-tuned
    NON_INSTRUCTION_MODELS = [
        "facebook/bart-large-cnn",
        "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16"
    ]
    
    # Check if this is a non-instruction-tuned model
    is_non_instruction_model = model_name and any(m in model_name for m in NON_INSTRUCTION_MODELS)
    
    if is_non_instruction_model:
        # For non-instruction models: Send ONLY the data to be summarized
        # Structure it like an article with section headers
        data_text = f"Patient Information and Visit History:\n{visit_data_text}\n"
        if baseline:
            data_text += f"\nBaseline Status:\n{baseline}\n"
        if delta_text:
            data_text += f"\nRecent Changes and Updates:\n{delta_text}"
        return data_text.strip()
    else:
        # For instruction-tuned models: Include explicit instructions
        return f"{custom_prompt}\n\nPatient Visit Data:\n{visit_data_text}\n\n" \
               f"Baseline: {baseline}\n\nChanges: {delta_text}\n\n" \
               f"Generate a comprehensive patient summary based on the above information."

Expected Results

After this fix:

✅ BART and Longformer2Roberta models now receive properly formatted input
✅ Models will extract and condense key information (their intended purpose)
✅ Output should be coherent summaries rather than garbled text
✅ No changes to instruction-tuned models (Phi-3, FLAN-T5, etc.)

Model Comparison

| Model | Type | Instruction-Tuned? | Best For |
|-------|------|--------------------|----------|
| facebook/bart-large-cnn | Summarization | ❌ No | Extracting key points from documents |
| patrickvonplaten/longformer2roberta-cnn_dailymail-fp16 | Seq2Seq | ❌ No | Long document summarization (4096+ tokens) |
| google/flan-t5-large | Summarization | ✅ Yes | Instruction-following summarization |
| microsoft/Phi-3-mini-4k-instruct-gguf | Text Generation | ✅ Yes | Complex patient summaries with instructions |

Recommendations

For Best Results:

  1. Use instruction-tuned models (Phi-3, FLAN-T5) for patient summaries

    • They understand medical context better
    • Can follow specific formatting requirements
    • Handle complex multi-step instructions
  2. Use BART/Longformer for simple extraction tasks

    • Quick key point extraction
    • Document length reduction
    • When you just need "the highlights"
  3. Current PRIMARY model (Phi-3 GGUF) is already optimal

    • Instruction-tuned
    • Quantized for efficiency
    • Best quality for patient summaries

Testing

To test the fix:

# Test with BART
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "facebook/bart-large-cnn",
    "model_type": "summarization"
  }'

# Test with Longformer
curl -X POST http://localhost:8000/api/patient_summary \
  -H "Content-Type: application/json" \
  -d '{
    "patient_info": {...},
    "model_name": "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
    "model_type": "seq2seq"
  }'
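
The same checks can be scripted. Below is a minimal sketch assuming the service is running locally on port 8000, as in the curl examples above; the patient_info contents are placeholders and should be replaced with the real schema.

# Sketch only: adjust the payload to match the real patient_info schema.
import requests

MODELS = [
    ("facebook/bart-large-cnn", "summarization"),
    ("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16", "seq2seq"),
]

for model_name, model_type in MODELS:
    body = {
        "patient_info": {"name": "Test Patient"},  # placeholder payload
        "model_name": model_name,
        "model_type": model_type,
    }
    resp = requests.post("http://localhost:8000/api/patient_summary", json=body)
    print(model_name, resp.status_code)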

Future Considerations

If adding new models, check whether they are instruction-tuned; a simple name-based check is sketched after the lists below:

Instruction-tuned models typically have:

  • "instruct" in the model name
  • "chat" in the model name
  • "flan" prefix (FLAN-T5, etc.)
  • Training on instruction-following datasets such as InstructGPT, Flan, or Alpaca

Non-instruction-tuned models:

  • Trained on simple task datasets (CNN/DailyMail, XSum, etc.)
  • Base models without fine-tuning
  • Should receive raw text only
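
A hypothetical name-based check along these lines (it is only a heuristic; the model card remains the authoritative source):

# Hypothetical helper: rough name-based heuristic for instruction tuning.
# Known non-instruction-tuned summarizers are listed explicitly.
NON_INSTRUCTION_MODELS = [
    "facebook/bart-large-cnn",
    "patrickvonplaten/longformer2roberta-cnn_dailymail-fp16",
]

def is_instruction_tuned(model_name: str) -> bool:
    name = model_name.lower()
    if any(m in name for m in NON_INSTRUCTION_MODELS):
        return False
    return any(marker in name for marker in ("instruct", "chat", "flan"))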

Date: 2025-11-07
Status: ✅ Fixed
Impact: Medium - Affects BART and Longformer model quality
Backward Compatibility: ✅ Yes - No breaking changes to API