Named Entity Recognition (NER) Agents Guide

Overview

The Pub/Sub Multi-Agent System now includes specialized NER (Named Entity Recognition) agents powered by HuggingFace Transformers. These agents use pre-trained BERT models to extract medical entities from text and work differently from regular LLM agents.

Technical Implementation

NER agents use the HuggingFace transformers library:

from transformers import pipeline

# The model is downloaded from the HuggingFace Hub on first use
ner_pipeline = pipeline(
    "ner",
    model="samrawal/bert-base-uncased_clinical-ner",
    aggregation_strategy="simple"  # merge word pieces into whole entities
)

# Process text
entities = ner_pipeline("Patient has diabetes")

Key differences from LLM agents:

  • Use transformers pipelines, not Ollama
  • Models are downloaded on first use from HuggingFace
  • Processing is deterministic (no temperature/sampling)
  • Faster inference than LLM-based extraction
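
With aggregation_strategy="simple", the pipeline returns one dict per entity with entity_group, word, score, start, and end fields. A minimal sketch of converting that raw output into the flat schema used elsewhere in this guide (the sample entity below is illustrative, not captured from a real model run):

```python
# Illustrative raw output shaped like a HuggingFace NER pipeline result
# with aggregation_strategy="simple" (not real model output)
raw_entities = [
    {"entity_group": "PROBLEM", "word": "diabetes",
     "score": 0.9987, "start": 12, "end": 20},
]

def normalize(raw):
    """Map pipeline field names to the schema used by the agents."""
    return [
        {
            "text": e["word"],
            "entity_type": e["entity_group"],
            "start": e["start"],
            "end": e["end"],
            "score": round(float(e["score"]), 4),
        }
        for e in raw
    ]

print(normalize(raw_entities))
```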

Available NER Models

1. Clinical NER Model

Model: samrawal/bert-base-uncased_clinical-ner

Purpose: Extract clinical entities from medical text

Recognized Entity Types:

  • PROBLEM: Diseases, conditions, symptoms
  • TREATMENT: Medications, procedures, therapies
  • TEST: Diagnostic tests, lab results
  • OCCURRENCE: Medical events, admissions

Best for:

  • Clinical notes
  • Patient reports
  • Medical records
  • Symptom descriptions

2. Anatomy Detection Model

Model: OpenMed/OpenMed-NER-AnatomyDetect-BioPatient-108M

Purpose: Detect anatomical structures and patient information

Recognized Entity Types:

  • ANATOMY: Body parts, organs, anatomical structures
  • PATIENT: Patient demographics, identifiers
  • BIOMARKER: Biological markers
  • CLINICAL_FINDING: Clinical observations

Best for:

  • Anatomical descriptions
  • Radiology reports
  • Surgical notes
  • Physical examination records

How NER Agents Work

Processing Flow

NER agents work the same way as LLM agents regarding prompt processing, but differ in what they do with the rendered prompt:

1. Prompt Rendering (Same as LLM agents):

Agent Prompt: "Patient information: {PatientNote}"

Rendered: "Patient information: Patient has diabetes and takes metformin"

2. Processing Difference:

LLM Agents:

  • Rendered prompt → Send to Ollama → Generate response

NER Agents:

  • Rendered prompt → IS the text to analyze → Extract entities

Example

Agent Configuration:

Title: Clinical Entity Extractor
Model: samrawal/bert-base-uncased_clinical-ner
Prompt: Clinical note:
{PatientNote}

Extract entities from the note above.

What happens:

  1. System renders prompt → Replaces {PatientNote} with actual text
  2. Rendered text: "Clinical note:\nPatient has diabetes\n\nExtract entities from the note above."
  3. NER processes entire rendered text (not just the data source)
  4. Entities found: "diabetes" as PROBLEM

Key Point: The prompt template itself becomes part of the analyzed text!
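
The rendering step can be sketched as plain placeholder substitution; render_prompt below is a hypothetical helper for illustration, not the system's actual implementation:

```python
import re

def render_prompt(template, sources):
    """Replace each {Label} placeholder with its data source content;
    unknown placeholders are left untouched."""
    return re.sub(r"\{(\w+)\}",
                  lambda m: sources.get(m.group(1), m.group(0)),
                  template)

template = "Clinical note:\n{PatientNote}\n\nExtract entities from the note above."
rendered = render_prompt(template, {"PatientNote": "Patient has diabetes"})

# The NER agent analyzes this ENTIRE string, instructions included
print(rendered)
```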

Special Behavior

  1. Unified Prompt System: NER and LLM agents use the same prompt rendering
  2. Text Analysis: The rendered prompt is the text NER analyzes
  3. Dual Output:
    • JSON result (for chaining to other agents)
    • Formatted display (for human reading)
  4. Dedicated Display: NER Result box shows entities inline with text

Design Philosophy

Why NER agents use rendered prompts as analysis text:

  1. Consistency: All agents render prompts the same way
  2. Flexibility: Can combine multiple data sources in prompt
  3. Context: Can add instructions or context around the text
  4. Composability: Text from previous agents can be analyzed

Example Use Cases:

Use Case 1: Direct data source
Prompt: {PatientNote}
→ Analyzes just the patient note

Use Case 2: With context
Prompt: Medical History: {History}
Current Symptoms: {Symptoms}
→ Analyzes both sections with labels

Use Case 3: From previous agent
Prompt: {input}
→ Analyzes output from previous agent

Use Case 4: Combined
Prompt: Patient: {question}
Previous Analysis: {input}
→ Analyzes both user question and previous results

Using NER Agents

Basic Setup

Agent Configuration:

Title: Clinical Entity Extractor
Model: samrawal/bert-base-uncased_clinical-ner
Prompt: {PatientNote}
Subscribe Topic: TEXT_TO_ANALYZE
Publish Topic: ENTITIES_FOUND
☑ Show result in Final Result box

What happens:

  1. Agent receives message from TEXT_TO_ANALYZE topic
  2. Renders prompt: {PatientNote} → actual patient note text
  3. Runs NER pipeline on the rendered text
  4. Extracts entities automatically
  5. Publishes JSON to ENTITIES_FOUND topic
  6. Shows JSON in Final Result box
  7. Shows formatted text in NER Result box

Important: The entire rendered prompt is analyzed, not just individual placeholders.

Output Format

JSON Output (in Final Result box):

[
  {
    "text": "diabetes",
    "entity_type": "PROBLEM",
    "start": 45,
    "end": 53,
    "score": 0.9987
  },
  {
    "text": "metformin",
    "entity_type": "TREATMENT",
    "start": 78,
    "end": 87,
    "score": 0.9923
  }
]

Formatted Output (in NER Result box):

Patient reports history of [diabetes:PROBLEM] and is taking [metformin:TREATMENT].

Note: The score field (0.0-1.0) indicates the model's confidence in the entity classification.
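
Because every entity carries a score, a downstream agent can drop low-confidence extractions before chaining; the threshold below is an arbitrary example value, not a system default:

```python
def filter_by_confidence(entities, threshold=0.9):
    """Keep only entities whose confidence meets the threshold."""
    return [e for e in entities if e["score"] >= threshold]

entities = [
    {"text": "diabetes",  "entity_type": "PROBLEM",   "score": 0.9987},
    {"text": "elevated",  "entity_type": "PROBLEM",   "score": 0.41},
    {"text": "metformin", "entity_type": "TREATMENT", "score": 0.9923},
]
confident = filter_by_confidence(entities)
print([e["text"] for e in confident])  # → ['diabetes', 'metformin']
```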

Example Workflows

Example 1: Clinical Note Analysis

Data Source:

  • Label: ClinicalNote
  • Content:
Patient presents with chest pain and shortness of breath.
History of hypertension and diabetes mellitus type 2.
Currently taking lisinopril 10mg daily and metformin 500mg twice daily.
ECG shows ST elevation. Troponin levels elevated at 0.5 ng/mL.

Agents:

Agent 1: Clinical NER

  • Title: Extract Clinical Entities
  • Model: samrawal/bert-base-uncased_clinical-ner
  • Subscribe: START
  • Publish: CLINICAL_ENTITIES
  • Prompt: {ClinicalNote} (text to analyze)
  • ☑ Show result

Agent 2: Entity Summarizer

  • Title: Summarize Findings
  • Model: phi4-mini
  • Subscribe: CLINICAL_ENTITIES
  • Publish: (empty)
  • Prompt:
Based on these extracted entities:
{input}

Summarize the key clinical findings:
1. Problems identified
2. Treatments mentioned
3. Tests performed
  • ☑ Show result

Expected Results:

NER Result box:

Patient presents with [chest pain:PROBLEM] and [shortness of breath:PROBLEM].
History of [hypertension:PROBLEM] and [diabetes mellitus type 2:PROBLEM].
Currently taking [lisinopril:TREATMENT] 10mg daily and [metformin:TREATMENT] 500mg twice daily.
[ECG:TEST] shows ST elevation. [Troponin:TEST] levels elevated at 0.5 ng/mL.

Final Result box:

--- Extract Clinical Entities ---
[{"text": "chest pain", "entity_type": "PROBLEM", ...}, ...]

--- Summarize Findings ---
Key Clinical Findings:
1. Problems: chest pain, shortness of breath, hypertension, diabetes
2. Treatments: lisinopril, metformin
3. Tests: ECG, Troponin

Example 2: Anatomy Detection in Radiology Report

User Question: "Analyze this radiology report"

Data Source:

  • Label: RadiologyReport
  • Content:
CT scan of the chest reveals mass in right upper lobe measuring 3.2 cm.
No evidence of mediastinal lymphadenopathy.
Heart size is normal. Lungs are clear bilaterally.
Liver and spleen appear unremarkable.

Agent Configuration:

Agent 1: Anatomy Detector

  • Title: Detect Anatomical Structures
  • Model: OpenMed/OpenMed-NER-AnatomyDetect-BioPatient-108M
  • Subscribe: START
  • Publish: ANATOMY_FOUND
  • Prompt: {RadiologyReport}
  • ☑ Show result

Expected NER Result:

CT scan of the [chest:ANATOMY] reveals mass in [right upper lobe:ANATOMY] measuring 3.2 cm.
No evidence of [mediastinal:ANATOMY] lymphadenopathy.
[Heart:ANATOMY] size is normal. [Lungs:ANATOMY] are clear bilaterally.
[Liver:ANATOMY] and [spleen:ANATOMY] appear unremarkable.

Example 3: Multi-Stage Medical Analysis

Workflow: Extract entities → Categorize → Generate report

Agent 1: Entity Extraction

  • Model: samrawal/bert-base-uncased_clinical-ner
  • Subscribe: START
  • Publish: ENTITIES
  • ☑ Show result

Agent 2: Entity Categorization

  • Model: phi4-mini
  • Subscribe: ENTITIES
  • Publish: CATEGORIZED
  • Prompt:
Categorize these medical entities by type:
{input}

Group by: Problems, Treatments, Tests
  • ☑ Show result

Agent 3: Report Generator

  • Model: MedAIBase/MedGemma1.5:4b
  • Subscribe: CATEGORIZED
  • Publish: (empty)
  • Prompt:
Generate a structured clinical summary based on:
{input}

Include assessment and plan.
  • ☑ Show result

NER Result Display Features

Inline Entity Markup

Entities are displayed inline with brackets and labels:

[entity text:ENTITY_TYPE]
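
Given the JSON entities and the original text, the inline markup can be produced by splicing brackets in from right to left so the character offsets stay valid; this is a sketch of the idea, not the system's actual display code:

```python
def annotate(text, entities):
    """Wrap each entity span as [text:TYPE], working right to left
    so earlier offsets are not shifted by the insertions."""
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = (text[:e["start"]]
                + f"[{text[e['start']:e['end']]}:{e['entity_type']}]"
                + text[e["end"]:])
    return text

note = "Patient has diabetes and takes metformin."
entities = [
    {"entity_type": "PROBLEM",   "start": 12, "end": 20},
    {"entity_type": "TREATMENT", "start": 31, "end": 40},
]
print(annotate(note, entities))
# → Patient has [diabetes:PROBLEM] and takes [metformin:TREATMENT].
```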

Color Coding (Future Enhancement)

Different entity types could be color-coded:

  • Problems: Red
  • Treatments: Blue
  • Tests: Green
  • Anatomy: Purple

Entity Statistics (Future Enhancement)

Could show count of each entity type found.

Best Practices

1. Choosing the Right NER Model

Use Clinical NER for:

  • General clinical text
  • Patient complaints
  • Medical history
  • Treatment plans

Use Anatomy NER for:

  • Radiology reports
  • Surgical notes
  • Physical examination
  • Anatomical descriptions

2. Crafting Effective Prompts

Simple (Direct Analysis):

Prompt: {PatientNote}

Analyzes just the patient note.

With Structure:

Prompt: Chief Complaint: {Complaint}
Medical History: {History}
Current Medications: {Medications}

Analyzes all sections with clear labels.

With Context:

Prompt: Patient Case Summary:
{input}

The above text contains medical information.

Adds context (though NER will analyze the entire text).

From Previous Agent:

Prompt: {input}

Analyzes output from previous agent in chain.

3. Combining NER with Other Agents

Pattern: Extract → Analyze → Report

NER Agent → Regular LLM → Medical LLM

Example:

  1. NER extracts entities from clinical note
  2. phi4-mini categorizes entities by type
  3. MedGemma generates clinical assessment

4. Understanding What Gets Analyzed

Remember: The ENTIRE rendered prompt is analyzed.

Example:

Prompt: Patient has {condition} and takes {medication}.

If {condition} = "diabetes" and {medication} = "metformin":

Rendered text analyzed:

Patient has diabetes and takes metformin.

Entities found: "diabetes" (PROBLEM), "metformin" (TREATMENT)

The sentence structure matters! NER sees the full context.

Limitations

Current Limitations

  1. Prompt Text Is Analyzed Literally: NER agents render prompts like LLM agents, but any instructions in the prompt are treated as text to analyze, not as commands to follow
  2. Fixed Entity Types: Each model recognizes only its predefined entity types
  3. English Only: The models are trained on English medical text
  4. Context Window: Input length is limited by the model's maximum sequence length (512 tokens for BERT-based models)

Workarounds

For Long Texts:

  • Split into chunks
  • Process separately
  • Combine results
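
The chunk-and-combine workaround can be sketched as follows. Here run_ner stands in for the real pipeline call, and the character limit is an arbitrary stand-in; real chunking should respect the model's token limit (512 tokens for BERT-based models):

```python
def chunk_text(text, max_chars=1000):
    """Split on sentence boundaries so entities are not cut mid-span."""
    chunks, current = [], ""
    for sentence in text.split(". "):
        candidate = (current + ". " + sentence) if current else sentence
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def ner_on_long_text(text, run_ner, max_chars=1000):
    """Run NER per chunk and shift offsets back into the full text."""
    entities, offset = [], 0
    for chunk in chunk_text(text, max_chars):
        offset = text.index(chunk, offset)
        for e in run_ner(chunk):
            entities.append({**e, "start": e["start"] + offset,
                             "end": e["end"] + offset})
        offset += len(chunk)
    return entities

sample = "History of hypertension noted. Patient has diabetes now."
print(chunk_text(sample, max_chars=35))
```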

For Custom Entities:

  • Use regular LLM with custom prompt
  • Post-process NER output with another agent

Troubleshooting

Issue: No entities detected

Causes:

  • Text doesn't contain medical terms
  • Wrong NER model for the content type
  • Text too short or too long

Solutions:

  • Verify text contains medical content
  • Try different NER model
  • Check text length

Issue: Entities in wrong category

Cause: Model misclassification

Solution: Use post-processing agent to reclassify

Issue: NER Result box empty

Causes:

  • "Show result" not checked
  • Agent failed to execute
  • No entities found

Solutions:

  • Check "Show result" checkbox
  • Review Execution Log for errors
  • Verify input text

Advanced Usage

Combining Multiple NER Models

Run both NER models on same text:

Agent 1: Clinical NER

Subscribe: START
Publish: CLINICAL_ENTITIES

Agent 2: Anatomy NER

Subscribe: START
Publish: ANATOMY_ENTITIES

Agent 3: Merge Results

Subscribe: CLINICAL_ENTITIES, ANATOMY_ENTITIES
Combine both outputs
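
The merge agent's combining step can be sketched as concatenating both entity lists, dropping overlapping spans in favor of the higher-confidence one, and sorting by position; merge_entities is a hypothetical helper, not part of the system:

```python
def merge_entities(*entity_lists):
    """Merge entity lists from several NER agents, preferring the
    higher-scoring entity when two spans overlap."""
    merged = []
    for e in sorted((e for lst in entity_lists for e in lst),
                    key=lambda e: -e["score"]):
        overlaps = any(e["start"] < m["end"] and m["start"] < e["end"]
                       for m in merged)
        if not overlaps:
            merged.append(e)
    return sorted(merged, key=lambda e: e["start"])

clinical = [{"text": "chest pain", "entity_type": "PROBLEM",
             "start": 21, "end": 31, "score": 0.99}]
anatomy  = [{"text": "chest", "entity_type": "ANATOMY",
             "start": 21, "end": 26, "score": 0.88}]
print(merge_entities(clinical, anatomy))
```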

Entity Validation

Add validation agent after NER:

Agent 1: NER Extraction

Model: Clinical NER
Publish: RAW_ENTITIES

Agent 2: Entity Validator

Model: MedGemma
Subscribe: RAW_ENTITIES
Validate medical accuracy
Publish: VALIDATED_ENTITIES

Future Enhancements

Planned features:

  • Color-coded entity display
  • Entity statistics dashboard
  • Confidence scores
  • Custom entity types
  • Multi-language support
  • Entity linking (to medical ontologies)
  • Batch processing