DeepCritical / docs /bugs /P0_MAGENTIC_AND_SEARCH_AUDIT.md
VibecoderMcSwaggins's picture
refactor(tools): replace BioRxiv with Europe PMC (Phase 01)
2f8ae1f
|
raw
history blame
6.68 kB
# P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools
**Date:** November 27, 2025
**Auditor:** Claude Code
**Status:** VERIFIED
---
## TL;DR
| Component | Status | Verdict |
|-----------|--------|---------|
| Microsoft Agent Framework | βœ… WORKING | Correctly wired, no bugs |
| GPT-5.1 Model Config | βœ… CORRECT | Using `gpt-5.1` as configured |
| Search Tools | ❌ BROKEN | Root cause of garbage results |
**The orchestration framework is fine. The search layer is garbage.**
---
## Microsoft Agent Framework Verification
### Import Test: PASSED
```python
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful
```
### Agent Creation Test: PASSED
```python
from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)
```
### Workflow Build Test: PASSED
```python
from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>
```
### Model Configuration: CORRECT
```python
settings.openai_model = "gpt-5.1" # βœ… Using GPT-5.1, not GPT-4o
settings.openai_api_key = True # βœ… API key is set
```
---
## What Magentic Provides (Working)
1. **Multi-Agent Coordination**
- Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
- Uses `MagenticBuilder().with_standard_manager()` for coordination
2. **ChatAgent Pattern**
- Each agent has internal LLM (GPT-5.1)
- Can call tools via `@ai_function` decorator
- Has proper instructions for domain-specific tasks
3. **Workflow Streaming**
- Events: `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, etc.
- Real-time UI updates via `workflow.run_stream(task)`
4. **State Management**
- `MagenticState` persists evidence across agents
- `get_bibliography()` tool for ReportAgent
---
## What's Actually Broken: The Search Tools
### File: `src/agents/tools.py`
The Magentic agents call these tools:
- `search_pubmed` β†’ Uses `PubMedTool`
- `search_clinical_trials` β†’ Uses `ClinicalTrialsTool`
- `search_preprints` β†’ Uses `BioRxivTool`
**These tools are the problem, not the framework.**
---
## Search Tool Bugs (Detailed)
### BUG 1: BioRxiv API Does Not Support Search
**File:** `src/tools/biorxiv.py:248-286`
```python
# This fetches the FIRST 100 papers from the last 90 days
# It does NOT search by keyword - the API doesn't support that
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)
```
**Problem:**
- Fetches 100 random chronological papers
- Filters for ANY keyword match in title/abstract
- "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once
**Fix:** Remove BioRxiv or use Europe PMC (which has actual search)
---
### BUG 2: PubMed Query Not Optimized
**File:** `src/tools/pubmed.py:54-71`
```python
search_params = self._build_params(
db="pubmed",
term=query, # RAW USER QUERY - no preprocessing!
retmax=max_results,
sort="relevance",
)
```
**Problem:**
- User enters: "What medications show promise for Long COVID?"
- PubMed receives: `What medications show promise for Long COVID?`
- Should receive: `("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])`
**Fix:** Add query preprocessing:
1. Strip question words (what, which, how, etc.)
2. Expand medical synonyms (Long COVID β†’ PASC, Post-COVID)
3. Use MeSH terms for better recall
---
### BUG 3: ClinicalTrials.gov No Filtering
**File:** `src/tools/clinicaltrials.py`
Returns ALL trials including:
- Withdrawn trials
- Terminated trials
- Observational studies (not drug interventions)
- Phase 1 (no efficacy data)
**Fix:** Filter by:
- `studyType=INTERVENTIONAL`
- `phase=PHASE2,PHASE3,PHASE4`
- `status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING`
---
## Evidence: Garbage In β†’ Garbage Out
When the Magentic SearchAgent calls these tools:
```
SearchAgent: "Find evidence for Long COVID medications"
β”‚
β–Ό
search_pubmed("Long COVID medications")
β†’ Returns 1 semi-relevant paper (raw query hits)
search_preprints("Long COVID medications")
β†’ Returns garbage (BioRxiv API doesn't search)
β†’ "Calf muscle adaptations" (has "COVID" somewhere)
β†’ "Ophthalmologist work-life balance" (mentions COVID)
search_clinical_trials("Long COVID medications")
β†’ Returns all trials, no filtering
β”‚
β–Ό
JudgeAgent receives garbage evidence
β”‚
β–Ό
HypothesisAgent can't generate good hypotheses from garbage
β”‚
β–Ό
ReportAgent produces garbage report
```
**The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.**
---
## Recommended Fixes
### Priority 1: Delete or Fix BioRxiv (30 min)
**Option A: Delete it**
```python
# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)
```
**Option B: Replace with Europe PMC**
Europe PMC has preprints AND proper search API:
```
https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json
```
### Priority 2: Fix PubMed Query (1 hour)
Add query preprocessor:
```python
def preprocess_query(raw_query: str) -> str:
"""Convert natural language to PubMed query syntax."""
# Strip question words
# Expand medical synonyms
# Add field tags [Title/Abstract]
# Return optimized query
```
### Priority 3: Filter ClinicalTrials (30 min)
Add parameters to API call:
```python
params = {
"query.term": query,
"filter.overallStatus": "COMPLETED,RECRUITING",
"filter.studyType": "INTERVENTIONAL",
"pageSize": max_results,
}
```
---
## Conclusion
**Microsoft Agent Framework: NO BUGS FOUND**
- Imports work βœ…
- Agent creation works βœ…
- Workflow building works βœ…
- Model config correct (GPT-5.1) βœ…
- Streaming events work βœ…
**Search Tools: CRITICALLY BROKEN**
- BioRxiv: API doesn't support search (fundamental)
- PubMed: No query optimization (fixable)
- ClinicalTrials: No filtering (fixable)
**Recommendation:**
1. Delete BioRxiv immediately (unusable)
2. Add PubMed query preprocessing
3. Add ClinicalTrials filtering
4. Then the Magentic multi-agent system will work as designed