Spaces:
Running
Running
| # P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools | |
| **Date:** November 27, 2025 | |
| **Auditor:** Claude Code | |
| **Status:** VERIFIED | |
| --- | |
| ## TL;DR | |
| | Component | Status | Verdict | | |
| |-----------|--------|---------| | |
| | Microsoft Agent Framework | β WORKING | Correctly wired, no bugs | | |
| | GPT-5.1 Model Config | β CORRECT | Using `gpt-5.1` as configured | | |
| | Search Tools | β BROKEN | Root cause of garbage results | | |
| **The orchestration framework is fine. The search layer is garbage.** | |
| --- | |
| ## Microsoft Agent Framework Verification | |
| ### Import Test: PASSED | |
| ```python | |
| from agent_framework import MagenticBuilder, ChatAgent | |
| from agent_framework.openai import OpenAIChatClient | |
| # All imports successful | |
| ``` | |
| ### Agent Creation Test: PASSED | |
| ```python | |
| from src.agents.magentic_agents import create_search_agent | |
| search_agent = create_search_agent() | |
| # SearchAgent created: SearchAgent | |
| # Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) | |
| ``` | |
| ### Workflow Build Test: PASSED | |
| ```python | |
| from src.orchestrator_magentic import MagenticOrchestrator | |
| orchestrator = MagenticOrchestrator(max_rounds=2) | |
| workflow = orchestrator._build_workflow() | |
| # Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'> | |
| ``` | |
| ### Model Configuration: CORRECT | |
| ```python | |
| settings.openai_model = "gpt-5.1" # β Using GPT-5.1, not GPT-4o | |
| settings.openai_api_key = True # β API key is set | |
| ``` | |
| --- | |
| ## What Magentic Provides (Working) | |
| 1. **Multi-Agent Coordination** | |
| - Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent | |
| - Uses `MagenticBuilder().with_standard_manager()` for coordination | |
| 2. **ChatAgent Pattern** | |
| - Each agent has internal LLM (GPT-5.1) | |
| - Can call tools via `@ai_function` decorator | |
| - Has proper instructions for domain-specific tasks | |
| 3. **Workflow Streaming** | |
| - Events: `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, etc. | |
| - Real-time UI updates via `workflow.run_stream(task)` | |
| 4. **State Management** | |
| - `MagenticState` persists evidence across agents | |
| - `get_bibliography()` tool for ReportAgent | |
| --- | |
| ## What's Actually Broken: The Search Tools | |
| ### File: `src/agents/tools.py` | |
| The Magentic agents call these tools: | |
| - `search_pubmed` β Uses `PubMedTool` | |
| - `search_clinical_trials` β Uses `ClinicalTrialsTool` | |
| - `search_preprints` β Uses `BioRxivTool` | |
| **These tools are the problem, not the framework.** | |
| --- | |
| ## Search Tool Bugs (Detailed) | |
| ### BUG 1: BioRxiv API Does Not Support Search | |
| **File:** `src/tools/biorxiv.py:248-286` | |
| ```python | |
| # This fetches the FIRST 100 papers from the last 90 days | |
| # It does NOT search by keyword - the API doesn't support that | |
| url = f"{self.BASE_URL}/{self.server}/{interval}/0/json" | |
| # Then filters client-side for keywords | |
| matching = self._filter_by_keywords(papers, query_terms, max_results) | |
| ``` | |
| **Problem:** | |
| - Fetches 100 random chronological papers | |
| - Filters for ANY keyword match in title/abstract | |
| - "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once | |
| **Fix:** Remove BioRxiv or use Europe PMC (which has actual search) | |
| --- | |
| ### BUG 2: PubMed Query Not Optimized | |
| **File:** `src/tools/pubmed.py:54-71` | |
| ```python | |
| search_params = self._build_params( | |
| db="pubmed", | |
| term=query, # RAW USER QUERY - no preprocessing! | |
| retmax=max_results, | |
| sort="relevance", | |
| ) | |
| ``` | |
| **Problem:** | |
| - User enters: "What medications show promise for Long COVID?" | |
| - PubMed receives: `What medications show promise for Long COVID?` | |
| - Should receive: `("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])` | |
| **Fix:** Add query preprocessing: | |
| 1. Strip question words (what, which, how, etc.) | |
| 2. Expand medical synonyms (Long COVID β PASC, Post-COVID) | |
| 3. Use MeSH terms for better recall | |
| --- | |
| ### BUG 3: ClinicalTrials.gov No Filtering | |
| **File:** `src/tools/clinicaltrials.py` | |
| Returns ALL trials including: | |
| - Withdrawn trials | |
| - Terminated trials | |
| - Observational studies (not drug interventions) | |
| - Phase 1 (no efficacy data) | |
| **Fix:** Filter by: | |
| - `studyType=INTERVENTIONAL` | |
| - `phase=PHASE2,PHASE3,PHASE4` | |
| - `status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING` | |
| --- | |
| ## Evidence: Garbage In β Garbage Out | |
| When the Magentic SearchAgent calls these tools: | |
| ``` | |
| SearchAgent: "Find evidence for Long COVID medications" | |
| β | |
| βΌ | |
| search_pubmed("Long COVID medications") | |
| β Returns 1 semi-relevant paper (raw query hits) | |
| search_preprints("Long COVID medications") | |
| β Returns garbage (BioRxiv API doesn't search) | |
| β "Calf muscle adaptations" (has "COVID" somewhere) | |
| β "Ophthalmologist work-life balance" (mentions COVID) | |
| search_clinical_trials("Long COVID medications") | |
| β Returns all trials, no filtering | |
| β | |
| βΌ | |
| JudgeAgent receives garbage evidence | |
| β | |
| βΌ | |
| HypothesisAgent can't generate good hypotheses from garbage | |
| β | |
| βΌ | |
| ReportAgent produces garbage report | |
| ``` | |
| **The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.** | |
| --- | |
| ## Recommended Fixes | |
| ### Priority 1: Delete or Fix BioRxiv (30 min) | |
| **Option A: Delete it** | |
| ```python | |
| # In src/agents/tools.py, remove: | |
| # from src.tools.biorxiv import BioRxivTool | |
| # _biorxiv = BioRxivTool() | |
| # @ai_function search_preprints(...) | |
| ``` | |
| **Option B: Replace with Europe PMC** | |
| Europe PMC has preprints AND proper search API: | |
| ``` | |
| https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json | |
| ``` | |
| ### Priority 2: Fix PubMed Query (1 hour) | |
| Add query preprocessor: | |
| ```python | |
| def preprocess_query(raw_query: str) -> str: | |
| """Convert natural language to PubMed query syntax.""" | |
| # Strip question words | |
| # Expand medical synonyms | |
| # Add field tags [Title/Abstract] | |
| # Return optimized query | |
| ``` | |
| ### Priority 3: Filter ClinicalTrials (30 min) | |
| Add parameters to API call: | |
| ```python | |
| params = { | |
| "query.term": query, | |
| "filter.overallStatus": "COMPLETED,RECRUITING", | |
| "filter.studyType": "INTERVENTIONAL", | |
| "pageSize": max_results, | |
| } | |
| ``` | |
| --- | |
| ## Conclusion | |
| **Microsoft Agent Framework: NO BUGS FOUND** | |
| - Imports work β | |
| - Agent creation works β | |
| - Workflow building works β | |
| - Model config correct (GPT-5.1) β | |
| - Streaming events work β | |
| **Search Tools: CRITICALLY BROKEN** | |
| - BioRxiv: API doesn't support search (fundamental) | |
| - PubMed: No query optimization (fixable) | |
| - ClinicalTrials: No filtering (fixable) | |
| **Recommendation:** | |
| 1. Delete BioRxiv immediately (unusable) | |
| 2. Add PubMed query preprocessing | |
| 3. Add ClinicalTrials filtering | |
| 4. Then the Magentic multi-agent system will work as designed | |