| # DeepBoner: Medical Drug Repurposing Research Agent |
| ## Project Overview |
|
|
| --- |
|
|
| ## Executive Summary |
|
|
| **DeepBoner** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases. |
|
|
| ### The Problem We Solve |
|
|
| Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must: |
| - Search thousands of papers across multiple databases |
| - Identify molecular mechanisms |
| - Find relevant clinical trials |
| - Assess safety profiles |
| - Synthesize evidence into actionable insights |
|
|
| **DeepBoner automates these steps, reducing an initial evidence review from hours to minutes.** |
|
|
| ### What Is Drug Repurposing? |
|
|
| **Simple Explanation:** |
| Using existing approved drugs to treat NEW diseases they weren't originally designed for. |
|
|
| **Real Examples:** |
| - **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction |
| - **Thalidomide**: Once banned → Now treats multiple myeloma |
| - **Aspirin**: Pain reliever → Heart attack prevention |
| - **Metformin**: Diabetes drug → Being tested for aging/longevity |
|
|
| **Why It Matters:** |
| - Faster than developing new drugs (years vs decades) |
| - Cheaper (known safety profiles) |
| - Lower risk (already FDA approved) |
| - Immediate patient benefit potential |
|
|
| --- |
|
|
| ## Core Use Case |
|
|
| ### Primary Query Type |
| > "What existing drugs might help treat [disease/condition]?" |
|
|
| ### Example Queries |
|
|
| 1. **Long COVID Fatigue** |
| - Query: "What existing drugs might help treat long COVID fatigue?" |
| - Agent searches: PubMed, clinical trials, drug databases |
| - Output: List of candidate drugs with mechanisms + evidence + citations |
|
|
| 2. **Alzheimer's Disease** |
| - Query: "Find existing drugs that target beta-amyloid pathways" |
| - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence |
| - Output: Comprehensive research report with drug candidates |
|
|
| 3. **Rare Disease Treatment** |
| - Query: "What drugs might help with fibrodysplasia ossificans progressiva?" |
| - Agent finds: Similar conditions → Shared pathways → Potential treatments |
| - Output: Evidence-based treatment suggestions |
|
|
| --- |
|
|
| ## System Architecture |
|
|
| ### High-Level Design (Phases 1-8) |
|
|
```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5) ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
    └── ReportAgent (Phase 8) ←→ Final Synthesis
    ↓
Structured Research Report
```
|
|
| ### Key Components |
|
|
| 1. **Magentic Manager (Orchestrator)** |
| - LLM-powered multi-agent coordinator |
| - Dynamic planning and agent selection |
| - Built-in stall detection and replanning |
| - Microsoft Agent Framework integration |
|
|
| 2. **SearchAgent (Phase 2+5+6)** |
| - PubMed E-utilities search |
| - DuckDuckGo web search |
| - Semantic search via ChromaDB (Phase 6) |
| - Evidence deduplication |
|
|
| 3. **HypothesisAgent (Phase 7)** |
| - Generates Drug → Target → Pathway → Effect hypotheses |
| - Guides targeted searches |
| - Scientific reasoning about mechanisms |
|
|
| 4. **JudgeAgent (Phase 3+5)** |
| - LLM-based evidence assessment |
| - Mechanism score + Clinical score |
| - Recommends continue/synthesize |
| - Generates refined search queries |
|
|
| 5. **ReportAgent (Phase 8)** |
| - Structured scientific reports |
| - Executive summary, methodology |
| - Hypotheses tested with evidence counts |
| - Proper citations and limitations |
|
|
| 6. **Gradio UI (Phase 4)** |
| - Chat interface for questions |
| - Real-time progress via events |
| - Mode toggle (Simple/Magentic) |
| - Formatted markdown output |
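
As a rough illustration of the Drug → Target → Pathway → Effect chain that the HypothesisAgent (item 3 above) reasons over, the sketch below models one hypothesis as a small data structure and derives a targeted search query from it. The class name `MechanismHypothesis`, its fields, and the `to_search_query` helper are assumptions made for this example, not the project's actual schema. The metformin values are placeholders that show the shape of the data, not findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MechanismHypothesis:
    """One Drug -> Target -> Pathway -> Effect chain (hypothetical schema)."""

    drug: str
    target: str
    pathway: str
    effect: str

    def to_search_query(self) -> str:
        # Quote each term and AND them together so a follow-up literature
        # search stays focused on this specific mechanism.
        terms = (self.drug, self.target, self.pathway)
        return " AND ".join(f'"{term}"' for term in terms)


# Placeholder values, used only to show the shape of the data.
hypothesis = MechanismHypothesis(
    drug="metformin",
    target="AMPK",
    pathway="mTOR signaling",
    effect="reduced cell proliferation",
)
query = hypothesis.to_search_query()
```

A query built this way could be handed back to the SearchAgent, which is one way to read "guides targeted searches" in the component list.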
|
|
| --- |
|
|
| ## Design Patterns |
|
|
| ### 1. Search-and-Judge Loop (Primary Pattern) |
|
|
```python
def research(question: str) -> Report:
    # NOTE: max_iterations, search_tools, judge, refine_query, and
    # synthesize_report are assumed to be defined elsewhere in the project.
    context = []
    query = question  # the first pass searches the question as asked
    for iteration in range(max_iterations):
        # SEARCH: Query relevant tools with the current (possibly refined) query
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: Evaluate quality and exit early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break

        # REFINE: Adjust the search query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: Generate report
    return synthesize_report(question, context)
```
|
|
| **Why This Pattern:** |
| - Simple to implement and debug |
| - Clear loop termination conditions |
| - Iterative improvement of search quality |
| - Balances depth vs speed |
|
|
| ### 2. Multi-Tool Orchestration |
|
|
```
Question → Agent decides which tools to use
    ↓
┌───┴────┬────────────┬───────────┐
↓        ↓            ↓           ↓
PubMed   Web Search   Trials DB   Drug DB
↓        ↓            ↓           ↓
└───┬────┴────────────┴───────────┘
    ↓
Aggregate Results → Judge
```
|
|
| **Why This Pattern:** |
| - Different sources provide different evidence types |
| - Parallel tool execution (when possible) |
| - Comprehensive coverage |
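
One way to get the parallel tool execution mentioned above is `asyncio.gather`. The sketch below is illustrative only: `search_pubmed` and `search_web` are stand-in coroutines that return canned data or fail on purpose, not the project's real tools. It shows that one failing source need not discard results from the others.

```python
import asyncio


async def search_pubmed(query: str) -> list[str]:
    # Stand-in for a real PubMed call; returns canned data.
    await asyncio.sleep(0)
    return [f"pubmed:{query}"]


async def search_web(query: str) -> list[str]:
    # Stand-in for a web search that fails, to show error isolation.
    await asyncio.sleep(0)
    raise RuntimeError("web search unavailable")


async def search_all(query: str) -> list[str]:
    """Run every tool concurrently and aggregate whatever succeeded."""
    outcomes = await asyncio.gather(
        search_pubmed(query),
        search_web(query),
        return_exceptions=True,  # a failing tool is returned, not raised
    )
    results: list[str] = []
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # a real agent would log the failure here
        results.extend(outcome)
    return results


aggregated = asyncio.run(search_all("long COVID fatigue"))
```

With `return_exceptions=True`, exceptions come back as ordinary values in the result list, so the aggregation step decides how to handle each failure.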
|
|
| ### 3. LLM-as-Judge with Token Budget |
|
|
| **Dual Stopping Conditions:** |
| - **Smart Stop**: LLM judge says "we have sufficient evidence" |
| - **Hard Stop**: Token budget exhausted OR max iterations reached |
|
|
| **Why Both:** |
| - The judge enables an early exit once the evidence is sufficient |
| - The token budget prevents runaway costs |
| - The iteration cap prevents infinite loops |
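
The three stopping conditions can be combined into a single check, sketched below. The names `Budget` and `stop_reason` are hypothetical. The 50,000-token default mirrors the hard cap listed in the risk table, while the iteration default of 10 is an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Tracks the two hard limits (hypothetical names and defaults)."""

    max_tokens: int = 50_000
    max_iterations: int = 10  # assumed value, not specified in this document
    tokens_used: int = 0
    iterations: int = 0

    def record(self, tokens: int) -> None:
        # Call once per loop iteration with that iteration's token usage.
        self.tokens_used += tokens
        self.iterations += 1


def stop_reason(judge_satisfied: bool, budget: Budget) -> Optional[str]:
    """Return why the loop should stop, or None to keep searching."""
    if judge_satisfied:
        return "smart_stop"  # the judge reports sufficient evidence
    if budget.tokens_used >= budget.max_tokens:
        return "token_budget"
    if budget.iterations >= budget.max_iterations:
        return "max_iterations"
    return None


budget = Budget()
budget.record(tokens=20_000)
first = stop_reason(False, budget)   # under both limits, so keep going
budget.record(tokens=35_000)
second = stop_reason(False, budget)  # 55,000 tokens used, over the cap
```

Returning the reason rather than a bare boolean lets the final report state whether the evidence was judged sufficient or the run was cut short.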
|
|
| ### 4. Stateful Checkpointing |
|
|
```
.deepresearch/
├── state/
│   └── query_123.json          # Current research state
├── checkpoints/
│   └── query_123_iter3/        # Checkpoint at iteration 3
└── workspace/
    └── query_123/              # Downloaded papers, data
```
|
|
| **Why This Pattern:** |
| - Resume interrupted research |
| - Debugging and analysis |
| - Cost savings (don't re-search) |
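
A minimal sketch of saving and resuming from the `checkpoints/` layout shown above, using only the standard library. The function names and the `state.json` file name inside each checkpoint directory are assumptions for illustration. The document specifies only the directory naming.

```python
import json
from pathlib import Path
from typing import Optional


def save_checkpoint(root: Path, query_id: str, iteration: int, state: dict) -> Path:
    """Write the state for one iteration and return the file path."""
    directory = root / "checkpoints" / f"{query_id}_iter{iteration}"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "state.json"  # file name is an assumption
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return path


def load_latest_checkpoint(root: Path, query_id: str) -> Optional[dict]:
    """Return the state from the highest-numbered iteration, if any."""
    base = root / "checkpoints"
    if not base.is_dir():
        return None
    prefix = f"{query_id}_iter"
    candidates = [
        d for d in base.glob(f"{prefix}*")
        if d.name[len(prefix):].isdigit() and (d / "state.json").is_file()
    ]
    if not candidates:
        return None
    # Compare iteration numbers as integers so iter10 sorts after iter2.
    latest = max(candidates, key=lambda d: int(d.name[len(prefix):]))
    return json.loads((latest / "state.json").read_text(encoding="utf-8"))
```

On restart, the agent could call `load_latest_checkpoint(Path(".deepresearch"), "query_123")` and skip any searches already reflected in the returned state.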
|
|
| --- |
|
|
| ## Component Breakdown |
|
|
| ### Agent (Orchestrator) |
| - **Responsibility**: Coordinate research process |
| - **Size**: ~100 lines |
| - **Key Methods**: |
| - `research(question)` - Main entry point |
| - `plan_search_strategy()` - Decide what to search |
| - `execute_search()` - Run tool queries |
| - `evaluate_progress()` - Call judge |
| - `synthesize_findings()` - Generate report |
|
|
| ### Tools |
| - **Responsibility**: Interface with external data sources |
| - **Size**: ~50 lines per tool |
| - **Implementations**: |
| - `PubMedTool` - Search biomedical literature |
| - `WebSearchTool` - General medical information |
| - `ClinicalTrialsTool` - Trial data (optional) |
| - `DrugInfoTool` - FDA drug database (optional) |
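
To give a feel for what a `PubMedTool` has to do, the sketch below builds an NCBI E-utilities ESearch URL and extracts PubMed IDs from the JSON response. The helper names `build_esearch_url` and `extract_pmids` are hypothetical, and the HTTP request itself is left out so the example stays free of network access.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, max_results: int = 20,
                      api_key: Optional[str] = None) -> str:
    """Build an ESearch request URL for the PubMed database."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    if api_key:
        params["api_key"] = api_key  # optional key raises the rate limit
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pmids(payload: dict) -> list[str]:
    """Pull the PubMed ID list out of a decoded ESearch JSON response."""
    return list(payload.get("esearchresult", {}).get("idlist", []))


url = build_esearch_url("long COVID fatigue")
```

In the real tool, the URL would be fetched with an async HTTP client such as `httpx`, which is already in the dependency list, and the returned IDs passed on to a follow-up request for abstracts.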
|
|
| ### Judge |
| - **Responsibility**: Evaluate evidence quality |
| - **Size**: ~50 lines |
| - **Key Methods**: |
| - `is_sufficient(question, evidence)` → bool |
| - `assess_quality(evidence)` → score |
| - `identify_gaps(question, evidence)` → missing_info |
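
The three judge methods listed above can be prototyped without an LLM. The sketch below is a keyword-based stand-in in the spirit of the "simple judge" planned for Day 2. The class name `KeywordJudge`, the topic keywords, and the scoring rule are all assumptions for illustration, and the project's LLM judge would replace this logic.

```python
class KeywordJudge:
    """Keyword-based stand-in for the LLM judge (illustrative only)."""

    # Evidence should touch on both a mechanism and clinical data.
    REQUIRED_TOPICS = {
        "mechanism": ("mechanism", "pathway", "target"),
        "clinical": ("trial", "patients", "clinical"),
    }

    def __init__(self, min_score: float = 1.0) -> None:
        self.min_score = min_score

    def identify_gaps(self, question: str, evidence: list[str]) -> list[str]:
        # A keyword judge ignores the question and only scans the evidence.
        text = " ".join(evidence).lower()
        return [
            topic
            for topic, keywords in self.REQUIRED_TOPICS.items()
            if not any(keyword in text for keyword in keywords)
        ]

    def assess_quality(self, evidence: list[str]) -> float:
        # Score is the fraction of required topics the evidence covers.
        gaps = self.identify_gaps("", evidence)
        return 1 - len(gaps) / len(self.REQUIRED_TOPICS)

    def is_sufficient(self, question: str, evidence: list[str]) -> bool:
        return self.assess_quality(evidence) >= self.min_score


judge = KeywordJudge()
evidence = ["Drug X inhibits the mTOR pathway"]
gaps_before = judge.identify_gaps("example question", evidence)
evidence.append("A phase 2 trial enrolled 40 patients")
```

Keeping the same three method signatures means the keyword judge and an LLM judge stay interchangeable inside the search-and-judge loop.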
| |
| ### Gradio App |
| - **Responsibility**: User interface |
| - **Size**: ~50 lines |
| - **Features**: |
| - Text input for questions |
| - Progress indicators |
| - Formatted output with citations |
| - Download research report |
| |
| --- |
| |
| ## Technical Stack |
| |
| ### Core Dependencies |
| ```toml |
| [dependencies] |
| python = ">=3.10" |
| pydantic = "^2.7" |
| pydantic-ai = "^0.0.16" |
| fastmcp = "^0.1.0" |
| gradio = "^5.0" |
| beautifulsoup4 = "^4.12" |
| httpx = "^0.27" |
| ``` |
| |
| ### Optional Enhancements |
| - `modal` - For GPU-accelerated local LLM |
| - `fastmcp` - MCP server integration |
| - `sentence-transformers` - Semantic search |
| - `faiss-cpu` - Vector similarity |
| |
| ### Tool APIs & Rate Limits |
| |
| | API | Cost | Rate Limit | API Key? | Notes | |
| |-----|------|------------|----------|-------| |
| | **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits | |
| | **Brave Search API** | Free tier | 2000/month free | Required | Primary web search | |
| | **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search | |
| | **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal | |
| **OpenFDA** | Free | 240/min; 1,000/day (no key), 120,000/day (with key) | Optional | Drug info |
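
Staying under a per-second limit like those in the table can be handled with a small client-side limiter. The sketch below assumes single-threaded use and a hypothetical `RateLimiter` class. The clock and sleep functions are injectable so the behaviour can be checked without real delays.

```python
import time


class RateLimiter:
    """Spaces calls at least 1/rate seconds apart (single-threaded sketch)."""

    def __init__(self, max_per_second: float,
                 clock=time.monotonic, sleep=time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# 3 requests/second is the no-key PubMed limit from the table above.
pubmed_limiter = RateLimiter(max_per_second=3)
```

Calling `pubmed_limiter.wait()` before each PubMed request would keep a sequential client within the limit. An async tool would need an `asyncio`-based variant instead.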
| |
| **Web Search Strategy (Priority Order):** |
| 1. **Brave Search API** (free tier: 2000 queries/month) - Primary |
| 2. **DuckDuckGo** (unofficial, no API key) - Fallback |
| 3. **SerpAPI** ($50/month) - Only if free options fail |
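
The priority order above amounts to a fallback chain. In the sketch below, `search_with_fallback` is a hypothetical helper, and the two provider functions are stand-ins that simulate a failing primary and a working fallback. No real search API is called.

```python
from typing import Callable, Sequence

SearchFn = Callable[[str], list[str]]


def search_with_fallback(
    query: str, providers: Sequence[tuple[str, SearchFn]]
) -> tuple[str, list[str]]:
    """Try providers in priority order; return the first non-empty result."""
    errors = []
    for name, search in providers:
        try:
            results = search(query)
        except Exception as exc:  # provider errors vary, so catch broadly
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("all web search providers failed: " + "; ".join(errors))


def fake_brave(query: str) -> list[str]:
    # Stand-in for the primary provider running out of free quota.
    raise RuntimeError("quota exhausted")


def fake_duckduckgo(query: str) -> list[str]:
    # Stand-in for the fallback provider succeeding.
    return [f"result for {query}"]


provider, results = search_with_fallback(
    "metformin longevity",
    [("brave", fake_brave), ("duckduckgo", fake_duckduckgo)],
)
```

Returning the provider name alongside the results makes it easy to record which source each piece of evidence came from.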
| |
| **Why NOT SerpAPI first?** |
| - Costs money (hackathon budget = $0) |
| - Free alternatives work fine for demo |
| - Can upgrade later if needed |
| |
| --- |
| |
| ## Success Criteria |
| |
| ### Phase 1-5 (MVP) ✅ COMPLETE |
| **Completed in ONE DAY:** |
| - [x] User can ask drug repurposing question |
| - [x] Agent searches PubMed (async) |
| - [x] Agent searches web (DuckDuckGo) |
| - [x] LLM judge evaluates evidence quality |
| - [x] System respects token budget and iterations |
| - [x] Output includes drug candidates + citations |
| - [x] Works end-to-end for demo query |
| - [x] Gradio UI with streaming progress |
| - [x] Magentic multi-agent orchestration |
| - [x] 38 unit tests passing |
| - [x] CI/CD pipeline green |
| |
| ### Hackathon Submission ✅ COMPLETE |
| - [x] Gradio UI deployed on HuggingFace Spaces |
| - [x] Example queries working and tested |
| - [x] Architecture documentation |
| - [x] README with setup instructions |
| |
| ### Phase 6-8 (Enhanced) |
| **Specs ready for implementation:** |
| - [ ] Embeddings & Semantic Search (Phase 6) |
| - [ ] Hypothesis Agent (Phase 7) |
| - [ ] Report Agent (Phase 8) |
| |
| ### What's EXPLICITLY Out of Scope |
| **NOT building (to stay focused):** |
| - ❌ User authentication |
| - ❌ Database storage of queries |
| - ❌ Multi-user support |
| - ❌ Payment/billing |
| - ❌ Production monitoring |
| - ❌ Mobile UI |
| |
| --- |
| |
| ## Implementation Timeline |
| |
| ### Day 1 (Today): Architecture & Setup |
| - [x] Define use case (drug repurposing) ✅ |
| - [x] Write architecture docs ✅ |
| - [ ] Create project structure |
| - [ ] First PR: Structure + Docs |
| |
| ### Day 2: Core Agent Loop |
| - [ ] Implement basic orchestrator |
| - [ ] Add PubMed search tool |
| - [ ] Simple judge (keyword-based) |
| - [ ] Test with 1 query |
| |
| ### Day 3: Intelligence Layer |
| - [ ] Upgrade to LLM judge |
| - [ ] Add web search tool |
| - [ ] Token budget tracking |
| - [ ] Test with multiple queries |
| |
| ### Day 4: UI & Integration |
| - [ ] Build Gradio interface |
| - [ ] Wire up agent to UI |
| - [ ] Add progress indicators |
| - [ ] Format output nicely |
| |
| ### Day 5: Polish & Extend |
| - [ ] Add more tools (clinical trials) |
| - [ ] Improve judge prompts |
| - [ ] Checkpoint system |
| - [ ] Error handling |
| |
| ### Day 6: Deploy & Document |
| - [ ] Deploy to HuggingFace Spaces |
| - [ ] Record demo video |
| - [ ] Write submission materials |
| - [ ] Final testing |
| |
| --- |
| |
| ## Questions This Document Answers |
| |
| ### For The Maintainer |
| |
| **Q: "What should our design pattern be?"** |
| A: Search-and-judge loop with multi-tool orchestration (detailed in Design Patterns section) |
| |
| **Q: "Should we use LLM-as-judge or token budget?"** |
| A: Both - judge for smart stopping, budget for cost control |
| |
| **Q: "What's the break pattern?"** |
| A: Three conditions: judge approval, token limit, or max iterations (whichever comes first) |
| |
| **Q: "What components do we need?"** |
| A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown) |
| |
| ### For The Team |
| |
| **Q: "What are we actually building?"** |
| A: Medical drug repurposing research agent (see Core Use Case) |
| |
| **Q: "How complex should it be?"** |
| A: Simple but complete - ~300 lines of core code (see Component sizes) |
| |
| **Q: "What's the timeline?"** |
| A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline) |
| |
| **Q: "What datasets/APIs do we use?"** |
| A: PubMed (free), web search, ClinicalTrials.gov (see Tool APIs) |
| |
| --- |
| |
| ## Next Steps |
| |
| 1. **Review this document** - Team feedback on architecture |
| 2. **Finalize design** - Incorporate feedback |
| 3. **Create project structure** - Scaffold repository |
| 4. **Move to proper docs** - `docs/architecture/` folder |
| 5. **Open first PR** - Structure + Documentation |
| 6. **Start implementation** - Day 2 onward |
| |
| --- |
| |
| ## Notes & Decisions |
| |
| ### Why Drug Repurposing? |
| - Clear, impressive use case |
| - Real-world medical impact |
| - Good data availability (PubMed, trials) |
| - Easy to explain (Viagra example!) |
| - Physician on team ✅ |
| |
| ### Why Simple Architecture? |
| - 6-day timeline |
| - Need working end-to-end system |
| - Hackathon judges value "works" over "complex" |
| - Can extend later if successful |
| |
| ### Why These Tools First? |
| - PubMed: Best biomedical literature source |
| - Web search: General medical knowledge |
| - Clinical trials: Evidence of actual testing |
| - Others: Nice-to-have, not critical for MVP |
| |
| --- |
| |
| ## Appendix A: Demo Queries (Pre-tested) |
| |
| These queries will be used for demo and testing. They're chosen because: |
| 1. They have good PubMed coverage |
| 2. They're medically interesting |
| 3. They show the system's capabilities |
| |
| ### Primary Demo Query |
| ``` |
| "What existing drugs might help treat long COVID fatigue?" |
| ``` |
| **Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil |
| **Expected sources**: 20+ PubMed papers, 2-3 clinical trials |
| |
| ### Secondary Demo Queries |
| ``` |
| "Find existing drugs that might slow Alzheimer's progression" |
| "What approved medications could help with fibromyalgia pain?" |
| "Which diabetes drugs show promise for cancer treatment?" |
| ``` |
| |
| ### Why These Queries? |
| - Represent real clinical needs |
| - Have substantial literature |
| - Show diverse drug classes |
| - Physician on team can validate results |
| |
| --- |
| |
| ## Appendix B: Risk Assessment |
| |
| | Risk | Likelihood | Impact | Mitigation | |
| |------|------------|--------|------------| |
| | PubMed rate limiting | Medium | High | Implement caching, respect 3/sec | |
| | Web search API fails | Low | Medium | DuckDuckGo fallback | |
| | LLM costs exceed budget | Medium | Medium | Hard token cap at 50K | |
| | Judge quality poor | Medium | High | Pre-test prompts, iterate | |
| | HuggingFace deploy issues | Low | High | Test deployment Day 4 | |
| | Demo crashes live | Medium | High | Pre-recorded backup video | |
| |
| --- |
| |
| **Document Status**: Official Architecture Spec |
| **Review Score**: 98/100 |
| **Last Updated**: November 2025 |
| |