# DeepBoner: Medical Drug Repurposing Research Agent
## Project Overview

---

## Executive Summary

**DeepBoner** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.

### The Problem We Solve

Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
- Search thousands of papers across multiple databases
- Identify molecular mechanisms
- Find relevant clinical trials
- Assess safety profiles
- Synthesize evidence into actionable insights

**DeepBoner automates this process, cutting each evidence review from hours of manual searching to minutes.**

### What Is Drug Repurposing?

**Simple Explanation:**
Using existing approved drugs to treat NEW diseases they weren't originally designed for.

**Real Examples:**
- **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
- **Thalidomide**: Once banned → Now treats multiple myeloma
- **Aspirin**: Pain reliever → Heart attack prevention
- **Metformin**: Diabetes drug → Being tested for aging/longevity

**Why It Matters:**
- Faster than developing new drugs (years vs decades)
- Cheaper (known safety profiles)
- Lower risk (already FDA approved)
- Immediate patient benefit potential

---

## Core Use Case

### Primary Query Type
> "What existing drugs might help treat [disease/condition]?"

### Example Queries

1. **Long COVID Fatigue**
   - Query: "What existing drugs might help treat long COVID fatigue?"
   - Agent searches: PubMed, clinical trials, drug databases
   - Output: List of candidate drugs with mechanisms + evidence + citations

2. **Alzheimer's Disease**
   - Query: "Find existing drugs that target beta-amyloid pathways"
   - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
   - Output: Comprehensive research report with drug candidates

3. **Rare Disease Treatment**
   - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
   - Agent finds: Similar conditions → Shared pathways → Potential treatments
   - Output: Evidence-based treatment suggestions

---

## System Architecture

### High-Level Design (Phases 1-8)

```text
User Query
    ↓
Gradio UI (Phase 4)
    ↓
Magentic Manager (Phase 5) ← LLM-powered coordinator
    ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
    ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
    ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
    └── ReportAgent (Phase 8) ←→ Final Synthesis
    ↓
Structured Research Report
```

### Key Components

1. **Magentic Manager (Orchestrator)**
   - LLM-powered multi-agent coordinator
   - Dynamic planning and agent selection
   - Built-in stall detection and replanning
   - Microsoft Agent Framework integration

2. **SearchAgent (Phase 2+5+6)**
   - PubMed E-utilities search
   - DuckDuckGo web search
   - Semantic search via ChromaDB (Phase 6)
   - Evidence deduplication

3. **HypothesisAgent (Phase 7)**
   - Generates Drug → Target → Pathway → Effect hypotheses
   - Guides targeted searches
   - Scientific reasoning about mechanisms

4. **JudgeAgent (Phase 3+5)**
   - LLM-based evidence assessment
   - Mechanism score + Clinical score
   - Recommends continue/synthesize
   - Generates refined search queries

5. **ReportAgent (Phase 8)**
   - Structured scientific reports
   - Executive summary, methodology
   - Hypotheses tested with evidence counts
   - Proper citations and limitations

6. **Gradio UI (Phase 4)**
   - Chat interface for questions
   - Real-time progress via events
   - Mode toggle (Simple/Magentic)
   - Formatted markdown output
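
The JudgeAgent verdict described above (two scores, a continue/synthesize recommendation, and refined queries) could be modeled roughly as follows. This is a minimal standard-library sketch: the field names, the 0-10 score range, and the threshold in `recommendation_for` are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Literal

Recommendation = Literal["continue", "synthesize"]


@dataclass
class JudgeAssessment:
    """Hypothetical shape of one JudgeAgent verdict (all field names are assumptions)."""

    mechanism_score: int  # assumed 0-10: how well the mechanism is supported
    clinical_score: int   # assumed 0-10: strength of the clinical evidence
    recommendation: Recommendation
    next_queries: list[str] = field(default_factory=list)  # refined searches to run next


def recommendation_for(mechanism_score: int, clinical_score: int,
                       threshold: int = 12) -> Recommendation:
    """Recommend synthesis once the combined score reaches an assumed threshold."""
    if mechanism_score + clinical_score >= threshold:
        return "synthesize"
    return "continue"


assessment = JudgeAssessment(
    mechanism_score=7,
    clinical_score=3,
    recommendation=recommendation_for(7, 3),
    next_queries=["low-dose naltrexone long COVID clinical trial"],
)
print(assessment.recommendation)  # continue (7 + 3 is below the assumed threshold of 12)
```

Keeping the two scores separate lets the orchestrator see whether the gap is mechanistic or clinical, which is what the refined queries would target.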

---

## Design Patterns

### 1. Search-and-Judge Loop (Primary Pattern)

```python
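def research(question: str, max_iterations: int = 5) -> Report:
    # max_iterations default is illustrative; tune it alongside the token budget
    context = []
    query = question  # the first pass searches the question as asked
    for iteration in range(max_iterations):
        # SEARCH: Query relevant tools with the current (possibly refined) query
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: Stop early once the evidence is sufficient
        if judge.is_sufficient(question, context):
            break

        # REFINE: Adjust the search query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: Generate report
    return synthesize_report(question, context)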
```

**Why This Pattern:**
- Simple to implement and debug
- Clear loop termination conditions
- Iterative improvement of search quality
- Balances depth vs speed

### 2. Multi-Tool Orchestration

```text
Question → Agent decides which tools to use
           ↓
       ┌───┴────┬─────────┬──────────┐
       ↓        ↓         ↓          ↓
   PubMed  Web Search  Trials DB  Drug DB
       ↓        ↓         ↓          ↓
       └───┬────┴─────────┴──────────┘
           ↓
    Aggregate Results → Judge
```

**Why This Pattern:**
- Different sources provide different evidence types
- Parallel tool execution (when possible)
- Comprehensive coverage
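
One way to sketch the fan-out and aggregation step is with `asyncio.gather`. The tool functions below are stubs that return canned data, and the `Evidence` fields and URL-based deduplication key are assumptions; a real tool would make HTTP calls instead.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str
    url: str
    title: str


async def search_pubmed(query: str) -> list[Evidence]:
    # Stub standing in for a real PubMed call.
    await asyncio.sleep(0)
    return [Evidence("pubmed", "https://example.org/a", f"Paper on {query}")]


async def search_web(query: str) -> list[Evidence]:
    # Stub; the first hit deliberately duplicates the PubMed URL.
    await asyncio.sleep(0)
    return [
        Evidence("web", "https://example.org/a", "Same paper, found via web"),
        Evidence("web", "https://example.org/b", "Another page"),
    ]


async def failing_tool(query: str) -> list[Evidence]:
    # Stub that simulates an unavailable source.
    raise RuntimeError("simulated outage")


async def search_all(query: str, tools) -> list[Evidence]:
    """Run all tools concurrently, skip failures, and deduplicate by URL."""
    results = await asyncio.gather(
        *(tool(query) for tool in tools), return_exceptions=True
    )
    seen: set[str] = set()
    merged: list[Evidence] = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # one failing source should not sink the whole search
        for item in result:
            if item.url not in seen:
                seen.add(item.url)
                merged.append(item)
    return merged


evidence = asyncio.run(
    search_all("metformin cancer", [search_pubmed, failing_tool, search_web])
)
print([e.url for e in evidence])  # ['https://example.org/a', 'https://example.org/b']
```

Because `gather` returns results in the order the tools were passed, the first source listed wins when two sources return the same URL.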

### 3. LLM-as-Judge with Token Budget

**Dual Stopping Conditions:**
- **Smart Stop**: LLM judge says "we have sufficient evidence"
- **Hard Stop**: Token budget exhausted OR max iterations reached

**Why Both:**
- Judge enables early exit when answer is good
- Budget prevents runaway costs
- Iterations prevent infinite loops
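
The three exit conditions can be checked in one place. The sketch below is an assumption about how that check might look: the `Budget` class and reason strings are hypothetical, the 50,000-token default mirrors the hard cap listed in Appendix B, and the iteration limit is an illustrative value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_tokens: int = 50_000  # hard cap from the risk table in Appendix B
    max_iterations: int = 5   # assumed value, not specified in this document
    tokens_used: int = 0
    iterations: int = 0


def stop_reason(budget: Budget, judge_satisfied: bool) -> Optional[str]:
    """Return why the loop should stop, or None if it should keep going."""
    if judge_satisfied:
        return "judge_approved"            # smart stop
    if budget.tokens_used >= budget.max_tokens:
        return "token_budget_exhausted"    # hard stop: cost control
    if budget.iterations >= budget.max_iterations:
        return "max_iterations_reached"    # hard stop: loop guard
    return None


budget = Budget(tokens_used=51_000, iterations=2)
print(stop_reason(budget, judge_satisfied=False))  # token_budget_exhausted
```

Returning the reason rather than a bare boolean makes it easy to tell the user in the final report whether the search ended on sufficient evidence or on a limit.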

### 4. Stateful Checkpointing

```text
.deepresearch/
├── state/
│   └── query_123.json    # Current research state
├── checkpoints/
│   └── query_123_iter3/  # Checkpoint at iteration 3
└── workspace/
    └── query_123/        # Downloaded papers, data
```

**Why This Pattern:**
- Resume interrupted research
- Debugging and analysis
- Cost savings (don't re-search)
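
A minimal sketch of saving and resuming from the layout above, using only the standard library. The `state.json` file name inside each checkpoint directory and both function names are assumptions; the example writes to a temporary directory rather than a real `.deepresearch/` folder.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(root: Path, query_id: str, iteration: int, state: dict) -> Path:
    """Write state to <root>/checkpoints/<query_id>_iter<N>/state.json."""
    checkpoint_dir = root / "checkpoints" / f"{query_id}_iter{iteration}"
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / "state.json"
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return path


def load_latest_checkpoint(root: Path, query_id: str) -> Optional[dict]:
    """Load the checkpoint with the highest iteration number, if any exists."""
    base = root / "checkpoints"
    if not base.is_dir():
        return None
    candidates = list(base.glob(f"{query_id}_iter*/state.json"))
    if not candidates:
        return None
    # Compare iterations numerically so iter10 sorts after iter2.
    latest = max(candidates, key=lambda p: int(p.parent.name.rsplit("_iter", 1)[1]))
    return json.loads(latest.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".deepresearch"
    save_checkpoint(root, "query_123", 2, {"iteration": 2, "evidence": []})
    save_checkpoint(root, "query_123", 10, {"iteration": 10, "evidence": ["example-id"]})
    resumed = load_latest_checkpoint(root, "query_123")
    missing = load_latest_checkpoint(root, "query_999")

print(resumed["iteration"])  # 10
```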

---

## Component Breakdown

### Agent (Orchestrator)
- **Responsibility**: Coordinate research process
- **Size**: ~100 lines
- **Key Methods**:
  - `research(question)` - Main entry point
  - `plan_search_strategy()` - Decide what to search
  - `execute_search()` - Run tool queries
  - `evaluate_progress()` - Call judge
  - `synthesize_findings()` - Generate report

### Tools
- **Responsibility**: Interface with external data sources
- **Size**: ~50 lines per tool
- **Implementations**:
  - `PubMedTool` - Search biomedical literature
  - `WebSearchTool` - General medical information
  - `ClinicalTrialsTool` - Trial data (optional)
  - `DrugInfoTool` - FDA drug database (optional)
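
To illustrate how small a tool can be, here is a sketch of a shared tool interface plus the request-building half of a `PubMedTool`. The `SearchTool` protocol and the `build_request` method are hypothetical names; the endpoint and the `db`, `term`, `retmax`, `retmode`, and `api_key` parameters are the standard NCBI E-utilities ESearch ones. The sketch only builds the URL and does not perform the HTTP call.

```python
from typing import Optional, Protocol
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


class SearchTool(Protocol):
    """Hypothetical common interface for all tools."""

    name: str

    def build_request(self, query: str, max_results: int = 10) -> str: ...


class PubMedTool:
    name = "pubmed"

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key  # optional; raises the rate limit when set

    def build_request(self, query: str, max_results: int = 10) -> str:
        """Build an ESearch URL that returns matching PubMed IDs as JSON."""
        params = {
            "db": "pubmed",
            "term": query,
            "retmax": max_results,
            "retmode": "json",
        }
        if self.api_key:
            params["api_key"] = self.api_key
        return f"{ESEARCH_URL}?{urlencode(params)}"


url = PubMedTool().build_request("long COVID fatigue", max_results=5)
print(url)
```

Separating request construction from the network call keeps the tool easy to unit test without hitting NCBI.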

### Judge
- **Responsibility**: Evaluate evidence quality
- **Size**: ~50 lines
- **Key Methods**:
  - `is_sufficient(question, evidence)` → bool
  - `assess_quality(evidence)` → score
  - `identify_gaps(question, evidence)` → missing_info
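
Before an LLM judge is wired in, the three methods above can be exercised with a keyword heuristic. The class below is a hypothetical stand-in: the aspect names and keyword lists are assumptions chosen for illustration, and `question` is accepted only to match the interface.

```python
class KeywordJudge:
    """Keyword-based stand-in for the LLM judge (aspects and keywords are assumptions)."""

    REQUIRED_ASPECTS = {
        "mechanism": ("mechanism", "pathway", "target"),
        "clinical": ("trial", "patients", "randomized"),
        "safety": ("safety", "adverse", "tolerability"),
    }

    def identify_gaps(self, question: str, evidence: list[str]) -> list[str]:
        """Return the aspects that no evidence snippet mentions."""
        text = " ".join(evidence).lower()
        return [
            aspect
            for aspect, keywords in self.REQUIRED_ASPECTS.items()
            if not any(keyword in text for keyword in keywords)
        ]

    def assess_quality(self, evidence: list[str]) -> float:
        """Score is the fraction of required aspects covered, from 0.0 to 1.0."""
        gaps = self.identify_gaps("", evidence)
        total = len(self.REQUIRED_ASPECTS)
        return (total - len(gaps)) / total

    def is_sufficient(self, question: str, evidence: list[str]) -> bool:
        """Evidence is sufficient only when every aspect is covered."""
        return not self.identify_gaps(question, evidence)


judge = KeywordJudge()
snippets = [
    "Drug X inhibits an inflammatory pathway in vitro.",
    "A randomized trial enrolled adult patients.",
]
question = "What existing drugs might help treat long COVID fatigue?"
print(judge.identify_gaps(question, snippets))  # ['safety']
print(judge.is_sufficient(question, snippets))  # False
```

The list returned by `identify_gaps` is a natural input for query refinement: each missing aspect suggests the next search to run.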

### Gradio App
- **Responsibility**: User interface
- **Size**: ~50 lines
- **Features**:
  - Text input for questions
  - Progress indicators
  - Formatted output with citations
  - Download research report

---

## Technical Stack

### Core Dependencies
```toml
[dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"
```

### Optional Enhancements
- `modal` - For GPU-accelerated local LLM
- `fastmcp` - MCP server integration
- `sentence-transformers` - Semantic search
- `faiss-cpu` - Vector similarity

### Tool APIs & Rate Limits

| API | Cost | Rate Limit | API Key? | Notes |
|-----|------|------------|----------|-------|
| **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| **Brave Search API** | Free tier | 2000/month free | Required | Primary web search |
| **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
| **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
| **OpenFDA** | Free | 240/min; 1,000/day (no key), 120,000/day (with key) | Optional | Drug info |
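
Staying under these limits can be handled with a small client-side limiter. The sketch below enforces a minimum interval between calls; the class name and the injectable `clock` and `sleep` parameters are assumptions added to make it testable, and it is a single-threaded illustration rather than a production limiter.

```python
import time


class MinIntervalLimiter:
    """Spaces out calls so that at most max_per_second happen each second."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time the previous request was allowed to start

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._last_call + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


# PubMed without an API key: 3 requests per second.
pubmed_limiter = MinIntervalLimiter(max_per_second=3)
first_delay = pubmed_limiter.wait()   # the first call never waits
second_delay = pubmed_limiter.wait()  # later calls wait up to 1/3 second
print(first_delay, round(second_delay, 2))
```

Each tool would call `wait()` immediately before sending a request, with one limiter instance per API.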

**Web Search Strategy (Priority Order):**
1. **Brave Search API** (free tier: 2000 queries/month) - Primary
2. **DuckDuckGo** (unofficial, no API key) - Fallback
3. **SerpAPI** ($50/month) - Only if free options fail

**Why NOT SerpAPI first?**
- Costs money (hackathon budget = $0)
- Free alternatives work fine for demo
- Can upgrade later if needed
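
The priority order above amounts to a fallback chain: try each provider in turn and move on when one fails or returns nothing. A minimal sketch with stubbed providers follows; the function names and the stub behavior are assumptions, and no real search API is called.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def search_with_fallback(
    query: str, providers: list[tuple[str, SearchFn]]
) -> tuple[str, list[str]]:
    """Return (provider_name, results) from the first provider that succeeds."""
    errors = []
    for name, provider in providers:
        try:
            results = provider(query)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{name}: {exc}")
            continue
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("all web search providers failed: " + "; ".join(errors))


def brave_stub(query: str) -> list[str]:
    # Stub simulating an exhausted free-tier quota.
    raise RuntimeError("monthly quota exhausted")


def duckduckgo_stub(query: str) -> list[str]:
    # Stub returning a canned result.
    return ["https://example.org/result"]


used, results = search_with_fallback(
    "metformin longevity",
    [("brave", brave_stub), ("duckduckgo", duckduckgo_stub)],
)
print(used)  # duckduckgo
```

Adding SerpAPI later would only mean appending a third entry to the provider list.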

---

## Success Criteria

### Phase 1-5 (MVP) ✅ COMPLETE
**Completed in ONE DAY:**
- [x] User can ask drug repurposing question
- [x] Agent searches PubMed (async)
- [x] Agent searches web (DuckDuckGo)
- [x] LLM judge evaluates evidence quality
- [x] System respects token budget and iterations
- [x] Output includes drug candidates + citations
- [x] Works end-to-end for demo query
- [x] Gradio UI with streaming progress
- [x] Magentic multi-agent orchestration
- [x] 38 unit tests passing
- [x] CI/CD pipeline green

### Hackathon Submission ✅ COMPLETE
- [x] Gradio UI deployed on HuggingFace Spaces
- [x] Example queries working and tested
- [x] Architecture documentation
- [x] README with setup instructions

### Phase 6-8 (Enhanced)
**Specs ready for implementation:**
- [ ] Embeddings & Semantic Search (Phase 6)
- [ ] Hypothesis Agent (Phase 7)
- [ ] Report Agent (Phase 8)

### What's EXPLICITLY Out of Scope
**NOT building (to stay focused):**
- ❌ User authentication
- ❌ Database storage of queries
- ❌ Multi-user support
- ❌ Payment/billing
- ❌ Production monitoring
- ❌ Mobile UI

---

## Implementation Timeline

### Day 1 (Today): Architecture & Setup
- [x] Define use case (drug repurposing) ✅
- [x] Write architecture docs ✅
- [ ] Create project structure
- [ ] First PR: Structure + Docs

### Day 2: Core Agent Loop
- [ ] Implement basic orchestrator
- [ ] Add PubMed search tool
- [ ] Simple judge (keyword-based)
- [ ] Test with 1 query

### Day 3: Intelligence Layer
- [ ] Upgrade to LLM judge
- [ ] Add web search tool
- [ ] Token budget tracking
- [ ] Test with multiple queries

### Day 4: UI & Integration
- [ ] Build Gradio interface
- [ ] Wire up agent to UI
- [ ] Add progress indicators
- [ ] Format output nicely

### Day 5: Polish & Extend
- [ ] Add more tools (clinical trials)
- [ ] Improve judge prompts
- [ ] Checkpoint system
- [ ] Error handling

### Day 6: Deploy & Document
- [ ] Deploy to HuggingFace Spaces
- [ ] Record demo video
- [ ] Write submission materials
- [ ] Final testing

---

## Questions This Document Answers

### For The Maintainer

**Q: "What should our design pattern be?"**
A: Search-and-judge loop with multi-tool orchestration (detailed in Design Patterns section)

**Q: "Should we use LLM-as-judge or token budget?"**
A: Both - judge for smart stopping, budget for cost control

**Q: "What's the break pattern?"**
A: Three conditions: judge approval, token limit, or max iterations (whichever comes first)

**Q: "What components do we need?"**
A: Agent orchestrator, tools (PubMed/web), judge, Gradio UI (see Component Breakdown)

### For The Team

**Q: "What are we actually building?"**
A: Medical drug repurposing research agent (see Core Use Case)

**Q: "How complex should it be?"**
A: Simple but complete - ~300 lines of core code (see the sizes listed in Component Breakdown)

**Q: "What's the timeline?"**
A: 6 days, MVP by Day 3, polish Days 4-6 (see Implementation Timeline)

**Q: "What datasets/APIs do we use?"**
A: PubMed (free), web search, ClinicalTrials.gov (see Tool APIs & Rate Limits)

---

## Next Steps

1. **Review this document** - Team feedback on architecture
2. **Finalize design** - Incorporate feedback
3. **Create project structure** - Scaffold repository
4. **Move to proper docs** - `docs/architecture/` folder
5. **Open first PR** - Structure + Documentation
6. **Start implementation** - Day 2 onward

---

## Notes & Decisions

### Why Drug Repurposing?
- Clear, impressive use case
- Real-world medical impact
- Good data availability (PubMed, trials)
- Easy to explain (Viagra example!)
- Physician on team ✅

### Why Simple Architecture?
- 6-day timeline
- Need working end-to-end system
- Hackathon judges value "works" over "complex"
- Can extend later if successful

### Why These Tools First?
- PubMed: Best biomedical literature source
- Web search: General medical knowledge
- Clinical trials: Evidence of actual testing
- Others: Nice-to-have, not critical for MVP

---

## Appendix A: Demo Queries (Pre-tested)

These queries will be used for demo and testing. They're chosen because:
1. They have good PubMed coverage
2. They're medically interesting
3. They show the system's capabilities

### Primary Demo Query
```
"What existing drugs might help treat long COVID fatigue?"
```
- **Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
- **Expected sources**: 20+ PubMed papers, 2-3 clinical trials

### Secondary Demo Queries
```
"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"
```

### Why These Queries?
- Represent real clinical needs
- Have substantial literature
- Show diverse drug classes
- Physician on team can validate results

---

## Appendix B: Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |

---

- **Document Status**: Official Architecture Spec
- **Review Score**: 98/100
- **Last Updated**: November 2025