Commit 631e5fc
1 Parent(s): 43cfea2
docs: reorganize documentation structure for clarity
DELETE (duplicates/obsolete):
- to_do/REFERENCE_GRADDIO_DEMO_ANALYSIS.md
- bugs/P0_MAGENTIC_MODE_BROKEN.md (superseded by FIX_PLAN)
CREATE:
- future-roadmap/ for planned phases 15-17
- decisions/architecture-2025-11/ for magentic-pydantic docs
- bugs/ACTIVE_BUGS.md index
MOVE:
- DEEP_RESEARCH_ROADMAP.md → future-roadmap/
- 04_OPENALEX_INTEGRATION.md → future-roadmap/
- brainstorming/implementation/*.md → future-roadmap/phases/
- brainstorming/magentic-pydantic/*.md → decisions/architecture-2025-11/
UPDATE:
- docs/index.md: Updated links, Europe PMC references, test count
- docs/bugs/ACTIVE_BUGS.md +39 -0
- docs/bugs/P0_MAGENTIC_MODE_BROKEN.md +0 -116
- docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/00_SITUATION_AND_PLAN.md +0 -0
- docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/01_ARCHITECTURE_SPEC.md +0 -0
- docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/02_IMPLEMENTATION_PHASES.md +0 -0
- docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/03_IMMEDIATE_ACTIONS.md +0 -0
- docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/04_FOLLOWUP_REVIEW_REQUEST.md +0 -0
- docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/REVIEW_PROMPT_FOR_SENIOR_AGENT.md +0 -0
- docs/{to_do → future-roadmap}/DEEP_RESEARCH_ROADMAP.md +0 -0
- docs/{brainstorming/04_OPENALEX_INTEGRATION.md → future-roadmap/OPENALEX_INTEGRATION.md} +0 -0
- docs/{brainstorming/implementation → future-roadmap/phases}/15_PHASE_OPENALEX.md +0 -0
- docs/{brainstorming/implementation → future-roadmap/phases}/16_PHASE_PUBMED_FULLTEXT.md +0 -0
- docs/{brainstorming/implementation → future-roadmap/phases}/17_PHASE_RATE_LIMITING.md +0 -0
- docs/{brainstorming/implementation → future-roadmap/phases}/README.md +0 -0
- docs/index.md +34 -19
- docs/to_do/REFERENCE_GRADDIO_DEMO_ANALYSIS.md +0 -229
docs/bugs/ACTIVE_BUGS.md
ADDED
@@ -0,0 +1,39 @@
+# Active Bugs
+
+> Last updated: 2025-11-28
+
+## P0 - Critical
+
+### Magentic Mode Report Generation
+**File**: [FIX_PLAN_MAGENTIC_MODE.md](./FIX_PLAN_MAGENTIC_MODE.md)
+
+**Symptom**: Magentic mode returns `ChatMessage` object instead of synthesized report text.
+
+**Root Cause**:
+- `event.message.text` extraction fails in orchestrator
+- `max_rounds=3` too low for SearchAgent + JudgeAgent + ReportAgent sequence
+
+**Workaround**: Use Simple mode (default) - works correctly with all LLM providers.
+
+**Status**: Fix plan documented, not yet implemented.
+
+---
+
+## P1 - Minor UX
+
+### Gradio Settings Accordion Won't Collapse
+**File**: [P1_GRADIO_SETTINGS_CLEANUP.md](./P1_GRADIO_SETTINGS_CLEANUP.md)
+
+**Symptom**: Settings accordion stays open after user interaction.
+
+**Root Cause**: Nested `gr.Blocks` context prevents accordion state management.
+
+**Impact**: UX only - all functionality works correctly.
+
+**Status**: Solution documented, not yet implemented.
+
+---
+
+## Resolved Bugs
+
+*None currently - bugs above are still open.*
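Reviewer note on the `max_rounds=3` root cause recorded in ACTIVE_BUGS.md: the sketch below is a toy model, not the agent-framework implementation. It assumes each agent turn consumes exactly one round and that the judge sends the flow back to search once before the report is written. Both assumptions, and the names `rounds_needed`, `report_is_reached`, and `TURNS`, are hypothetical and exist only to illustrate why a small round budget can end the run before ReportAgent gets a turn.

```python
from typing import List, Optional


def rounds_needed(planned_turns: List[str], report_agent: str = "ReportAgent") -> Optional[int]:
    """Rounds required to reach the report agent, assuming one agent turn per round."""
    if report_agent not in planned_turns:
        return None  # the report agent is never scheduled in this plan
    return planned_turns.index(report_agent) + 1


def report_is_reached(planned_turns: List[str], max_round_count: int) -> bool:
    """True if the report agent gets a turn before the round budget runs out."""
    needed = rounds_needed(planned_turns)
    return needed is not None and needed <= max_round_count


# Hypothetical turn order: the judge asks for one more search pass before the report.
TURNS = ["SearchAgent", "JudgeAgent", "SearchAgent", "JudgeAgent", "ReportAgent"]
```

Under these assumptions a budget of 3 stops after the second search pass, while a budget of 10 leaves room for the report. The real manager may spend rounds differently, so treat the numbers as illustrative only.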
docs/bugs/P0_MAGENTIC_MODE_BROKEN.md
DELETED
@@ -1,116 +0,0 @@
-# P0 Bug: Magentic Mode Returns ChatMessage Object Instead of Report Text
-
-**Status**: OPEN
-**Priority**: P0 (Critical)
-**Date**: 2025-11-27
-
----
-
-## Actual Bug Found (Not What We Thought)
-
-**The OpenAI key works fine.** The real bug is different:
-
-### The Problem
-
-When Magentic mode completes, the final report returns a `ChatMessage` object instead of the actual text:
-
-```
-FINAL REPORT:
-<agent_framework._types.ChatMessage object at 0x11db70310>
-```
-
-### Evidence
-
-Full test output shows:
-1. Magentic orchestrator starts correctly
-2. SearchAgent finds evidence
-3. HypothesisAgent generates hypotheses
-4. JudgeAgent evaluates
-5. **BUT**: Final output is `ChatMessage` object, not text
-
-### Root Cause
-
-In `src/orchestrator_magentic.py` line 193:
-
-```python
-elif isinstance(event, MagenticFinalResultEvent):
-    text = event.message.text if event.message else "No result"
-```
-
-The `event.message` is a `ChatMessage` object, and `.text` may not extract the content correctly, or the message structure changed in the agent-framework library.
-
----
-
-## Secondary Issue: Max Rounds Reached
-
-The orchestrator hits max rounds before producing a report:
-
-```
-[ERROR] Magentic Orchestrator: Max round count reached
-```
-
-This means the workflow times out before the ReportAgent synthesizes the final output.
-
----
-
-## What Works
-
-- OpenAI API key: **Works** (loaded from .env)
-- SearchAgent: **Works** (finds evidence from PubMed, ClinicalTrials, Europe PMC)
-- HypothesisAgent: **Works** (generates Drug -> Target -> Pathway chains)
-- JudgeAgent: **Partial** (evaluates but sometimes loses context)
-
----
-
-## Files to Fix
-
-| File | Line | Issue |
-|------|------|-------|
-| `src/orchestrator_magentic.py` | 193 | `event.message.text` returns object, not string |
-| `src/orchestrator_magentic.py` | 97-99 | `max_round_count=3` too low for full pipeline |
-
----
-
-## Suggested Fix
-
-```python
-# In _process_event, line 192-199
-elif isinstance(event, MagenticFinalResultEvent):
-    # Handle ChatMessage object properly
-    if event.message:
-        if hasattr(event.message, 'content'):
-            text = event.message.content
-        elif hasattr(event.message, 'text'):
-            text = event.message.text
-        else:
-            text = str(event.message)
-    else:
-        text = "No result"
-```
-
-And increase rounds:
-
-```python
-# In _build_workflow, line 97
-max_round_count=self._max_rounds,  # Use configured value, default 10
-```
-
----
-
-## Test Command
-
-```bash
-set -a && source .env && set +a && uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
-```
-
----
-
-## Simple Mode Works
-
-For reference, simple mode produces full reports:
-
-```bash
-uv run python examples/orchestrator_demo/run_agent.py "metformin alzheimer"
-```
-
-Output includes structured report with Drug Candidates, Key Findings, etc.
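Reviewer note on the "Suggested Fix" in the deleted P0 report: the fragment there cannot run on its own because it depends on the orchestrator's event loop. A self-contained sketch of the same idea follows. `FakeMessage` is a hypothetical stand-in for the framework's message type, and `extract_final_text` is an illustrative helper name, not something taken from `src/orchestrator_magentic.py`. The one behavioral difference from the suggested fix is that this sketch also checks that the attribute value is a non-empty string, which is an added assumption about what "extracting correctly" should mean.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the framework's message object (illustration only)."""
    content: Optional[str] = None
    text: Optional[str] = None


def extract_final_text(message: Any) -> str:
    """Return report text from a message-like object, never the object itself."""
    if message is None:
        return "No result"
    # Try the attribute names from the suggested fix, in the same order.
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if isinstance(value, str) and value.strip():
            return value
    # Last resort: the string form, so callers always receive a str.
    return str(message)
```

Returning a `str` on every path keeps the caller from printing an object repr such as the one shown under "The Problem"; whether `content` or `text` is the right attribute for the real message type still needs to be checked against the installed agent-framework version.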
docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/00_SITUATION_AND_PLAN.md
RENAMED
File without changes
docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/01_ARCHITECTURE_SPEC.md
RENAMED
File without changes
docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/02_IMPLEMENTATION_PHASES.md
RENAMED
File without changes
docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/03_IMMEDIATE_ACTIONS.md
RENAMED
File without changes
docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/04_FOLLOWUP_REVIEW_REQUEST.md
RENAMED
File without changes
docs/{brainstorming/magentic-pydantic → decisions/architecture-2025-11}/REVIEW_PROMPT_FOR_SENIOR_AGENT.md
RENAMED
File without changes
docs/{to_do → future-roadmap}/DEEP_RESEARCH_ROADMAP.md
RENAMED
File without changes
docs/{brainstorming/04_OPENALEX_INTEGRATION.md → future-roadmap/OPENALEX_INTEGRATION.md}
RENAMED
File without changes
docs/{brainstorming/implementation → future-roadmap/phases}/15_PHASE_OPENALEX.md
RENAMED
File without changes
docs/{brainstorming/implementation → future-roadmap/phases}/16_PHASE_PUBMED_FULLTEXT.md
RENAMED
File without changes
docs/{brainstorming/implementation → future-roadmap/phases}/17_PHASE_RATE_LIMITING.md
RENAMED
File without changes
docs/{brainstorming/implementation → future-roadmap/phases}/README.md
RENAMED
File without changes
docs/index.md
CHANGED
@@ -1,8 +1,8 @@
 # DeepBoner Documentation
 
-##
+## Sexual Health Research Agent
 
-AI-powered deep research system for
+AI-powered deep research system for sexual wellness, reproductive health, and hormone therapy research.
 
 ---
 
@@ -11,8 +11,9 @@ AI-powered deep research system for accelerating drug repurposing discovery.
 ### Architecture
 - **[Overview](architecture/overview.md)** - Project overview, use case, architecture
 - **[Design Patterns](architecture/design-patterns.md)** - Technical patterns, data models
+- **[Workflow Diagrams](workflow-diagrams.md)** - Visual architecture (Magentic v2.0)
 
-### Implementation
+### Implementation (Phases 1-14 ✅ COMPLETE)
 - **[Roadmap](implementation/roadmap.md)** - Phased execution plan with TDD
 - **[Phase 1: Foundation](implementation/01_phase_foundation.md)** ✅ - Tooling, config, first tests
 - **[Phase 2: Search](implementation/02_phase_search.md)** ✅ - PubMed search
@@ -24,25 +25,47 @@ AI-powered deep research system for accelerating drug repurposing discovery.
 - **[Phase 8: Report](implementation/08_phase_report.md)** ✅ - Structured scientific reports
 - **[Phase 9: Source Cleanup](implementation/09_phase_source_cleanup.md)** ✅ - Remove DuckDuckGo
 - **[Phase 10: ClinicalTrials](implementation/10_phase_clinicaltrials.md)** ✅ - Clinical trials API
-- **[Phase 11:
+- **[Phase 11: Europe PMC](implementation/11_phase_biorxiv.md)** ✅ - Preprint search
 - **[Phase 12: MCP Server](implementation/12_phase_mcp_server.md)** ✅ - Claude Desktop integration
 - **[Phase 13: Modal Integration](implementation/13_phase_modal_integration.md)** ✅ - Secure code execution
 - **[Phase 14: Demo Submission](implementation/14_phase_demo_submission.md)** ✅ - Hackathon submission
 
+### Future Roadmap
+- **[Overview](future-roadmap/phases/README.md)** - Planned phases 15-17
+- **[Phase 15: OpenAlex](future-roadmap/phases/15_PHASE_OPENALEX.md)** - Citation network integration
+- **[Phase 16: PubMed Full-text](future-roadmap/phases/16_PHASE_PUBMED_FULLTEXT.md)** - BioC API
+- **[Phase 17: Rate Limiting](future-roadmap/phases/17_PHASE_RATE_LIMITING.md)** - Production hardening
+- **[Deep Research Mode](future-roadmap/DEEP_RESEARCH_ROADMAP.md)** - GPT-Researcher style enhancements
+
+### Bugs & Issues
+- **[Active Bugs](bugs/ACTIVE_BUGS.md)** - Current issues and workarounds
+
+### Decisions
+- **[PR #55 Evaluation](decisions/2025-11-27-pr55-evaluation.md)** - Architecture decision record
+- **[Magentic + PydanticAI](decisions/architecture-2025-11/)** - Framework architecture decisions
+
 ### Guides
 - **[Deployment Guide](guides/deployment.md)** - Gradio, MCP, and Modal launch steps
 
 ### Development
 - **[Testing Strategy](development/testing.md)** - Unit, Integration, and E2E testing patterns
 
+### Brainstorming (Source Improvements)
+- **[Roadmap Summary](brainstorming/00_ROADMAP_SUMMARY.md)** - Data source enhancement ideas
+- **[PubMed Improvements](brainstorming/01_PUBMED_IMPROVEMENTS.md)**
+- **[ClinicalTrials Improvements](brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md)**
+- **[Europe PMC Improvements](brainstorming/03_EUROPEPMC_IMPROVEMENTS.md)**
+
 ---
 
 ## What We're Building
 
-**One-liner**: AI agent that searches medical literature to find
+**One-liner**: AI agent that searches medical literature to find evidence for sexual health research questions.
 
-**Example
-> "What
+**Example Queries**:
+> "What drugs improve female libido post-menopause?"
+> "Evidence for testosterone therapy in women with HSDD?"
+> "Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?"
 
 **Output**: Research report with drug candidates, mechanisms, evidence quality, and citations.
 
@@ -54,7 +77,7 @@ AI-powered deep research system for accelerating drug repurposing discovery.
 User Question → Research Agent (Orchestrator)
 ↓
 Search Loop:
-→ Tools (PubMed, ClinicalTrials,
+→ Tools (PubMed, ClinicalTrials, Europe PMC)
 → Judge (Quality + Budget)
 → Repeat or Synthesize
 ↓
@@ -70,15 +93,7 @@ User Question → Research Agent (Orchestrator)
 | **Gradio UI** | ✅ Complete | Streaming chat interface |
 | **MCP Server** | ✅ Complete | Tools accessible from Claude Desktop |
 | **Modal Sandbox** | ✅ Complete | Secure statistical analysis |
-| **Multi-Source Search** | ✅ Complete | PubMed, ClinicalTrials,
+| **Multi-Source Search** | ✅ Complete | PubMed, ClinicalTrials, Europe PMC |
-
----
-
-## Team
-
-- The-Obstacle-Is-The-Way
-- MarioAderman
-- Josephrp
 
 ---
 
@@ -88,5 +103,5 @@ User Question → Research Agent (Orchestrator)
 |-------|--------|
 | Phases 1-14 | ✅ COMPLETE |
 
-**
-**
+**Tests**: 127 passing, 0 warnings
+**Known Issues**: See [Active Bugs](bugs/ACTIVE_BUGS.md)
docs/to_do/REFERENCE_GRADDIO_DEMO_ANALYSIS.md
DELETED
@@ -1,229 +0,0 @@
-# Reference: GradioDemo Analysis
-
-> Analysis of code from https://github.com/DeepBoner/GradioDemo
-> Purpose: Extract good ideas, understand patterns, avoid mistakes
-
-## Overview
-
-| Metric | Value |
-|--------|-------|
-| Total lines added | ~7,000 |
-| New Python files | +20 |
-| Test pass rate | 80% (62 errors due to missing mocks) |
-| Integration status | **NOT WIRED IN** |
-
-## Component Catalog
-
-### REDUNDANT (Already have equivalent)
-
-| Component | Lines | What We Have Instead |
-|-----------|-------|---------------------|
-| `orchestrator/graph_orchestrator.py` | 974 | MagenticBuilder |
-| `middleware/budget_tracker.py` | 391 | MagenticBuilder max_round_count |
-| `middleware/state_machine.py` | 130 | agents/state.py with contextvars |
-| `middleware/workflow_manager.py` | 300 | asyncio.gather() |
-| `orchestrator/research_flow.py` (IterativeResearchFlow) | 500 | MagenticOrchestrator |
-| HuggingFace integration | various | HFInferenceJudgeHandler |
-
-### POTENTIALLY USEFUL (Ideas to cherry-pick)
-
-#### 1. InputParser (`agents/input_parser.py` - 179 lines)
-
-**Idea**: Detect research mode from query text.
-
-```python
-# Key logic (simplified)
-research_mode: Literal["iterative", "deep"] = "iterative"
-if any(keyword in query.lower() for keyword in [
-    "comprehensive", "report", "sections", "analyze", "analysis", "overview", "market"
-]):
-    research_mode = "deep"
-```
-
-**Good pattern**: Heuristic fallback when LLM fails.
-**Our implementation**: See Phase 1 in DEEP_RESEARCH_ROADMAP.md
-
-#### 2. PlannerAgent (`orchestrator/planner_agent.py` - 184 lines)
-
-**Idea**: LLM creates section outline for report.
-
-```python
-class ReportPlan(BaseModel):
-    title: str
-    sections: list[ReportSection]
-    estimated_time_minutes: int
-
-class ReportSection(BaseModel):
-    title: str
-    query: str
-    description: str
-    priority: int
-```
-
-**Good pattern**: Structured output with Pydantic models.
-**Our implementation**: See Phase 2 in DEEP_RESEARCH_ROADMAP.md
-
-#### 3. DeepResearchFlow (`orchestrator/research_flow.py` - 500 lines)
-
-**Idea**: Run parallel research loops per section.
-
-```python
-# Their pattern (simplified)
-async def run_parallel_loops(sections: list[ReportSection]):
-    tasks = [run_single_loop(s) for s in sections]
-    results = await asyncio.gather(*tasks, return_exceptions=True)
-```
-
-**Problem**: They built new IterativeResearchFlow instead of reusing MagenticOrchestrator.
-**Our implementation**: Just run multiple MagenticOrchestrator instances.
-
-#### 4. LlamaIndex RAG (`services/llamaindex_rag.py` - 454 lines)
-
-**Idea**: Semantic search over collected evidence.
-
-```python
-# Their approach
-class LlamaIndexRAGService:
-    def __init__(self):
-        # ChromaDB + LlamaIndex + HuggingFace embeddings
-        self.vector_store = ChromaVectorStore(...)
-        self.index = VectorStoreIndex(...)
-
-    def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
-        retriever = VectorIndexRetriever(index=self.index, similarity_top_k=top_k)
-        return retriever.retrieve(query)
-```
-
-**Good**: Full-featured RAG with multiple embedding providers.
-**Simpler alternative**: Direct ChromaDB + sentence-transformers (no LlamaIndex).
-**Our implementation**: See Phase 4 in DEEP_RESEARCH_ROADMAP.md
-
-#### 5. LongWriterAgent (`agents/long_writer.py` - ~300 lines)
-
-**Idea**: Write reports section-by-section to handle length.
-
-```python
-class SectionOutput(BaseModel):
-    section_content: str
-    references: list[str]
-    next_section_context: str  # What to avoid repeating
-
-async def write_next_section(
-    section_title: str,
-    findings: str,
-    previous_sections: str,  # Avoid repetition
-) -> SectionOutput:
-```
-
-**Good pattern**: Passing context to avoid repetition.
-**Our implementation**: See Phase 5 in DEEP_RESEARCH_ROADMAP.md
-
-#### 6. ProofreaderAgent (`agents/proofreader.py` - ~200 lines)
-
-**Idea**: Final cleanup pass on report.
-
-```python
-# Tasks:
-# 1. Remove duplicate information
-# 2. Fix citation numbering
-# 3. Add executive summary
-# 4. Ensure consistent formatting
-```
-
-**Good pattern**: Separate concerns - writer writes, proofreader polishes.
-**Our implementation**: Optional Phase 6 if needed.
-
-### Graph Architecture (Educational Reference)
-
-The graph system is well-designed in theory:
-
-```python
-# Node types
-class AgentNode(GraphNode):
-    agent: Any  # Pydantic AI agent
-    input_transformer: Callable  # Transform input
-    output_transformer: Callable  # Transform output
-
-class DecisionNode(GraphNode):
-    decision_function: Callable[[Any], str]  # Returns next node ID
-    options: list[str]
-
-class ParallelNode(GraphNode):
-    parallel_nodes: list[str]  # Run these in parallel
-    aggregator: Callable  # Combine results
-
-# Graph structure
-class ResearchGraph:
-    nodes: dict[str, GraphNode]
-    edges: dict[str, list[GraphEdge]]
-    entry_node: str
-    exit_nodes: list[str]
-```
-
-**Why we don't need it**: MagenticBuilder already provides:
-- Agent coordination via manager
-- Conditional routing (manager decides)
-- Multiple participants
-
-This is essentially reimplementing what `agent-framework` already does.
-
-## Key Lessons
-
-### What Went Wrong
-
-1. **Parallel architecture** - Built new system instead of extending existing
-2. **Horizontal sprawl** - All infrastructure, nothing wired in
-3. **Test mocking** - Tests don't mock API clients properly
-4. **No manual testing** - Code never ran end-to-end
-
-### What To Learn From
-
-1. **Pydantic models for structured output** - Good pattern
-2. **Heuristic fallbacks** - When LLM fails, have a fallback
-3. **Section-by-section writing** - For long reports
-4. **RAG for evidence retrieval** - Useful for large evidence sets
-
-### The 7,000 Line vs 500 Line Comparison
-
-**Their approach**:
-- New GraphOrchestrator (974 lines)
-- New ResearchFlow (999 lines)
-- New BudgetTracker (391 lines)
-- New StateMachine (130 lines)
-- New WorkflowManager (300 lines)
-- New agents (InputParser, Writer, LongWriter, Proofreader, etc.)
-- Total: ~7,000 lines, not integrated
-
-**Our approach**:
-- InputParser (50-100 lines) - extends existing
-- PlannerAgent (80-120 lines) - uses ChatAgent pattern
-- DeepOrchestrator (100-150 lines) - wraps MagenticOrchestrator
-- RAGService (100-150 lines) - simple ChromaDB
-- LongWriter (80-100 lines) - extends ReportAgent
-- Total: ~500 lines, each phase ships working
-
-## File Locations (for reference)
-
-```
-reference_repos/GradioDemo/src/
-├── orchestrator/
-│   ├── graph_orchestrator.py  # 974 lines - graph execution
-│   ├── research_flow.py  # 999 lines - iterative/deep flows
-│   └── planner_agent.py  # 184 lines - section planning
-├── agents/
-│   ├── input_parser.py  # 179 lines - query analysis
-│   ├── writer.py  # 210 lines - report writing
-│   ├── long_writer.py  # ~300 lines - section writing
-│   ├── proofreader.py  # ~200 lines - cleanup
-│   └── knowledge_gap.py  # gap detection
-├── middleware/
-│   ├── budget_tracker.py  # 391 lines - token/time tracking
-│   ├── state_machine.py  # 130 lines - workflow state
-│   └── workflow_manager.py  # 300 lines - parallel loop mgmt
-├── services/
-│   └── llamaindex_rag.py  # 454 lines - RAG service
-├── tools/
-│   └── rag_tool.py  # 191 lines - RAG as search tool
-└── agent_factory/
-    └── graph_builder.py  # ~400 lines - graph construction
-```
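Reviewer note on the InputParser heuristic quoted in the deleted analysis: the "Key logic (simplified)" fragment refers to a `query` variable that is not defined in the snippet. A runnable sketch of the same keyword check is below. The keyword list is copied from the deleted file; the function name `detect_research_mode` and the `ResearchMode` alias are hypothetical, and nothing here is claimed to match the GradioDemo source.

```python
from typing import Literal

ResearchMode = Literal["iterative", "deep"]

# Keywords taken from the simplified logic in the deleted analysis.
DEEP_KEYWORDS = (
    "comprehensive", "report", "sections", "analyze", "analysis", "overview", "market",
)


def detect_research_mode(query: str) -> ResearchMode:
    """Heuristic fallback: pick "deep" when the query contains a report-style keyword."""
    lowered = query.lower()
    if any(keyword in lowered for keyword in DEEP_KEYWORDS):
        return "deep"
    return "iterative"
```

The check is a plain substring match, so "reports" or "analyzed" also select deep mode. That is probably acceptable for a fallback that only runs when the LLM-based parse fails, but it would need word-boundary matching if false positives turn out to matter.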
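Reviewer note on the per-section parallel loop quoted in the deleted analysis (`asyncio.gather(*tasks, return_exceptions=True)`): with `return_exceptions=True`, failures come back as values in the result list and have to be separated from successes by the caller. The sketch below shows one way to do that. `research_section` is a hypothetical placeholder for a real per-section orchestrator run, and `run_parallel_sections` is an illustrative name, not code from either repository.

```python
import asyncio
from typing import Dict, List, Tuple


async def research_section(title: str) -> str:
    """Hypothetical per-section research loop; stands in for a real orchestrator run."""
    await asyncio.sleep(0)  # yield control, as a network-bound loop would
    if not title:
        raise ValueError("section title must not be empty")
    return f"findings for {title}"


async def run_parallel_sections(
    titles: List[str],
) -> Tuple[Dict[str, str], Dict[str, BaseException]]:
    """Run all sections concurrently and split results into findings and failures."""
    outcomes = await asyncio.gather(
        *(research_section(title) for title in titles),
        return_exceptions=True,  # one failed section does not cancel the others
    )
    findings: Dict[str, str] = {}
    failures: Dict[str, BaseException] = {}
    # gather preserves input order, so zip pairs each title with its outcome.
    for title, outcome in zip(titles, outcomes):
        if isinstance(outcome, BaseException):
            failures[title] = outcome
        else:
            findings[title] = outcome
    return findings, failures
```

Calling `asyncio.run(run_parallel_sections(["Mechanisms", "", "Trials"]))` yields findings for the two named sections and a `ValueError` recorded under the empty title, so a report writer can proceed with partial results and log what was skipped.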