# Component Inventory
> **Last Updated**: 2025-12-06

This document catalogs the components in the DeepBoner codebase.
## Source Code Statistics
| Category | Count |
|----------|-------|
| Python files in `src/` | ~67 |
| Python files in `tests/` | ~76 |
| Total Python files | ~143 |
## Directory Structure
```
src/
├── app.py              # Gradio UI entry point
├── mcp_tools.py        # MCP server tool wrappers
├── orchestrators/      # Research orchestration
├── clients/            # LLM backend adapters
├── agents/             # Multi-agent components
├── agent_factory/      # Agent creation
├── tools/              # Search tool implementations
├── services/           # Cross-cutting services
├── prompts/            # LLM prompt templates
├── utils/              # Shared utilities
├── config/             # Domain configuration
├── middleware/         # Processing middleware
└── state/              # State management
```
---
## Core Entry Points
### `src/app.py`
**Purpose:** Main application entry point
| Component | Type | Description |
|-----------|------|-------------|
| `create_demo()` | Function | Creates Gradio interface |
| `main()` | Function | Application entry point |
**Dependencies:** Gradio, orchestrators, config
### `src/mcp_tools.py`
**Purpose:** MCP (Model Context Protocol) tool wrappers
| Component | Type | Description |
|-----------|------|-------------|
| `search_pubmed()` | Tool | PubMed search wrapper |
| `search_clinical_trials()` | Tool | ClinicalTrials.gov wrapper |
| `search_europepmc()` | Tool | Europe PMC wrapper |
| `search_all_sources()` | Tool | Multi-source search |
---
## Orchestrators (`src/orchestrators/`)
### `advanced.py`
**Purpose:** Main multi-agent orchestrator using Microsoft Agent Framework
| Component | Type | Description |
|-----------|------|-------------|
| `AdvancedOrchestrator` | Class | Primary research orchestrator |
| `run()` | Method | Execute research workflow |
| `_search_phase()` | Method | Search execution |
| `_judge_phase()` | Method | Evidence evaluation |
| `_synthesize_phase()` | Method | Report generation |
**Framework:** Microsoft Agent Framework (agent-framework-core)
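
The three phase methods listed above suggest a search, judge, synthesize loop. The following is a minimal sketch of that control flow, not the code in `advanced.py`: the class name `SketchOrchestrator`, the `max_iterations` cap, the stubbed phase bodies, and the synchronous signatures are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchOrchestrator:
    """Hypothetical stand-in for a search -> judge -> synthesize workflow."""

    max_iterations: int = 3  # assumed cap; not taken from the source
    evidence: list[str] = field(default_factory=list)

    def _search_phase(self, query: str) -> list[str]:
        # Stub: a real implementation would call the search tools here.
        return [f"evidence for {query!r} (round {len(self.evidence) + 1})"]

    def _judge_phase(self) -> bool:
        # Stub: treat two or more evidence items as sufficient.
        return len(self.evidence) >= 2

    def _synthesize_phase(self, query: str) -> str:
        return f"Report on {query}: {len(self.evidence)} evidence items"

    def run(self, query: str) -> str:
        # Keep searching until the judge is satisfied or the cap is reached.
        for _ in range(self.max_iterations):
            self.evidence.extend(self._search_phase(query))
            if self._judge_phase():
                break
        return self._synthesize_phase(query)
```

With these stubs, the judge accepts after the second search round, so `run()` stops early instead of using all three iterations.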
### `factory.py`
**Purpose:** Orchestrator selection
| Component | Type | Description |
|-----------|------|-------------|
| `OrchestratorFactory` | Class | Creates appropriate orchestrator |
| `create()` | Method | Factory method |
### `base.py`
**Purpose:** Base orchestrator interface
| Component | Type | Description |
|-----------|------|-------------|
| `BaseOrchestrator` | ABC | Abstract base class |
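
To illustrate how an abstract base class and a factory method can fit together, here is a small sketch. The class names mirror the inventory, but the `run()` signature, the mode strings `"advanced"` and `"hierarchical"`, and the registry dictionary are assumptions for this example, not the project's actual API.

```python
from abc import ABC, abstractmethod


class BaseOrchestrator(ABC):
    """Illustrative interface; the real method signature may differ."""

    @abstractmethod
    def run(self, query: str) -> str: ...


class AdvancedOrchestrator(BaseOrchestrator):
    def run(self, query: str) -> str:
        return f"advanced:{query}"


class HierarchicalOrchestrator(BaseOrchestrator):
    def run(self, query: str) -> str:
        return f"hierarchical:{query}"


class OrchestratorFactory:
    # Mode names are assumptions made for this sketch.
    _registry: dict[str, type[BaseOrchestrator]] = {
        "advanced": AdvancedOrchestrator,
        "hierarchical": HierarchicalOrchestrator,
    }

    @classmethod
    def create(cls, mode: str = "advanced") -> BaseOrchestrator:
        # Look up the class for the requested mode and instantiate it.
        try:
            return cls._registry[mode]()
        except KeyError:
            raise ValueError(f"unknown orchestrator mode: {mode!r}") from None
```

Callers depend only on `BaseOrchestrator`, so adding another orchestrator means registering one more class rather than changing call sites.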
### `langgraph_orchestrator.py`
**Purpose:** LangGraph-based workflow (experimental)
| Component | Type | Description |
|-----------|------|-------------|
| `LangGraphOrchestrator` | Class | Workflow state machine |
### `hierarchical.py`
**Purpose:** Hierarchical agent coordination
| Component | Type | Description |
|-----------|------|-------------|
| `HierarchicalOrchestrator` | Class | Manager-agent hierarchy |
---
## LLM Clients (`src/clients/`)
### `factory.py`
**Purpose:** Auto-select LLM backend
| Component | Type | Description |
|-----------|------|-------------|
| `get_chat_client()` | Function | Returns appropriate client |
**Selection Logic:**
```python
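def get_chat_client():
    # Prefer OpenAI when an API key is configured.
    if settings.has_openai_key:
        return OpenAIChatClient()
    # Otherwise fall back to the free HuggingFace tier.
    return HuggingFaceChatClient()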
```
### `huggingface.py`
**Purpose:** HuggingFace Inference API adapter
| Component | Type | Description |
|-----------|------|-------------|
| `HuggingFaceChatClient` | Class | Free tier LLM client |
| `chat_completion()` | Method | Generate completion |
**Model:** Qwen 2.5 7B Instruct (free tier)
### `base.py`
**Purpose:** Client interface
| Component | Type | Description |
|-----------|------|-------------|
| `BaseChatClient` | ABC | Client interface |
### `providers.py`
**Purpose:** Provider implementations
### `registry.py`
**Purpose:** Provider registration
---
## Agents (`src/agents/`)
### `search_agent.py`
| Component | Type | Description |
|-----------|------|-------------|
| `SearchAgent` | Class | Evidence gathering agent |
### `judge_agent.py`
| Component | Type | Description |
|-----------|------|-------------|
| `JudgeAgent` | Class | Evidence evaluation |
### `judge_agent_llm.py`
| Component | Type | Description |
|-----------|------|-------------|
| `LLMJudgeAgent` | Class | LLM-based judge implementation |
### `report_agent.py`
| Component | Type | Description |
|-----------|------|-------------|
| `ReportAgent` | Class | Report synthesis |
### `retrieval_agent.py`
| Component | Type | Description |
|-----------|------|-------------|
| `create_retrieval_agent()` | Factory | Creates ChatAgent for web search |
| `search_web` | @ai_function | DuckDuckGo web search tool |
> **Note:** This module is implemented but NOT wired into `magentic_agents.py`. See GitHub issue #134.
### `hypothesis_agent.py`
| Component | Type | Description |
|-----------|------|-------------|
| `HypothesisAgent` | Class | Mechanistic hypothesis generation |
### `magentic_agents.py`
| Component | Type | Description |
|-----------|------|-------------|
| Multi-agent mode | Module | Microsoft Agent Framework integration |
### `state.py`
| Component | Type | Description |
|-----------|------|-------------|
| Agent state models | Module | Shared state definitions |
### `tools.py`
| Component | Type | Description |
|-----------|------|-------------|
| Tool bindings | Module | Agent tool configuration |
---
## Graph Workflow (`src/agents/graph/`)
### `workflow.py`
| Component | Type | Description |
|-----------|------|-------------|
| `create_workflow()` | Function | LangGraph workflow builder |
### `nodes.py`
| Component | Type | Description |
|-----------|------|-------------|
| `search_node()` | Function | Search workflow node |
| `judge_node()` | Function | Judge workflow node |
| `report_node()` | Function | Report workflow node |
### `state.py`
| Component | Type | Description |
|-----------|------|-------------|
| `WorkflowState` | Class | LangGraph state schema |
---
## Agent Factory (`src/agent_factory/`)
### `judges.py`
**Purpose:** Evidence quality judgment
| Component | Type | Description |
|-----------|------|-------------|
| `create_judge()` | Function | Judge agent factory |
| `JudgeResult` | Model | Assessment output |
**Framework:** Pydantic AI
### `agents.py`
| Component | Type | Description |
|-----------|------|-------------|
| Agent creation | Module | Factory functions |
---
## Search Tools (`src/tools/`)
### `pubmed.py`
| Component | Type | Description |
|-----------|------|-------------|
| `PubMedTool` | Class | NCBI E-utilities client |
| `search()` | Method | Execute search |
**API:** PubMed E-utilities (eutils.ncbi.nlm.nih.gov)
### `clinicaltrials.py`
| Component | Type | Description |
|-----------|------|-------------|
| `ClinicalTrialsTool` | Class | ClinicalTrials.gov client |
| `search()` | Method | Execute search |
**API:** ClinicalTrials.gov API (uses `requests` because the site's WAF blocks `httpx`)
### `europepmc.py`
| Component | Type | Description |
|-----------|------|-------------|
| `EuropePMCTool` | Class | Europe PMC client |
| `search()` | Method | Execute search |
**API:** Europe PMC API
### `openalex.py`
| Component | Type | Description |
|-----------|------|-------------|
| `OpenAlexTool` | Class | OpenAlex client |
| `search()` | Method | Execute search |
**API:** OpenAlex API
### `search_handler.py`
| Component | Type | Description |
|-----------|------|-------------|
| `SearchHandler` | Class | Scatter-gather orchestration |
| `search_all()` | Method | Parallel multi-source search |
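
Scatter-gather means sending one query to every source concurrently and then merging whatever comes back. The sketch below shows the pattern with `asyncio.gather`. The two stub coroutines and the `(results, errors)` return shape are hypothetical and stand in for the real tool classes; how `SearchHandler` actually reports failures is not stated in this document.

```python
import asyncio


async def stub_pubmed(query: str) -> list[str]:
    # Hypothetical source that succeeds.
    return [f"pubmed:{query}"]


async def stub_europepmc(query: str) -> list[str]:
    # Hypothetical source that fails, to show partial-failure handling.
    raise RuntimeError("source unavailable")


async def search_all(query: str) -> tuple[list[str], list[str]]:
    """Scatter the query to every source, then gather results and errors."""
    sources = {"pubmed": stub_pubmed, "europepmc": stub_europepmc}
    # return_exceptions=True keeps one failing source from cancelling the rest.
    outcomes = await asyncio.gather(
        *(fn(query) for fn in sources.values()), return_exceptions=True
    )
    results: list[str] = []
    errors: list[str] = []
    for name, outcome in zip(sources, outcomes):
        if isinstance(outcome, Exception):
            errors.append(f"{name}: {outcome}")
        else:
            results.extend(outcome)
    return results, errors
```

Calling `asyncio.run(search_all("aspirin"))` returns the PubMed hit alongside a recorded error for the failing source, so a single outage degrades the result set instead of aborting the search.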
### `query_utils.py`
| Component | Type | Description |
|-----------|------|-------------|
| Query utilities | Module | Query refinement and expansion |
### `rate_limiter.py`
| Component | Type | Description |
|-----------|------|-------------|
| `RateLimiter` | Class | API rate limiting |
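
One common way to implement API rate limiting is to space calls at a fixed minimum interval. The sketch below takes that approach. It is not the implementation in `rate_limiter.py`: the constructor parameters, the `acquire()` method, and the injectable `clock` and `sleep` callables are assumptions chosen to keep the example testable.

```python
import time


class RateLimiter:
    """Allow at most `rate` calls per second by spacing calls evenly."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may proceed

    def acquire(self) -> float:
        """Block until the next call is allowed; return seconds waited."""
        now = self._clock()
        wait = max(0.0, self._next_allowed - now)
        if wait:
            self._sleep(wait)
        # Reserve the following slot relative to when this call proceeds.
        self._next_allowed = max(now, self._next_allowed) + self._min_interval
        return wait
```

A search tool would call `acquire()` immediately before each outbound request. Injecting a fake clock and a no-op sleep lets unit tests check the spacing without real delays.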
### `base.py`
| Component | Type | Description |
|-----------|------|-------------|
| `BaseSearchTool` | ABC | Search tool interface |
### `web_search.py`
| Component | Type | Description |
|-----------|------|-------------|
| `WebSearchTool` | Class | DuckDuckGo integration wrapper |
> **Note:** Used by `search_web` in `retrieval_agent.py`. See GitHub issue #134 for dead code status.
---
## Services (`src/services/`)
### `embeddings.py`
| Component | Type | Description |
|-----------|------|-------------|
| `EmbeddingService` | Class | Local embedding service |
| `embed()` | Method | Generate embeddings |
| `deduplicate()` | Method | Cross-source deduplication |
**Stack:** sentence-transformers + ChromaDB
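
Cross-source deduplication with embeddings usually means dropping an item when its vector is nearly parallel to one that has already been kept. The following is a dependency-free sketch of that idea. The function signature, the `threshold` default of 0.9, and the greedy first-seen strategy are assumptions, and the vectors are hand-written stand-ins for real sentence-transformers output.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(
    items: list[tuple[str, list[float]]], threshold: float = 0.9
) -> list[str]:
    """Keep an item only if it is not too similar to any item already kept."""
    kept: list[tuple[str, list[float]]] = []
    for text, vector in items:
        if all(cosine(vector, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]
```

This greedy pass is quadratic in the number of items, which is acceptable for small result sets. A vector store such as ChromaDB can replace the inner loop with a nearest-neighbor query.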
### `llamaindex_rag.py`
| Component | Type | Description |
|-----------|------|-------------|
| `LlamaIndexRAG` | Class | Premium RAG service |
**Stack:** LlamaIndex + OpenAI embeddings + ChromaDB
### `embedding_protocol.py`
| Component | Type | Description |
|-----------|------|-------------|
| `EmbeddingProtocol` | Protocol | Interface for embedding services |
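
A `typing.Protocol` lets several embedding services be used interchangeably without sharing a base class. The sketch below shows the mechanism. The single `embed(text)` method, the `LocalEmbedder` class, and the `embed_all` helper are hypothetical, since the real protocol's members are not listed in this document.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbeddingProtocol(Protocol):
    """Assumed interface: anything with a matching embed() method qualifies."""

    def embed(self, text: str) -> list[float]: ...


class LocalEmbedder:
    """Satisfies the protocol structurally, without inheriting from it."""

    def embed(self, text: str) -> list[float]:
        # Toy embedding: character count and lowercase vowel count.
        return [float(len(text)), float(sum(c in "aeiou" for c in text))]


def embed_all(service: EmbeddingProtocol, texts: list[str]) -> list[list[float]]:
    # Works with any object that provides embed(), local or remote.
    return [service.embed(t) for t in texts]
```

Because the check is structural, a type checker accepts both a local service and a premium RAG service wherever `EmbeddingProtocol` is expected, provided each defines the required methods.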
### `research_memory.py`
| Component | Type | Description |
|-----------|------|-------------|
| `ResearchMemory` | Class | Shared research state |
---
## Utilities (`src/utils/`)
### `config.py`
| Component | Type | Description |
|-----------|------|-------------|
| `Settings` | Class | Pydantic Settings configuration |
| `settings` | Instance | Global settings singleton |
| `get_settings()` | Function | Settings factory |
| `configure_logging()` | Function | Logging setup |
### `models.py`
| Component | Type | Description |
|-----------|------|-------------|
| `Evidence` | Model | Evidence with citation |
| `Citation` | Model | Source citation |
| `SearchResult` | Model | Search response |
| `JudgeAssessment` | Model | Judge evaluation |
| `ResearchReport` | Model | Final report |
| `AgentEvent` | Model | UI streaming events |
See [Data Models](data-models.md) for complete documentation.
### `exceptions.py`
| Component | Type | Description |
|-----------|------|-------------|
| `DeepBonerError` | Exception | Base exception |
| `SearchError` | Exception | Search failures |
| `JudgeError` | Exception | Judge failures |
| `ConfigurationError` | Exception | Config errors |
| `RateLimitError` | Exception | Rate limits |
See [Exception Hierarchy](exception-hierarchy.md) for details.
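
A shared base exception lets callers handle every project error in one place while still distinguishing specific cases. The sketch below assumes each listed exception derives directly from `DeepBonerError`; the actual parent classes are documented in the exception hierarchy page and may differ. The `classify` helper is hypothetical.

```python
class DeepBonerError(Exception):
    """Base class, so callers can catch every project error at once."""


class SearchError(DeepBonerError):
    """Search failures."""


class JudgeError(DeepBonerError):
    """Judge failures."""


class ConfigurationError(DeepBonerError):
    """Config errors."""


class RateLimitError(DeepBonerError):
    """Rate limits. The direct parent is an assumption of this sketch."""


def classify(exc: Exception) -> str:
    """Map an exception to a coarse handling strategy (hypothetical helper)."""
    if isinstance(exc, RateLimitError):
        return "retry-later"
    if isinstance(exc, DeepBonerError):
        return "report"
    return "crash"
```

Checking the most specific class first keeps the rate-limit case from being swallowed by the generic branch.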
### `service_loader.py`
| Component | Type | Description |
|-----------|------|-------------|
| Service loading | Module | Tiered service selection |
### `citation_validator.py`
| Component | Type | Description |
|-----------|------|-------------|
| Citation validation | Module | URL verification |
### `text_utils.py`
| Component | Type | Description |
|-----------|------|-------------|
| Text utilities | Module | Text processing |
### `parsers.py`
| Component | Type | Description |
|-----------|------|-------------|
| Response parsing | Module | LLM output parsing |
### `dataloaders.py`
| Component | Type | Description |
|-----------|------|-------------|
| Data loading | Module | Data loading utilities |
---
## Configuration (`src/config/`)
### `domain.py`
| Component | Type | Description |
|-----------|------|-------------|
| `ResearchDomain` | Enum | Research domain types |
---
## Prompts (`src/prompts/`)
| File | Purpose |
|------|---------|
| `search.py` | Query refinement prompts |
| `judge.py` | Evidence assessment prompts |
| `hypothesis.py` | Hypothesis generation prompts |
| `synthesis.py` | Evidence synthesis prompts |
| `report.py` | Report generation prompts |
---
## Middleware (`src/middleware/`)
### `sub_iteration.py`
| Component | Type | Description |
|-----------|------|-------------|
| Sub-iteration | Module | Nested iteration logic |
---
## Reserved Directories
These directories exist but are placeholders for future features:
| Directory | Purpose |
|-----------|---------|
| `src/database_services/` | Future database services |
| `src/retrieval_factory/` | Future retrieval configuration |
---
## Test Structure
```
tests/
├── conftest.py          # Shared fixtures
├── unit/                # Unit tests (mocked)
│   ├── orchestrators/
│   ├── agents/
│   ├── clients/
│   ├── tools/
│   ├── services/
│   ├── utils/
│   ├── prompts/
│   ├── agent_factory/
│   ├── config/
│   ├── graph/
│   └── mcp/
├── integration/         # Integration tests (real APIs)
└── e2e/                 # End-to-end tests
```
---
## Related Documentation
- [Architecture Overview](overview.md)
- [Data Models](data-models.md)
- [Exception Hierarchy](exception-hierarchy.md)
- [System Registry](system-registry.md)