# Implementation Roadmap: DeepCritical (Vertical Slices)
**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
This roadmap defines the execution strategy to deliver **DeepCritical** effectively. We reject "overplanning" in favor of **ironclad, testable vertical slices**. Each phase delivers a fully functional slice of end-to-end value.
**Total Estimated Effort**: 12-16 hours (can be done in 4 days)
---
## 🛠️ The 2025 "Gucci" Tooling Stack
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
| Category | Tool | Why? |
|----------|------|------|
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
| **Testing** | **`pytest`** | The standard. |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
| **Test Plugins** | **`pytest-mock`** | Easy mocking with `mocker` fixture. |
| **HTTP Mocking** | **`respx`** | Mock `httpx` requests in tests. |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
| **Retry Logic** | **`tenacity`** | Exponential backoff for API calls. |
| **Logging** | **`structlog`** | Structured JSON logging. |
---
## 🏗️ Architecture: Vertical Slices
Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
Each slice implements a feature from **Entry Point (UI/API) → Logic → Data/External**.
### Directory Structure (Maintainer's Template + Our Code)
We use the **existing scaffolding** from the maintainer, filling in the empty files.
```
deepcritical/
├── pyproject.toml            # All config in one file
├── .env.example              # Environment template
├── .pre-commit-config.yaml   # Git hooks
├── Dockerfile                # Container build
├── README.md                 # HuggingFace Space config
│
├── src/
│   ├── app.py                # Gradio entry point
│   ├── orchestrator.py       # Main agent loop (Search→Judge→Synthesize)
│   │
│   ├── agent_factory/        # Agent definitions
│   │   ├── __init__.py
│   │   ├── agents.py         # (Reserved for future agents)
│   │   └── judges.py         # JudgeHandler - LLM evidence assessment
│   │
│   ├── tools/                # Search tools
│   │   ├── __init__.py
│   │   ├── pubmed.py         # PubMedTool - NCBI E-utilities
│   │   ├── websearch.py      # WebTool - DuckDuckGo
│   │   └── search_handler.py # SearchHandler - orchestrates tools
│   │
│   ├── prompts/              # Prompt templates
│   │   ├── __init__.py
│   │   └── judge.py          # Judge system/user prompts
│   │
│   ├── utils/                # Shared utilities
│   │   ├── __init__.py
│   │   ├── config.py         # Settings via pydantic-settings
│   │   ├── exceptions.py     # Custom exceptions
│   │   └── models.py         # ALL Pydantic models (Evidence, JudgeAssessment, etc.)
│   │
│   ├── middleware/           # (Empty - reserved)
│   ├── database_services/    # (Empty - reserved)
│   └── retrieval_factory/    # (Empty - reserved)
│
└── tests/
    ├── __init__.py
    ├── conftest.py           # Shared fixtures
    │
    ├── unit/                 # Fast, mocked tests
    │   ├── __init__.py
    │   ├── utils/            # Config, models tests
    │   ├── tools/            # PubMed, WebSearch tests
    │   └── agent_factory/    # Judge tests
    │
    └── integration/          # Real API tests (optional)
        └── __init__.py
```
---
## 🚀 Phased Execution Plan
### **Phase 1: Foundation & Tooling (~2-3 hours)**
*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*
| Task | Output |
|------|--------|
| Install uv | `uv --version` works |
| Create pyproject.toml | All deps + config in one file |
| Set up directory structure | All `__init__.py` files created |
| Configure ruff + mypy | Strict settings |
| Create conftest.py | Shared pytest fixtures |
| Implement utils/config.py | Settings via pydantic-settings |
| Write first test | `test_config.py` passes |
**Deliverable**: `uv run pytest` passes with green output.
📄 **Spec Document**: [01_phase_foundation.md](01_phase_foundation.md)
---
### **Phase 2: The "Search" Vertical Slice (~3-4 hours)**
*Goal: Agent can receive a query and get raw results from PubMed/Web.*
| Task | Output |
|------|--------|
| Define Evidence/Citation models | Pydantic models |
| Implement PubMedTool | ESearch → EFetch → Evidence |
| Implement WebTool | DuckDuckGo → Evidence |
| Implement SearchHandler | Parallel search orchestration |
| Write unit tests | Mocked HTTP responses |
**Deliverable**: Function that takes "long covid" → returns `List[Evidence]`.
📄 **Spec Document**: [02_phase_search.md](02_phase_search.md)
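The Phase 2 deliverable hinges on `SearchHandler` fanning one query out to several tools at once. A minimal sketch using `asyncio.gather` follows. The `Evidence`/`Citation` fields, the `execute` method name, and `FakeTool` are assumptions; the real `PubMedTool` and `WebTool` would call NCBI E-utilities and DuckDuckGo over `httpx` rather than return canned data.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Citation:
    source: str  # e.g. "pubmed" or "web"
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


class SearchTool(Protocol):
    name: str

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class SearchHandler:
    """Runs every tool concurrently and merges their results."""

    def __init__(self, tools: list[SearchTool]) -> None:
        self.tools = tools

    async def execute(self, query: str, max_results: int = 10) -> list[Evidence]:
        results = await asyncio.gather(
            *(tool.search(query, max_results) for tool in self.tools),
            return_exceptions=True,  # one failing tool must not sink the whole search
        )
        evidence: list[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue  # a real handler would log the failure here
            evidence.extend(result)
        return evidence


class FakeTool:
    """Hypothetical test double standing in for PubMedTool / WebTool (no network)."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str, max_results: int) -> list[Evidence]:
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        return [
            Evidence(
                content=f"{self.name} result {i} for {query}",
                citation=Citation(self.name, f"Doc {i}", f"https://example.org/{self.name}/{i}"),
            )
            for i in range(min(2, max_results))
        ]


handler = SearchHandler([FakeTool("pubmed"), FakeTool("web", fail=True)])
results = asyncio.run(handler.execute("long covid"))
print(len(results))  # 2: only the working tool contributes
```

`return_exceptions=True` is the key design choice: a DuckDuckGo outage shrinks the evidence set instead of failing the whole query.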
---
### **Phase 3: The "Judge" Vertical Slice (~3-4 hours)**
*Goal: Agent can decide if evidence is sufficient.*
| Task | Output |
|------|--------|
| Define JudgeAssessment model | Structured output schema |
| Write prompt templates | System + user prompts |
| Implement JudgeHandler | PydanticAI agent with structured output |
| Write unit tests | Mocked LLM responses |
**Deliverable**: Function that takes `List[Evidence]` → returns `JudgeAssessment`.
📄 **Spec Document**: [03_phase_judge.md](03_phase_judge.md)
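Phase 3 specifies a PydanticAI agent with structured output. The sketch below swaps that for an injected async `llm` callable plus manual JSON validation. This is enough to show the contract (evidence in, `JudgeAssessment` out) and how a unit test can replace the LLM with a canned response. The `JudgeAssessment` fields, the `assess` method, and the prompt text are hypothetical, and evidence is simplified to plain strings.

```python
import asyncio
import json
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field

# Hypothetical LLM interface: (system_prompt, user_prompt) -> raw JSON text.
LLMCall = Callable[[str, str], Awaitable[str]]

SYSTEM_PROMPT = "You judge whether biomedical evidence answers the question. Reply in JSON."


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    confidence: float  # expected range 0.0-1.0
    reasoning: str
    next_queries: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self.llm = llm

    async def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        user_prompt = f"Question: {question}\n\nEvidence:\n" + "\n".join(
            f"- {item}" for item in evidence
        )
        raw = await self.llm(SYSTEM_PROMPT, user_prompt)
        data = json.loads(raw)  # malformed output raises here instead of passing silently
        return JudgeAssessment(
            sufficient=bool(data["sufficient"]),
            confidence=float(data["confidence"]),
            reasoning=str(data["reasoning"]),
            next_queries=[str(q) for q in data.get("next_queries", [])],
        )


async def fake_llm(system: str, user: str) -> str:
    """Mocked LLM: the kind of canned response a unit test would inject."""
    assert "metformin" in user
    return json.dumps({
        "sufficient": False,
        "confidence": 0.4,
        "reasoning": "Not enough evidence yet.",
        "next_queries": ["metformin alzheimer randomized trial"],
    })


assessment = asyncio.run(
    JudgeHandler(fake_llm).assess(
        "Can metformin treat Alzheimer's?", ["Abstract text mentioning metformin"]
    )
)
print(assessment.sufficient, assessment.next_queries)
```

Injecting the LLM as a plain callable keeps the handler testable without network access; with PydanticAI the same seam would be the agent's model.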
---
### **Phase 4: The "Orchestrator" & UI Slice (~4-5 hours)**
*Goal: End-to-End User Value.*
| Task | Output |
|------|--------|
| Define AgentEvent/State models | Event streaming types |
| Implement Orchestrator | Main while loop connecting Search→Judge |
| Implement report synthesis | Generate markdown report |
| Build Gradio UI | Streaming chat interface |
| Create Dockerfile | Container for deployment |
| Create HuggingFace README | Space configuration |
| Write unit tests | Mocked handlers |
**Deliverable**: Working DeepCritical Agent on localhost:7860.
📄 **Spec Document**: [04_phase_ui.md](04_phase_ui.md)
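Phase 4's orchestrator is a loop that alternates Search and Judge and streams progress to the UI. One way to sketch it is as an async generator of `AgentEvent`s. The event kinds, the `max_iterations` cap, the injected `search`/`judge` callables, and the markdown layout produced by `synthesize` are all assumptions, not the spec.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEvent:
    type: str  # hypothetical kinds: "searching", "judging", "complete"
    message: str
    iteration: int


SearchFn = Callable[[str], Awaitable[list[str]]]
JudgeFn = Callable[[str, list[str]], Awaitable[bool]]  # True = evidence is sufficient


class Orchestrator:
    def __init__(self, search: SearchFn, judge: JudgeFn, max_iterations: int = 5) -> None:
        self.search = search
        self.judge = judge
        self.max_iterations = max_iterations

    async def run(self, question: str) -> AsyncIterator[AgentEvent]:
        evidence: list[str] = []
        iteration = 0
        while iteration < self.max_iterations:
            iteration += 1
            yield AgentEvent("searching", f"Searching for: {question}", iteration)
            evidence.extend(await self.search(question))
            yield AgentEvent("judging", f"Assessing {len(evidence)} items", iteration)
            if await self.judge(question, evidence):
                break  # judge says we have enough; stop searching
        yield AgentEvent("complete", self.synthesize(question, evidence), iteration)

    @staticmethod
    def synthesize(question: str, evidence: list[str]) -> str:
        """Render a minimal markdown report (hypothetical layout)."""
        lines = [f"## {question}", "", "### Evidence"]
        lines += [f"{i}. {item}" for i, item in enumerate(evidence, start=1)]
        return "\n".join(lines)


async def fake_search(query: str) -> list[str]:
    return [f"abstract about {query}"]


async def fake_judge(question: str, evidence: list[str]) -> bool:
    return len(evidence) >= 2  # "sufficient" once two items are collected


async def collect(question: str) -> list[AgentEvent]:
    orchestrator = Orchestrator(fake_search, fake_judge, max_iterations=5)
    return [event async for event in orchestrator.run(question)]


events = asyncio.run(collect("metformin"))
print([e.type for e in events])
```

Because `run()` is an async generator, a Gradio streaming handler could iterate over it and forward each event's `message` to the chat. This sketch also emits a report when the iteration cap is reached without a "sufficient" verdict; the real orchestrator may handle that case differently.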
---
## 📜 Spec Documents Summary
| Phase | Document | Focus |
|-------|----------|-------|
| 1 | [01_phase_foundation.md](01_phase_foundation.md) | Tooling, config, TDD setup |
| 2 | [02_phase_search.md](02_phase_search.md) | PubMed + DuckDuckGo search |
| 3 | [03_phase_judge.md](03_phase_judge.md) | LLM evidence assessment |
| 4 | [04_phase_ui.md](04_phase_ui.md) | Orchestrator + Gradio + Deploy |
---
## ⚑ Quick Start Commands
```bash
# Phase 1: Setup
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init --name deepcritical
uv sync --all-extras
uv run pytest
# Phase 2-4: Development
uv run pytest tests/unit/ -v # Run unit tests
uv run ruff check src tests # Lint
uv run mypy src # Type check
uv run python src/app.py # Run Gradio locally
# Deployment
docker build -t deepcritical .
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... deepcritical
```
---
## 🎯 Definition of Done (MVP)
The MVP is **COMPLETE** when:
1. ✅ All unit tests pass (`uv run pytest`)
2. ✅ Ruff has no errors (`uv run ruff check`)
3. ✅ Mypy has no errors (`uv run mypy src`)
4. ✅ Gradio UI runs locally (`uv run python src/app.py`)
5. ✅ Can ask "Can metformin treat Alzheimer's?" and get a report
6. ✅ Report includes drug candidates, citations, and quality scores
7. ✅ Docker builds successfully
8. ✅ Deployable to HuggingFace Spaces
---
## 📊 Progress Tracker
| Phase | Status | Tests | Notes |
|-------|--------|-------|-------|
| 1: Foundation | ⬜ Pending | 0/5 | Start here |
| 2: Search | ⬜ Pending | 0/6 | Depends on Phase 1 |
| 3: Judge | ⬜ Pending | 0/5 | Depends on Phase 2 |
| 4: Orchestrator | ⬜ Pending | 0/4 | Depends on Phase 3 |
Update this table as you complete each phase!
---
*Start by reading [Phase 1 Spec](01_phase_foundation.md) to initialize the repo.*