# Implementation Roadmap: DeepCritical (Vertical Slices)

**Philosophy:** AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).

This roadmap defines the execution strategy to deliver **DeepCritical** effectively.
We reject "overplanning" in favor of **ironclad, testable vertical slices**.
Each phase delivers a fully functional slice of end-to-end value.

**Total Estimated Effort**: 12-16 hours (can be done in 4 days)

---

## 🛠️ The 2025 "Gucci" Tooling Stack

We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.

| Category | Tool | Why? |
|----------|------|------|
| **Package Manager** | **`uv`** | Rust-based, 10-100x faster than pip/poetry. Manages python versions, venvs, and deps. |
| **Linting/Format** | **`ruff`** | Rust-based, instant. Replaces black, isort, flake8. |
| **Type Checking** | **`mypy`** | Strict static typing. Run via `uv run mypy`. |
| **Testing** | **`pytest`** | The standard. |
| **Test Plugins** | **`pytest-sugar`** | Instant feedback, progress bars. "Gucci" visuals. |
| **Test Plugins** | **`pytest-asyncio`** | Essential for our async agent loop. |
| **Test Plugins** | **`pytest-cov`** | Coverage reporting to ensure TDD adherence. |
| **Test Plugins** | **`pytest-mock`** | Easy mocking with `mocker` fixture. |
| **HTTP Mocking** | **`respx`** | Mock `httpx` requests in tests. |
| **Git Hooks** | **`pre-commit`** | Enforce ruff/mypy before commit. |
| **Retry Logic** | **`tenacity`** | Exponential backoff for API calls. |
| **Logging** | **`structlog`** | Structured JSON logging. |

---

## 🏗️ Architecture: Vertical Slices

Instead of horizontal layers (e.g., "Building the Database Layer"), we build **Vertical Slices**.
Each slice implements a feature from **Entry Point (UI/API) → Logic → Data/External**.

### Directory Structure (Maintainer's Template + Our Code)

We use the **existing scaffolding** from the maintainer, filling in the empty files.

> **Note**: The maintainer created some placeholder files (`agents.py`, `code_execution.py`,
> `dataloaders.py`, `parsers.py`) that are currently empty. We leave these for future use
> and focus on the files needed for the MVP.

```
deepcritical/
├── pyproject.toml              # All config in one file
├── .env.example                # Environment template
├── .pre-commit-config.yaml     # Git hooks
├── Dockerfile                  # Container build
├── README.md                   # HuggingFace Space config
│
├── src/
│   ├── app.py                  # Gradio entry point
│   ├── orchestrator.py         # Main agent loop (Search→Judge→Synthesize)
│   │
│   ├── agent_factory/          # Agent definitions
│   │   ├── __init__.py
│   │   ├── agents.py           # (Maintainer placeholder - future use)
│   │   └── judges.py           # JudgeHandler - LLM evidence assessment
│   │
│   ├── tools/                  # Search tools
│   │   ├── __init__.py
│   │   ├── pubmed.py           # PubMedTool - NCBI E-utilities
│   │   ├── websearch.py        # WebTool - DuckDuckGo (replaces maintainer's empty file)
│   │   ├── search_handler.py   # SearchHandler - orchestrates tools
│   │   └── code_execution.py   # (Maintainer placeholder - future use)
│   │
│   ├── prompts/                # Prompt templates
│   │   ├── __init__.py
│   │   └── judge.py            # Judge system/user prompts
│   │
│   ├── utils/                  # Shared utilities
│   │   ├── __init__.py
│   │   ├── config.py           # Settings via pydantic-settings
│   │   ├── exceptions.py       # Custom exceptions
│   │   ├── models.py           # ALL Pydantic models (Evidence, JudgeAssessment, etc.)
│   │   ├── dataloaders.py      # (Maintainer placeholder - future use)
│   │   └── parsers.py          # (Maintainer placeholder - future use)
│   │
│   ├── middleware/             # (Empty - reserved)
│   ├── database_services/      # (Empty - reserved)
│   └── retrieval_factory/      # (Empty - reserved)
│
└── tests/
    ├── __init__.py
    ├── conftest.py             # Shared fixtures
    │
    ├── unit/                   # Fast, mocked tests
    │   ├── __init__.py
    │   ├── utils/              # Config, models tests
    │   ├── tools/              # PubMed, WebSearch tests
    │   └── agent_factory/      # Judge tests
    │
    └── integration/            # Real API tests (optional)
        └── __init__.py
```

---

## 🚀 Phased Execution Plan

### **Phase 1: Foundation & Tooling (~2-3 hours)**

*Goal: A rock-solid, CI-ready environment with `uv` and `pytest` configured.*

| Task | Output |
|------|--------|
| Install uv | `uv --version` works |
| Create pyproject.toml | All deps + config in one file |
| Set up directory structure | All `__init__.py` files created |
| Configure ruff + mypy | Strict settings |
| Create conftest.py | Shared pytest fixtures |
| Implement utils/config.py | Settings via pydantic-settings |
| Write first test | `test_config.py` passes |

**Deliverable**: `uv run pytest` passes with green output.

📄 **Spec Document**: [01_phase_foundation.md](01_phase_foundation.md)

---

### **Phase 2: The "Search" Vertical Slice (~3-4 hours)**

*Goal: Agent can receive a query and get raw results from PubMed/Web.*

| Task | Output |
|------|--------|
| Define Evidence/Citation models | Pydantic models |
| Implement PubMedTool | ESearch → EFetch → Evidence |
| Implement WebTool | DuckDuckGo → Evidence |
| Implement SearchHandler | Parallel search orchestration |
| Write unit tests | Mocked HTTP responses |

**Deliverable**: Function that takes "long covid" → returns `List[Evidence]`.

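
The "parallel search orchestration" in the Phase 2 table could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `Evidence` is a plain dataclass standing in for the Pydantic model, its field names are guesses, and `StubTool` is a hypothetical placeholder for `PubMedTool` and `WebTool` that returns canned data instead of calling any API.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Stand-in for the Pydantic Evidence model; field names are assumptions."""

    source: str
    title: str
    url: str


class StubTool:
    """Hypothetical placeholder for PubMedTool / WebTool; returns canned results."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str) -> list[Evidence]:
        await asyncio.sleep(0)  # yield control, as a real HTTP call would
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        url = f"https://example.org/{self.name}"
        return [Evidence(self.name, f"{query} ({self.name})", url)]


class SearchHandler:
    """Runs every tool concurrently and merges whatever succeeds."""

    def __init__(self, tools: list[StubTool]) -> None:
        self.tools = tools

    async def execute(self, query: str) -> list[Evidence]:
        results = await asyncio.gather(
            *(tool.search(query) for tool in self.tools),
            return_exceptions=True,  # one failing tool should not sink the others
        )
        evidence: list[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue  # a real handler would log the failure here
            evidence.extend(result)
        return evidence


def run_search(query: str) -> list[Evidence]:
    """Synchronous entry point matching the Phase 2 deliverable's shape."""
    tools = [StubTool("pubmed"), StubTool("web"), StubTool("broken", fail=True)]
    return asyncio.run(SearchHandler(tools).execute(query))
```

Calling `run_search("long covid")` returns the evidence from the two working stubs and silently drops the failing one. Whether a failed tool should be skipped, retried, or surfaced to the user is a design decision the spec document would need to settle.
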
📄 **Spec Document**: [02_phase_search.md](02_phase_search.md)

---

### **Phase 3: The "Judge" Vertical Slice (~3-4 hours)**

*Goal: Agent can decide if evidence is sufficient.*

| Task | Output |
|------|--------|
| Define JudgeAssessment model | Structured output schema |
| Write prompt templates | System + user prompts |
| Implement JudgeHandler | PydanticAI agent with structured output |
| Write unit tests | Mocked LLM responses |

**Deliverable**: Function that takes `List[Evidence]` → returns `JudgeAssessment`.

📄 **Spec Document**: [03_phase_judge.md](03_phase_judge.md)

---

### **Phase 4: The "Orchestrator" & UI Slice (~4-5 hours)**

*Goal: End-to-End User Value.*

| Task | Output |
|------|--------|
| Define AgentEvent/State models | Event streaming types |
| Implement Orchestrator | Main while loop connecting Search→Judge |
| Implement report synthesis | Generate markdown report |
| Build Gradio UI | Streaming chat interface |
| Create Dockerfile | Container for deployment |
| Create HuggingFace README | Space configuration |
| Write unit tests | Mocked handlers |

**Deliverable**: Working DeepCritical Agent on localhost:7860.

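
The Search→Judge loop from the Phase 4 table might be shaped like the following sketch. Everything beyond the names `JudgeAssessment` and `AgentEvent` is an assumption made for illustration: the dataclass fields, the `fake_search` and `fake_judge` stubs, and the `max_iterations` cap are all hypothetical, and dataclasses stand in for the Pydantic models so the example needs only the standard library.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class JudgeAssessment:
    """Stand-in for the Pydantic model; both fields are assumptions."""

    sufficient: bool
    next_query: str


@dataclass
class AgentEvent:
    """Stand-in for the event type streamed to the UI; fields are assumptions."""

    kind: str
    message: str


async def fake_search(query: str) -> list[str]:
    """Hypothetical search handler: one canned evidence string per call."""
    return [f"evidence for {query!r}"]


async def fake_judge(evidence: list[str]) -> JudgeAssessment:
    """Hypothetical judge: satisfied once three pieces of evidence exist."""
    enough = len(evidence) >= 3
    return JudgeAssessment(sufficient=enough, next_query=f"refined query {len(evidence)}")


async def run(query: str, max_iterations: int = 5) -> AsyncIterator[AgentEvent]:
    """Search, judge, and repeat until the judge is satisfied or the cap is hit."""
    evidence: list[str] = []
    for iteration in range(1, max_iterations + 1):
        evidence += await fake_search(query)
        yield AgentEvent("search", f"iteration {iteration}: {len(evidence)} items")
        assessment = await fake_judge(evidence)
        yield AgentEvent("judge", f"sufficient={assessment.sufficient}")
        if assessment.sufficient:
            break
        query = assessment.next_query  # let the judge steer the next search
    yield AgentEvent("complete", f"report built from {len(evidence)} items")


async def collect(query: str) -> list[AgentEvent]:
    """Drain the event stream into a list, e.g. for a unit test."""
    return [event async for event in run(query)]
```

The sketch uses a bounded `for` loop instead of an open-ended `while` so that a judge that is never satisfied cannot loop forever. Yielding events from an async generator is one way to feed a streaming chat interface, since each event can be rendered as soon as it is produced.
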
📄 **Spec Document**: [04_phase_ui.md](04_phase_ui.md)

---

## 📜 Spec Documents Summary

| Phase | Document | Focus |
|-------|----------|-------|
| 1 | [01_phase_foundation.md](01_phase_foundation.md) | Tooling, config, TDD setup |
| 2 | [02_phase_search.md](02_phase_search.md) | PubMed + DuckDuckGo search |
| 3 | [03_phase_judge.md](03_phase_judge.md) | LLM evidence assessment |
| 4 | [04_phase_ui.md](04_phase_ui.md) | Orchestrator + Gradio + Deploy |

---

## ⚡ Quick Start Commands

```bash
# Phase 1: Setup
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init --name deepcritical
uv sync --all-extras
uv run pytest

# Phase 2-4: Development
uv run pytest tests/unit/ -v    # Run unit tests
uv run ruff check src tests     # Lint
uv run mypy src                 # Type check
uv run python src/app.py        # Run Gradio locally

# Deployment
docker build -t deepcritical .
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... deepcritical
```

---

## 🎯 Definition of Done (MVP)

The MVP is **COMPLETE** when:

1. ✅ All unit tests pass (`uv run pytest`)
2. ✅ Ruff has no errors (`uv run ruff check`)
3. ✅ Mypy has no errors (`uv run mypy src`)
4. ✅ Gradio UI runs locally (`uv run python src/app.py`)
5. ✅ Can ask "Can metformin treat Alzheimer's?" and get a report
6. ✅ Report includes drug candidates, citations, and quality scores
7. ✅ Docker builds successfully
8. ✅ Deployable to HuggingFace Spaces

---

## 📊 Progress Tracker

| Phase | Status | Tests | Notes |
|-------|--------|-------|-------|
| 1: Foundation | ⬜ Pending | 0/5 | Start here |
| 2: Search | ⬜ Pending | 0/6 | Depends on Phase 1 |
| 3: Judge | ⬜ Pending | 0/5 | Depends on Phase 2 |
| 4: Orchestrator | ⬜ Pending | 0/4 | Depends on Phase 3 |

Update this table as you complete each phase!

---

*Start by reading [Phase 1 Spec](01_phase_foundation.md) to initialize the repo.*