Implementation Roadmap: DeepCritical (Vertical Slices)
Philosophy: AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).
This roadmap defines the execution strategy to deliver DeepCritical effectively. We reject "overplanning" in favor of ironclad, testable vertical slices. Each phase delivers a fully functional slice of end-to-end value.
Total Estimated Effort: 12-16 hours (can be done in 4 days)
The 2025 "Gucci" Tooling Stack
We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.
| Category | Tool | Why? |
|---|---|---|
| Package Manager | uv | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| Linting/Format | ruff | Rust-based, instant. Replaces black, isort, flake8. |
| Type Checking | mypy | Strict static typing. Run via uv run mypy. |
| Testing | pytest | The standard. |
| Test Plugins | pytest-sugar | Instant feedback, progress bars. "Gucci" visuals. |
| Test Plugins | pytest-asyncio | Essential for our async agent loop. |
| Test Plugins | pytest-cov | Coverage reporting to ensure TDD adherence. |
| Test Plugins | pytest-mock | Easy mocking with the mocker fixture. |
| HTTP Mocking | respx | Mock httpx requests in tests. |
| Git Hooks | pre-commit | Enforce ruff/mypy before commit. |
| Retry Logic | tenacity | Exponential backoff for API calls. |
| Logging | structlog | Structured JSON logging. |
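The tenacity row refers to exponential backoff for API calls. As a rough illustration of that behavior only, the sketch below uses the standard library; it is not tenacity's API, and `retry_with_backoff`, `flaky_fetch`, and the delay values are hypothetical names and numbers chosen for the example.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Await `call`, retrying on any Exception with delays of base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("attempts must be >= 1")


# Hypothetical flaky call: fails twice, then succeeds.
calls = {"count": 0}


async def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky_fetch))
```

In the project itself the same policy would presumably be declared through tenacity rather than hand-written; the sketch only shows what "retry with doubling delays" means in practice.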
Architecture: Vertical Slices
Instead of horizontal layers (e.g., "Building the Database Layer"), we build Vertical Slices. Each slice implements a feature from Entry Point (UI/API) → Logic → Data/External.
Directory Structure (Maintainer's Template + Our Code)
We use the existing scaffolding from the maintainer, filling in the empty files.
deepcritical/
├── pyproject.toml              # All config in one file
├── .env.example                # Environment template
├── .pre-commit-config.yaml     # Git hooks
├── Dockerfile                  # Container build
├── README.md                   # HuggingFace Space config
│
├── src/
│   ├── app.py                  # Gradio entry point
│   ├── orchestrator.py         # Main agent loop (Search→Judge→Synthesize)
│   │
│   ├── agent_factory/          # Agent definitions
│   │   ├── __init__.py
│   │   ├── agents.py           # (Reserved for future agents)
│   │   └── judges.py           # JudgeHandler - LLM evidence assessment
│   │
│   ├── tools/                  # Search tools
│   │   ├── __init__.py
│   │   ├── pubmed.py           # PubMedTool - NCBI E-utilities
│   │   ├── websearch.py        # WebTool - DuckDuckGo
│   │   └── search_handler.py   # SearchHandler - orchestrates tools
│   │
│   ├── prompts/                # Prompt templates
│   │   ├── __init__.py
│   │   └── judge.py            # Judge system/user prompts
│   │
│   ├── utils/                  # Shared utilities
│   │   ├── __init__.py
│   │   ├── config.py           # Settings via pydantic-settings
│   │   ├── exceptions.py       # Custom exceptions
│   │   └── models.py           # ALL Pydantic models (Evidence, JudgeAssessment, etc.)
│   │
│   ├── middleware/             # (Empty - reserved)
│   ├── database_services/      # (Empty - reserved)
│   └── retrieval_factory/      # (Empty - reserved)
│
└── tests/
    ├── __init__.py
    ├── conftest.py             # Shared fixtures
    │
    ├── unit/                   # Fast, mocked tests
    │   ├── __init__.py
    │   ├── utils/              # Config, models tests
    │   ├── tools/              # PubMed, WebSearch tests
    │   └── agent_factory/      # Judge tests
    │
    └── integration/            # Real API tests (optional)
        └── __init__.py
Phased Execution Plan
Phase 1: Foundation & Tooling (~2-3 hours)
Goal: A rock-solid, CI-ready environment with uv and pytest configured.
| Task | Output |
|---|---|
| Install uv | uv --version works |
| Create pyproject.toml | All deps + config in one file |
| Set up directory structure | All __init__.py files created |
| Configure ruff + mypy | Strict settings |
| Create conftest.py | Shared pytest fixtures |
| Implement src/utils/config.py | Settings via pydantic-settings |
| Write first test | test_config.py passes |
Deliverable: uv run pytest passes with green output.
Spec Document: 01_phase_foundation.md
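To make the "first test" task concrete, here is a minimal sketch of a settings object with two tests in the shape test_config.py might take. It uses a standard-library dataclass as a stand-in for pydantic-settings so that it runs without dependencies. The field names and the MAX_ITERATIONS variable are assumptions for illustration; only OPENAI_API_KEY appears elsewhere in this roadmap.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings; a stand-in for a pydantic-settings class."""

    openai_api_key: str = ""
    max_iterations: int = 3  # assumed knob, not taken from the spec

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept an explicit mapping so tests never touch the real environment.
        source = os.environ if env is None else env
        return cls(
            openai_api_key=source.get("OPENAI_API_KEY", ""),
            max_iterations=int(source.get("MAX_ITERATIONS", "3")),
        )


def test_settings_reads_environment() -> None:
    settings = Settings.from_env({"OPENAI_API_KEY": "sk-test", "MAX_ITERATIONS": "5"})
    assert settings.openai_api_key == "sk-test"
    assert settings.max_iterations == 5


def test_settings_defaults() -> None:
    settings = Settings.from_env({})
    assert settings.openai_api_key == ""
    assert settings.max_iterations == 3
```

Passing the environment in as a mapping keeps the tests fast and isolated, which fits the TDD emphasis of this phase.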
Phase 2: The "Search" Vertical Slice (~3-4 hours)
Goal: Agent can receive a query and get raw results from PubMed/Web.
| Task | Output |
|---|---|
| Define Evidence/Citation models | Pydantic models |
| Implement PubMedTool | ESearch → EFetch → Evidence |
| Implement WebTool | DuckDuckGo → Evidence |
| Implement SearchHandler | Parallel search orchestration |
| Write unit tests | Mocked HTTP responses |
Deliverable: Function that takes "long covid" → returns List[Evidence].
Spec Document: 02_phase_search.md
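The parallel search orchestration in this phase can be sketched as follows. This is an assumption-laden outline, not the spec: the Evidence fields, the `SearchTool` protocol, the `execute` method name, and the two fake tools are hypothetical, and a dataclass stands in for the Pydantic model. It shows one plausible design in which a failing tool does not discard results from the others.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Evidence:
    """Stand-in for the Pydantic Evidence model; fields are assumed."""

    title: str
    url: str
    source: str


class SearchTool(Protocol):
    async def search(self, query: str) -> list[Evidence]: ...


class FakePubMedTool:
    """Hypothetical stub; a real tool would call NCBI E-utilities."""

    async def search(self, query: str) -> list[Evidence]:
        return [Evidence(f"PubMed hit for {query}", "https://example.org/pubmed/1", "pubmed")]


class FailingWebTool:
    """Hypothetical stub that simulates a network failure."""

    async def search(self, query: str) -> list[Evidence]:
        raise TimeoutError("simulated network timeout")


class SearchHandler:
    def __init__(self, tools: list[SearchTool]) -> None:
        self._tools = tools

    async def execute(self, query: str) -> list[Evidence]:
        # Run all tools concurrently; exceptions are returned, not raised.
        outcomes = await asyncio.gather(
            *(tool.search(query) for tool in self._tools),
            return_exceptions=True,
        )
        evidence: list[Evidence] = []
        for outcome in outcomes:
            if isinstance(outcome, BaseException):
                continue  # a real handler would log the failure here
            evidence.extend(outcome)
        return evidence


handler = SearchHandler([FakePubMedTool(), FailingWebTool()])
results = asyncio.run(handler.execute("long covid"))
```

Because the tools are injected, unit tests can substitute stubs like these instead of making real HTTP calls.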
Phase 3: The "Judge" Vertical Slice (~3-4 hours)
Goal: Agent can decide if evidence is sufficient.
| Task | Output |
|---|---|
| Define JudgeAssessment model | Structured output schema |
| Write prompt templates | System + user prompts |
| Implement JudgeHandler | PydanticAI agent with structured output |
| Write unit tests | Mocked LLM responses |
Deliverable: Function that takes List[Evidence] → returns JudgeAssessment.
Spec Document: 03_phase_judge.md
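The shape of this deliverable, and of its mocked-LLM unit test, might look like the sketch below. It deliberately swaps the PydanticAI agent for an injected callable so that it runs without an API key. The JudgeAssessment fields, `build_user_prompt`, the `assess` method, and the synchronous signature are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    """Stand-in for the Pydantic Evidence model; fields are assumed."""

    title: str
    url: str


@dataclass(frozen=True)
class JudgeAssessment:
    """Assumed structured-output schema for the judge."""

    sufficient: bool
    reasoning: str
    next_queries: list[str]


# Hypothetical seam: anything that maps a prompt to a structured verdict.
LLMCall = Callable[[str], JudgeAssessment]


def build_user_prompt(question: str, evidence: list[Evidence]) -> str:
    lines = [f"Question: {question}", "Evidence:"]
    lines += [f"- {item.title} ({item.url})" for item in evidence]
    return "\n".join(lines)


class JudgeHandler:
    def __init__(self, llm: LLMCall) -> None:
        self._llm = llm

    def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
        if not evidence:
            # Skip the LLM call entirely when there is nothing to judge.
            return JudgeAssessment(False, "No evidence collected yet.", [question])
        return self._llm(build_user_prompt(question, evidence))


# Mocked LLM for unit tests: records prompts and returns a canned verdict.
seen_prompts: list[str] = []


def fake_llm(prompt: str) -> JudgeAssessment:
    seen_prompts.append(prompt)
    return JudgeAssessment(True, "Stubbed verdict.", [])


judge = JudgeHandler(fake_llm)
question = "Can metformin treat Alzheimer's?"
empty_verdict = judge.assess(question, [])
verdict = judge.assess(question, [Evidence("Metformin and cognition", "https://example.org/1")])
```

Injecting the LLM call is one way to keep the judge testable with mocked responses, as the task table requires.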
Phase 4: The "Orchestrator" & UI Slice (~4-5 hours)
Goal: End-to-End User Value.
| Task | Output |
|---|---|
| Define AgentEvent/State models | Event streaming types |
| Implement Orchestrator | Main while loop connecting Search→Judge |
| Implement report synthesis | Generate markdown report |
| Build Gradio UI | Streaming chat interface |
| Create Dockerfile | Container for deployment |
| Create HuggingFace README | Space configuration |
| Write unit tests | Mocked handlers |
Deliverable: Working DeepCritical Agent on localhost:7860.
Spec Document: 04_phase_ui.md
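The main loop of this phase can be outlined as an async generator that streams events while it alternates between searching and judging, then emits a report. The sketch makes several assumptions: the AgentEvent fields, the event type strings, the `run_agent` name, the `max_iterations` cap, and the plain-string evidence are all hypothetical simplifications.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEvent:
    """Assumed event shape for streaming progress to the UI."""

    type: str  # "searching", "judging", or "complete"
    message: str


# Hypothetical handler signatures; evidence is simplified to strings.
SearchFn = Callable[[str], Awaitable[list[str]]]
JudgeFn = Callable[[str, list[str]], Awaitable[bool]]


async def run_agent(
    query: str,
    search: SearchFn,
    judge: JudgeFn,
    max_iterations: int = 3,
) -> AsyncIterator[AgentEvent]:
    evidence: list[str] = []
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        yield AgentEvent("searching", f"Iteration {iteration}: searching for {query!r}")
        evidence.extend(await search(query))
        yield AgentEvent("judging", f"Judging {len(evidence)} evidence items")
        if await judge(query, evidence):
            break  # the judge considers the evidence sufficient
    # Synthesize a minimal markdown report from whatever was collected.
    report = "\n".join([f"# Report: {query}", *(f"- {item}" for item in evidence)])
    yield AgentEvent("complete", report)


# Mocked handlers, as in the unit tests for this phase.
async def fake_search(query: str) -> list[str]:
    return [f"result for {query}"]


async def fake_judge(query: str, evidence: list[str]) -> bool:
    return len(evidence) >= 2


async def collect() -> list[AgentEvent]:
    return [event async for event in run_agent("long covid", fake_search, fake_judge)]


events = asyncio.run(collect())
```

An async generator suits a streaming chat interface because each yielded event can be rendered as soon as it is produced.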
Spec Documents Summary
| Phase | Document | Focus |
|---|---|---|
| 1 | 01_phase_foundation.md | Tooling, config, TDD setup |
| 2 | 02_phase_search.md | PubMed + DuckDuckGo search |
| 3 | 03_phase_judge.md | LLM evidence assessment |
| 4 | 04_phase_ui.md | Orchestrator + Gradio + Deploy |
Quick Start Commands
# Phase 1: Setup
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init --name deepcritical
uv sync --all-extras
uv run pytest
# Phase 2-4: Development
uv run pytest tests/unit/ -v # Run unit tests
uv run ruff check src tests # Lint
uv run mypy src # Type check
uv run python src/app.py # Run Gradio locally
# Deployment
docker build -t deepcritical .
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... deepcritical
Definition of Done (MVP)
The MVP is COMPLETE when:
- All unit tests pass (uv run pytest)
- Ruff has no errors (uv run ruff check)
- Mypy has no errors (uv run mypy src)
- Gradio UI runs locally (uv run python src/app.py)
- Can ask "Can metformin treat Alzheimer's?" and get a report
- Report includes drug candidates, citations, and quality scores
- Docker builds successfully
- Deployable to HuggingFace Spaces
Progress Tracker
| Phase | Status | Tests | Notes |
|---|---|---|---|
| 1: Foundation | ⬜ Pending | 0/5 | Start here |
| 2: Search | ⬜ Pending | 0/6 | Depends on Phase 1 |
| 3: Judge | ⬜ Pending | 0/5 | Depends on Phase 2 |
| 4: Orchestrator | ⬜ Pending | 0/4 | Depends on Phase 3 |
Update this table as you complete each phase!
Start by reading the Phase 1 spec (01_phase_foundation.md) to initialize the repo.