Implementation Roadmap: DeepCritical (Vertical Slices)

Philosophy: AI-Native Engineering, Vertical Slice Architecture, TDD, Modern Tooling (2025).

This roadmap defines the execution strategy to deliver DeepCritical effectively. We reject "overplanning" in favor of ironclad, testable vertical slices. Each phase delivers a fully functional slice of end-to-end value.

Total Estimated Effort: 12-16 hours (can be done in 4 days)

🛠️ The 2025 "Gucci" Tooling Stack

We are using the bleeding edge of Python engineering to ensure speed, safety, and developer joy.

| Category | Tool | Why? |
|---|---|---|
| Package Manager | `uv` | Rust-based, 10-100x faster than pip/poetry. Manages Python versions, venvs, and deps. |
| Linting/Format | `ruff` | Rust-based, instant. Replaces black, isort, flake8. |
| Type Checking | `mypy` | Strict static typing. Run via `uv run mypy`. |
| Testing | `pytest` | The standard. |
| Test Plugins | `pytest-sugar` | Instant feedback, progress bars. "Gucci" visuals. |
| Test Plugins | `pytest-asyncio` | Essential for our async agent loop. |
| Test Plugins | `pytest-cov` | Coverage reporting to ensure TDD adherence. |
| Test Plugins | `pytest-mock` | Easy mocking with `mocker` fixture. |
| HTTP Mocking | `respx` | Mock `httpx` requests in tests. |
| Git Hooks | `pre-commit` | Enforce ruff/mypy before commit. |
| Retry Logic | `tenacity` | Exponential backoff for API calls. |
| Logging | `structlog` | Structured JSON logging. |
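
To make the "exponential backoff for API calls" row concrete, here is a dependency-free sketch of the behavior we expect from the retry layer. It is not `tenacity` itself: `retry_with_backoff` and `flaky_fetch` are hypothetical names invented for illustration, and in the project a `tenacity` decorator would presumably take their place.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay sequence: base, 2*base, 4*base, ... capped at max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical flaky call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record the delays instead of sleeping so the demo runs instantly.
delays: list[float] = []
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
```

Injecting `sleep` keeps unit tests fast, which fits the "fast, mocked tests" goal for `tests/unit/`.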

🏗️ Architecture: Vertical Slices

Instead of horizontal layers (e.g., "Building the Database Layer"), we build Vertical Slices. Each slice implements a feature from Entry Point (UI/API) → Logic → Data/External.

Directory Structure (Maintainer's Template + Our Code)

We use the existing scaffolding from the maintainer, filling in the empty files.

deepcritical/
├── pyproject.toml          # All config in one file
├── .env.example            # Environment template
├── .pre-commit-config.yaml # Git hooks
├── Dockerfile              # Container build
├── README.md               # HuggingFace Space config
│
├── src/
│   ├── app.py              # Gradio entry point
│   ├── orchestrator.py     # Main agent loop (Search→Judge→Synthesize)
│   │
│   ├── agent_factory/      # Agent definitions
│   │   ├── __init__.py
│   │   ├── agents.py       # (Reserved for future agents)
│   │   └── judges.py       # JudgeHandler - LLM evidence assessment
│   │
│   ├── tools/              # Search tools
│   │   ├── __init__.py
│   │   ├── pubmed.py       # PubMedTool - NCBI E-utilities
│   │   ├── websearch.py    # WebTool - DuckDuckGo
│   │   └── search_handler.py # SearchHandler - orchestrates tools
│   │
│   ├── prompts/            # Prompt templates
│   │   ├── __init__.py
│   │   └── judge.py        # Judge system/user prompts
│   │
│   ├── utils/              # Shared utilities
│   │   ├── __init__.py
│   │   ├── config.py       # Settings via pydantic-settings
│   │   ├── exceptions.py   # Custom exceptions
│   │   └── models.py       # ALL Pydantic models (Evidence, JudgeAssessment, etc.)
│   │
│   ├── middleware/         # (Empty - reserved)
│   ├── database_services/  # (Empty - reserved)
│   └── retrieval_factory/  # (Empty - reserved)
│
└── tests/
    ├── __init__.py
    ├── conftest.py         # Shared fixtures
    │
    ├── unit/               # Fast, mocked tests
    │   ├── __init__.py
    │   ├── utils/          # Config, models tests
    │   ├── tools/          # PubMed, WebSearch tests
    │   └── agent_factory/  # Judge tests
    │
    └── integration/        # Real API tests (optional)
        └── __init__.py

🚀 Phased Execution Plan

Phase 1: Foundation & Tooling (~2-3 hours)

Goal: A rock-solid, CI-ready environment with uv and pytest configured.

| Task | Output |
|---|---|
| Install uv | `uv --version` works |
| Create pyproject.toml | All deps + config in one file |
| Set up directory structure | All `__init__.py` files created |
| Configure ruff + mypy | Strict settings |
| Create conftest.py | Shared pytest fixtures |
| Implement src/utils/config.py | Settings via pydantic-settings |
| Write first test | `test_config.py` passes |

Deliverable: `uv run pytest` passes with green output.
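
As a rough picture of the Phase 1 slice, the sketch below pairs a settings object with the kind of first test that `test_config.py` might contain. It uses a standard-library dataclass so it runs without dependencies; the roadmap calls for pydantic-settings, so treat this as a stand-in. Only `OPENAI_API_KEY` appears in this document; `MAX_ITERATIONS`, `LOG_LEVEL`, and `from_env` are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Stand-in for a pydantic-settings model (field names are assumptions)."""

    openai_api_key: Optional[str] = None
    max_iterations: int = 5  # hypothetical knob, not named in the roadmap
    log_level: str = "INFO"  # hypothetical knob, not named in the roadmap

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept an explicit mapping so tests never touch the real environment.
        env = os.environ if env is None else env
        return cls(
            openai_api_key=env.get("OPENAI_API_KEY"),
            max_iterations=int(env.get("MAX_ITERATIONS", "5")),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
        )


def test_config_defaults() -> None:
    settings = Settings.from_env({})
    assert settings.openai_api_key is None
    assert settings.max_iterations == 5
    assert settings.log_level == "INFO"


def test_config_reads_env() -> None:
    settings = Settings.from_env({"OPENAI_API_KEY": "test-key", "MAX_ITERATIONS": "3"})
    assert settings.openai_api_key == "test-key"
    assert settings.max_iterations == 3
```

Writing the two tests before the class body is the TDD order this roadmap assumes: the failing test defines the contract, then the settings code makes it pass.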

📄 Spec Document: 01_phase_foundation.md

Phase 2: The "Search" Vertical Slice (~3-4 hours)

Goal: Agent can receive a query and get raw results from PubMed/Web.

| Task | Output |
|---|---|
| Define Evidence/Citation models | Pydantic models |
| Implement PubMedTool | ESearch → EFetch → Evidence |
| Implement WebTool | DuckDuckGo → Evidence |
| Implement SearchHandler | Parallel search orchestration |
| Write unit tests | Mocked HTTP responses |
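
The ESearch → EFetch → Evidence row can be sketched as pure functions that build the NCBI E-utilities URLs and parse the responses, leaving the HTTP call out so the logic is trivially testable. The function names and the sample payloads (including the PMID `111`) are hypothetical; the real `PubMedTool` would presumably issue these requests with `httpx` and map the results onto the project's `Evidence` model.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(query: str, max_results: int = 10) -> str:
    """Step 1: ESearch turns a free-text query into a list of PMIDs."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(payload: str) -> list[str]:
    """Extract the PMID list from an ESearch JSON response."""
    data = json.loads(payload)
    return list(data.get("esearchresult", {}).get("idlist", []))


def build_efetch_url(pmids: list[str]) -> str:
    """Step 2: EFetch returns the full records for those PMIDs as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def _text(node: ET.Element, path: str) -> str:
    found = node.find(path)
    # itertext() keeps text that sits inside inline markup such as <i>.
    return "".join(found.itertext()).strip() if found is not None else ""


def parse_efetch_records(xml_payload: str) -> list[dict[str, str]]:
    """Step 3: reduce each PubmedArticle to the fields an Evidence item needs."""
    root = ET.fromstring(xml_payload)
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = _text(article, ".//PMID")
        records.append(
            {
                "pmid": pmid,
                "title": _text(article, ".//ArticleTitle"),
                "abstract": _text(article, ".//Abstract/AbstractText"),
                "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
            }
        )
    return records


# Hypothetical, heavily trimmed responses of the kind a mocked test would return.
SAMPLE_ESEARCH = '{"esearchresult": {"count": "1", "idlist": ["111"]}}'
SAMPLE_EFETCH = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID>111</PMID><Article><ArticleTitle>Example title</ArticleTitle>
<Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""

pmids = parse_esearch_ids(SAMPLE_ESEARCH)
records = parse_efetch_records(SAMPLE_EFETCH)
```

Keeping URL building and parsing separate from transport means the unit tests only need canned strings, while `respx` can cover the thin HTTP layer.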

Deliverable: Function that takes "long covid" → returns List[Evidence].
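
One possible shape for that deliverable is sketched below: a handler that fans a query out to every tool concurrently and merges the results. Dataclasses stand in for the Pydantic models, and the field names, the `SearchTool` protocol, the `execute` method, and `StubTool` are all assumptions for illustration rather than the project's actual API.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Citation:
    source: str  # e.g. "pubmed" or "web"
    title: str
    url: str


@dataclass(frozen=True)
class Evidence:
    content: str
    citation: Citation


class SearchTool(Protocol):
    name: str

    async def search(self, query: str, max_results: int) -> list[Evidence]: ...


class StubTool:
    """Hypothetical tool that returns canned evidence, or fails on demand."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail

    async def search(self, query: str, max_results: int) -> list[Evidence]:
        await asyncio.sleep(0)  # Yield control, as a real network call would.
        if self.fail:
            raise RuntimeError(f"{self.name} is unavailable")
        citation = Citation(self.name, f"{query} ({self.name})", f"https://example.org/{self.name}")
        return [Evidence(content=f"Result for {query!r}", citation=citation)]


class SearchHandler:
    def __init__(self, tools: Sequence[SearchTool]) -> None:
        self.tools = list(tools)

    async def execute(self, query: str, max_results: int = 10) -> list[Evidence]:
        # return_exceptions=True lets one failing tool degrade the result
        # instead of cancelling the whole search.
        results = await asyncio.gather(
            *(tool.search(query, max_results) for tool in self.tools),
            return_exceptions=True,
        )
        evidence: list[Evidence] = []
        for result in results:
            if isinstance(result, BaseException):
                continue  # A real handler would log the failure here.
            evidence.extend(result)
        return evidence


handler = SearchHandler([StubTool("pubmed"), StubTool("web"), StubTool("broken", fail=True)])
evidence = asyncio.run(handler.execute("long covid"))
```

Because `asyncio.gather` preserves input order, the merged list stays deterministic, which makes assertions in mocked unit tests straightforward.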

📄 Spec Document: 02_phase_search.md

Phase 3: The "Judge" Vertical Slice (~3-4 hours)

Goal: Agent can decide if evidence is sufficient.

| Task | Output |
|---|---|
| Define JudgeAssessment model | Structured output schema |
| Write prompt templates | System + user prompts |
| Implement JudgeHandler | PydanticAI agent with structured output |
| Write unit tests | Mocked LLM responses |

Deliverable: Function that takes List[Evidence] → returns JudgeAssessment.
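
To illustrate the contract rather than the PydanticAI wiring, the sketch below treats the LLM as a plain callable that returns JSON and validates it into a structured assessment. The `JudgeAssessment` fields (`sufficient`, `confidence`, `reasoning`, `next_queries`), the prompt wording, and the fallback behavior are assumptions; evidence is reduced to strings to keep the example self-contained.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class JudgeAssessment:
    sufficient: bool
    confidence: float  # assumed to be a 0.0-1.0 score
    reasoning: str
    next_queries: list[str] = field(default_factory=list)


def parse_assessment(raw: str) -> JudgeAssessment:
    """Validate raw LLM output; raises on malformed or out-of-range data."""
    data = json.loads(raw)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return JudgeAssessment(
        sufficient=data["sufficient"] is True,
        confidence=confidence,
        reasoning=str(data["reasoning"]),
        next_queries=[str(q) for q in data.get("next_queries", [])],
    )


class JudgeHandler:
    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm  # Injected so unit tests can pass a mocked LLM.

    def assess(self, question: str, evidence: list[str]) -> JudgeAssessment:
        bullet_list = "\n".join(f"- {item}" for item in evidence)
        prompt = f"Question: {question}\nEvidence:\n{bullet_list}\nReply in JSON."
        try:
            return parse_assessment(self._llm(prompt))
        except (ValueError, KeyError, TypeError):
            # Fail closed: unparseable output means "keep searching".
            return JudgeAssessment(False, 0.0, "Judge output could not be parsed.")


def fake_llm(prompt: str) -> str:
    """Mocked LLM response of the kind a unit test would supply."""
    return json.dumps(
        {
            "sufficient": False,
            "confidence": 0.4,
            "reasoning": "Only one source so far.",
            "next_queries": ["long covid treatment trial"],
        }
    )


assessment = JudgeHandler(fake_llm).assess("What treats long covid?", ["One abstract"])
fallback = JudgeHandler(lambda prompt: "not json").assess("q", [])
```

Failing closed on bad output is a design choice: the orchestrator loop can then simply continue searching instead of crashing on a malformed reply.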

📄 Spec Document: 03_phase_judge.md

Phase 4: The "Orchestrator" & UI Slice (~4-5 hours)

Goal: End-to-End User Value.

| Task | Output |
|---|---|
| Define AgentEvent/State models | Event streaming types |
| Implement Orchestrator | Main while loop connecting Search→Judge |
| Implement report synthesis | Generate markdown report |
| Build Gradio UI | Streaming chat interface |
| Create Dockerfile | Container for deployment |
| Create HuggingFace README | Space configuration |
| Write unit tests | Mocked handlers |

Deliverable: Working DeepCritical Agent on localhost:7860.
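
The Search→Judge→Synthesize loop behind that deliverable might look roughly like the sketch below: an async generator that yields an event at each step, which is the shape a streaming chat UI can consume. The `AgentEvent` fields, the event type strings, the `max_iterations` cap, and the stubbed `fake_search`/`fake_judge` handlers are assumptions for illustration, not the project's actual interfaces.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable


@dataclass(frozen=True)
class AgentEvent:
    type: str  # assumed values: "searching", "judging", "complete"
    message: str
    iteration: int = 0


SearchFn = Callable[[str], Awaitable[list[str]]]
JudgeFn = Callable[[str, list[str]], Awaitable[tuple[bool, list[str]]]]


def synthesize_report(question: str, evidence: list[str], sufficient: bool) -> str:
    """Render a minimal markdown report from the collected evidence."""
    status = "sufficient" if sufficient else "incomplete"
    lines = [f"## Report: {question}", f"Evidence status: {status}", ""]
    lines += [f"{i}. {item}" for i, item in enumerate(evidence, start=1)]
    return "\n".join(lines)


class Orchestrator:
    def __init__(self, search: SearchFn, judge: JudgeFn, max_iterations: int = 3) -> None:
        self.search = search
        self.judge = judge
        self.max_iterations = max_iterations

    async def run(self, question: str) -> AsyncIterator[AgentEvent]:
        evidence: list[str] = []
        queries = [question]
        iteration = 0
        sufficient = False
        # Stop when the judge is satisfied or the iteration budget runs out.
        while iteration < self.max_iterations and not sufficient:
            iteration += 1
            yield AgentEvent("searching", f"Searching: {', '.join(queries)}", iteration)
            for query in queries:
                evidence.extend(await self.search(query))
            yield AgentEvent("judging", f"Assessing {len(evidence)} evidence items", iteration)
            sufficient, queries = await self.judge(question, evidence)
            queries = queries or [question]
        yield AgentEvent("complete", synthesize_report(question, evidence, sufficient), iteration)


async def fake_search(query: str) -> list[str]:
    return [f"finding for {query}"]


async def fake_judge(question: str, evidence: list[str]) -> tuple[bool, list[str]]:
    # Pretend two pieces of evidence are enough; otherwise suggest a follow-up.
    return len(evidence) >= 2, ["metformin alzheimer trial"]


async def _collect() -> list[AgentEvent]:
    orchestrator = Orchestrator(fake_search, fake_judge)
    return [event async for event in orchestrator.run("Can metformin treat Alzheimer's?")]


events = asyncio.run(_collect())
```

With mocked handlers like these, a unit test can assert on the exact event sequence without touching the network or an LLM.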

📄 Spec Document: 04_phase_ui.md

📜 Spec Documents Summary

| Phase | Document | Focus |
|---|---|---|
| 1 | 01_phase_foundation.md | Tooling, config, TDD setup |
| 2 | 02_phase_search.md | PubMed + DuckDuckGo search |
| 3 | 03_phase_judge.md | LLM evidence assessment |
| 4 | 04_phase_ui.md | Orchestrator + Gradio + Deploy |

⚡ Quick Start Commands

# Phase 1: Setup
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init --name deepcritical
uv sync --all-extras
uv run pytest

# Phase 2-4: Development
uv run pytest tests/unit/ -v          # Run unit tests
uv run ruff check src tests           # Lint
uv run mypy src                       # Type check
uv run python src/app.py              # Run Gradio locally

# Deployment
docker build -t deepcritical .
docker run -p 7860:7860 -e OPENAI_API_KEY=sk-... deepcritical

🎯 Definition of Done (MVP)

The MVP is COMPLETE when:

✅ All unit tests pass (`uv run pytest`)
✅ Ruff has no errors (`uv run ruff check`)
✅ Mypy has no errors (`uv run mypy src`)
✅ Gradio UI runs locally (`uv run python src/app.py`)
✅ Can ask "Can metformin treat Alzheimer's?" and get a report
✅ Report includes drug candidates, citations, and quality scores
✅ Docker builds successfully
✅ Deployable to HuggingFace Spaces

📊 Progress Tracker

| Phase | Status | Tests | Notes |
|---|---|---|---|
| 1: Foundation | ⬜ Pending | 0/5 | Start here |
| 2: Search | ⬜ Pending | 0/6 | Depends on Phase 1 |
| 3: Judge | ⬜ Pending | 0/5 | Depends on Phase 2 |
| 4: Orchestrator | ⬜ Pending | 0/4 | Depends on Phase 3 |

Update this table as you complete each phase!

Start by reading the Phase 1 spec (01_phase_foundation.md) to initialize the repo.