Spaces:
Sleeping
Sleeping
| # MediGuard AI β Production Upgrade Plan | |
| ## From Prototype to Production-Grade MedTech RAG System | |
| > **Generated**: 2026-02-23 | |
| > **Based on**: Deep review of production-agentic-rag-course (Weeks 1β7) + existing RagBot codebase | |
| > **Goal**: Take the existing MediGuard AI (clinical biomarker analysis + RAG explanation system) to full production quality, applying every lesson from the arXiv Paper Curator course β adapted for the MedTech domain. | |
| --- | |
| ## Table of Contents | |
| 1. [Executive Summary](#1-executive-summary) | |
| 2. [Deep Review: Course vs. Your Codebase](#2-deep-review-course-vs-your-codebase) | |
| 3. [Architecture Gap Analysis](#3-architecture-gap-analysis) | |
| 4. [Phase 1: Infrastructure Foundation](#phase-1-infrastructure-foundation-week-1-equivalent) | |
| 5. [Phase 2: Medical Data Ingestion Pipeline](#phase-2-medical-data-ingestion-pipeline-week-2-equivalent) | |
| 6. [Phase 3: Production Search Foundation](#phase-3-production-search-foundation-week-3-equivalent) | |
| 7. [Phase 4: Hybrid Search & Intelligent Chunking](#phase-4-hybrid-search--intelligent-chunking-week-4-equivalent) | |
| 8. [Phase 5: Complete RAG Pipeline with Streaming](#phase-5-complete-rag-pipeline-with-streaming-week-5-equivalent) | |
| 9. [Phase 6: Monitoring, Caching & Observability](#phase-6-monitoring-caching--observability-week-6-equivalent) | |
| 10. [Phase 7: Agentic RAG & Messaging Bot](#phase-7-agentic-rag--messaging-bot-week-7-equivalent) | |
| 11. [Phase 8: MedTech-Specific Additions](#phase-8-medtech-specific-additions-beyond-course) | |
| 12. [Implementation Priority Matrix](#implementation-priority-matrix) | |
| 13. [Migration Strategy](#migration-strategy) | |
| --- | |
| ## 1. Executive Summary | |
| Your RagBot is a **working prototype** with strong domain logic (biomarker validation, multi-agent clinical analysis, 5D evaluation, SOP evolution). The course teaches **production infrastructure** (Docker orchestration, OpenSearch hybrid search, Airflow pipelines, Redis caching, Langfuse observability, LangGraph agentic workflows, Telegram bot). | |
| **The strategy**: Keep your excellent medical domain logic and multi-agent architecture, but rebuild the infrastructure layer to match production standards. Your domain is *harder* than arXiv papers β medical data demands stricter validation, HIPAA-aware patterns, and safety guardrails. | |
| ### What You Have (Strengths) | |
| - β 6 specialized medical agents (Biomarker Analyzer, Disease Explainer, Biomarker-Disease Linker, Clinical Guidelines, Confidence Assessor, Response Synthesizer) | |
| - β LangGraph orchestration with parallel execution | |
| - β Robust biomarker validation with 24 biomarkers, reference ranges, critical values | |
| - β 5D evaluation framework (Clinical Accuracy, Evidence Grounding, Actionability, Clarity, Safety) | |
| - β SOP evolution engine (Outer Loop optimization) | |
| - β Multi-provider LLM support (Groq, Gemini, Ollama) | |
| - β Basic FastAPI with analysis endpoints | |
| - β CLI chatbot with natural language biomarker extraction | |
| ### What You're Missing (Gaps) | |
| - β No Docker Compose orchestration (only minimal single-service Dockerfile) | |
| - β No production database (PostgreSQL) β no patient/report persistence | |
| - β No production search engine β using FAISS (in-memory, single-file, no filtering) | |
| - β No chunking strategy β basic RecursiveCharacterTextSplitter only | |
| - β No hybrid search (BM25 + vector) β vector-only retrieval | |
| - β No production embeddings β using local HuggingFace MiniLM (384d) or Google free tier | |
| - β No data ingestion pipeline (Airflow) β manual PDF loading | |
| - β No caching layer (Redis) β every query hits LLM | |
| - β No observability (Langfuse) β no tracing, no cost tracking | |
| - β No streaming responses β synchronous only | |
| - β No Gradio interface β CLI only (besides basic API) | |
| - β No messaging bot (Telegram/WhatsApp) β no mobile access | |
| - β No agentic RAG with guardrails, document grading, query rewriting | |
| - β No proper dependency injection pattern (FastAPI `Depends()`) | |
| - β No Pydantic Settings with env-nested config | |
| - β No factory pattern for service initialization | |
| - β No proper exception hierarchy | |
| - β No health checks for all services | |
| - β No Makefile / dev tooling (ruff, mypy, pre-commit) | |
| - β No proper test infrastructure (pytest fixtures, test containers) | |
| --- | |
| ## 2. Deep Review: Course vs. Your Codebase | |
| ### Course Architecture (What Production Looks Like) | |
| ``` | |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β Docker Compose Orchestration β | |
| ββββββββββββ¬βββββββββββ¬βββββββββββ¬βββββββββββ¬ββββββββββββββββββ€ | |
| β FastAPI βPostgreSQLβOpenSearchβ Ollama β Airflow β | |
| β (8000) β (5432) β (9200) β (11434) β (8080) β | |
| ββββββββββββΌβββββββββββΌβββββββββββΌβββββββββββΌββββββββββββββββββ€ | |
| β Redis β Langfuse βClickHouseβ MinIO β Langfuse-PG β | |
| β (6379) β (3001) β β β (5433) β | |
| ββββββββββββ΄βββββββββββ΄βββββββββββ΄βββββββββββ΄ββββββββββββββββββ€ | |
| β Gradio UI (7861) β Telegram Bot β | |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| ``` | |
| **Key Patterns from Course:** | |
| - **Pydantic Settings** with `env_nested_delimiter="__"` for hierarchical config | |
| - **Factory pattern** (`make_*` functions) for every service | |
| - **Dependency injection** via FastAPI `Depends()` with typed annotations | |
| - **Lifespan context** for startup/shutdown with proper resource management | |
| - **Service layer separation**: `routers/` β `services/` β `clients/` | |
| - **Schema-driven**: Separate Pydantic schemas for API, database, embeddings, indexing | |
| - **Exception hierarchy**: Domain-specific exceptions (`PDFParsingException`, `OllamaException`, etc.) | |
| - **Context dataclass** for LangGraph runtime dependency injection | |
| - **Structured LLM output** via `.with_structured_output(PydanticModel)` | |
| ### Your Codebase Architecture (Current State) | |
| ``` | |
| βββββββββββββββββββββββββββββββββββββββββββββββ | |
| β Basic FastAPI (api/app/) β | |
| β Single Dockerfile, no orchestration β | |
| βββββββββββββββββββββββββββββββββββββββββββββββ€ | |
| β src/ (Core Domain Logic) β | |
| β βββββββββββββββββββββββββββββββββββββββ β | |
| β β workflow.py (LangGraph StateGraph) β β | |
| β β 6 agents/ (parallel execution) β β | |
| β β biomarker_validator.py (24 markers) β β | |
| β β pdf_processor.py (FAISS + PyPDF) β β | |
| β β evaluation/ (5D framework) β β | |
| β β evolution/ (SOP optimization) β β | |
| β βββββββββββββββββββββββββββββββββββββββ β | |
| βββββββββββββββββββββββββββββββββββββββββββββββ€ | |
| β FAISS vector store (single file) β | |
| β No PostgreSQL, No Redis, No OpenSearch β | |
| βββββββββββββββββββββββββββββββββββββββββββββββ | |
| ``` | |
| --- | |
| ## 3. Architecture Gap Analysis | |
| | Dimension | Course (Production) | Your Codebase (Prototype) | Gap Severity | | |
| |-----------|-------------------|--------------------------|--------------| | |
| | **Container Orchestration** | Docker Compose with 12+ services, health checks, networks | Single Dockerfile, manual startup | π΄ Critical | | |
| | **Database** | PostgreSQL 16 with SQLAlchemy models, repositories | None (in-memory only) | π΄ Critical | | |
| | **Search Engine** | OpenSearch 2.19 with BM25 + KNN hybrid, RRF fusion | FAISS (vector-only, no filtering) | π΄ Critical | | |
| | **Chunking** | Section-aware chunking (600w, 100w overlap, metadata) | Basic RecursiveCharacterTextSplitter (1000 char) | π‘ Major | | |
| | **Embeddings** | Jina AI v3 (1024d, passage/query differentiation) | HuggingFace MiniLM (384d) or Google free tier | π‘ Major | | |
| | **Data Pipeline** | Airflow DAGs (daily schedule, fetchβparseβchunkβindex) | Manual PDF loading, one-time setup | π‘ Major | | |
| | **Caching** | Redis with TTL, exact-match, SHA256 keys | None | π‘ Major | | |
| | **Observability** | Langfuse v3 (traces, spans, generations, cost tracking) | None (print statements only) | π‘ Major | | |
| | **Streaming** | SSE streaming with Gradio UI | None (synchronous responses) | π‘ Major | | |
| | **Agentic RAG** | LangGraph with guardrails, grading, rewriting, context_schema | Basic LangGraph (no guardrails, no grading) | π‘ Major | | |
| | **Bot Integration** | Telegram bot with /search, Q&A, caching | None | π’ Enhancement | | |
| | **Config Management** | Pydantic Settings, hierarchical env vars, frozen models | Basic os.getenv, dotenv | π‘ Major | | |
| | **Dependency Injection** | FastAPI Depends() with typed annotations | Manual global singletons | π‘ Major | | |
| | **Error Handling** | Domain exception hierarchy, graceful fallbacks | Basic try/except with prints | π‘ Major | | |
| | **Code Quality** | Ruff, MyPy, pre-commit, pytest with fixtures | Minimal pytest, no linting | π’ Enhancement | | |
| | **API Design** | Versioned (/api/v1/), health checks for all services | Basic routes, minimal health check | π‘ Major | | |
| --- | |
| ## Phase 1: Infrastructure Foundation (Week 1 Equivalent) | |
| > **Goal**: Containerize everything, add PostgreSQL for persistence, set up OpenSearch, establish professional development environment. | |
| ### 1.1 Docker Compose Orchestration | |
| Create a production `docker-compose.yml` with all services: | |
| ```yaml | |
| # Target services for MediGuard AI: | |
| services: | |
| api: # FastAPI application (port 8000) | |
| postgres: # Patient reports, analysis history (port 5432) | |
| opensearch: # Medical document search engine (port 9200) | |
| opensearch-dashboards: # Search UI (port 5601) | |
| redis: # Response caching (port 6379) | |
| ollama: # Local LLM for privacy-sensitive medical data (port 11434) | |
| airflow: # Medical literature pipeline (port 8080) | |
| langfuse-web: # Observability dashboard (port 3001) | |
| langfuse-worker/postgres/redis/clickhouse/minio: # Langfuse infra | |
| ``` | |
| **Tasks:** | |
| - [ ] Create root `docker-compose.yml` adapting course pattern to MedTech services | |
| - [ ] Create multi-stage `Dockerfile` using UV package manager (copy course pattern) | |
| - [ ] Add health checks for every service (PostgreSQL, OpenSearch, Redis, Ollama) | |
| - [ ] Set up Docker network `mediguard-network` with proper service dependencies | |
| - [ ] Configure volume persistence for all data stores | |
| - [ ] Create `.env.example` with all configuration variables documented | |
| ### 1.2 Pydantic Settings Configuration | |
| Replace scattered `os.getenv()` calls with hierarchical Pydantic Settings: | |
| ```python | |
| # New: src/config.py (course-inspired) | |
| class MedicalPDFSettings(BaseConfigSettings): # PDF parser config | |
| class ChunkingSettings(BaseConfigSettings): # Chunking parameters | |
| class OpenSearchSettings(BaseConfigSettings): # Search engine config | |
| class LangfuseSettings(BaseConfigSettings): # Observability config | |
| class RedisSettings(BaseConfigSettings): # Cache config | |
| class TelegramSettings(BaseConfigSettings): # Bot config | |
| class BiomarkerSettings(BaseConfigSettings): # Biomarker thresholds | |
| class Settings(BaseConfigSettings): # Root settings | |
| ``` | |
| **Tasks:** | |
| - [ ] Rewrite `src/config.py` β keep `ExplanationSOP` but add infrastructure settings classes | |
| - [ ] Use `env_nested_delimiter="__"` for hierarchical environment variables | |
| - [ ] Add `frozen=True` for immutable configuration | |
| - [ ] Move all hardcoded values to environment variables with sensible defaults | |
| - [ ] Create `get_settings()` factory with `@lru_cache` | |
| ### 1.3 PostgreSQL Database Setup | |
| Add persistent storage for analysis history β critical for medical audit trail: | |
| ```python | |
| # New models: | |
| class PatientAnalysis(Base): # Store each analysis run | |
| class AnalysisReport(Base): # Store final reports | |
| class MedicalDocument(Base): # Track ingested medical PDFs | |
| class BiomarkerReference(Base): # Biomarker reference ranges (currently JSON file) | |
| ``` | |
| **Tasks:** | |
| - [ ] Create `src/db/` package mirroring course pattern (factory, interfaces, postgresql) | |
| - [ ] Define SQLAlchemy models for analysis history and medical documents | |
| - [ ] Create repository pattern for data access | |
| - [ ] Set up Alembic for database migrations | |
| - [ ] Migrate `biomarker_references.json` to database (keep JSON as seed data) | |
| ### 1.4 Project Structure Refactor | |
| Reorganize to match production patterns: | |
| ``` | |
| src/ | |
| βββ config.py # Pydantic Settings (hierarchical) | |
| βββ main.py # FastAPI app with lifespan | |
| βββ database.py # Database utilities | |
| βββ dependencies.py # FastAPI dependency injection | |
| βββ exceptions.py # Domain exception hierarchy | |
| βββ middlewares.py # Request logging, timing | |
| βββ db/ # Database layer | |
| β βββ factory.py | |
| β βββ interfaces/ | |
| βββ models/ # SQLAlchemy models | |
| β βββ analysis.py | |
| β βββ document.py | |
| βββ repositories/ # Data access | |
| β βββ analysis.py | |
| β βββ document.py | |
| βββ routers/ # API endpoints | |
| β βββ analyze.py # Biomarker analysis | |
| β βββ ask.py # RAG Q&A (streaming + standard) | |
| β βββ health.py # Comprehensive health checks | |
| β βββ search.py # Medical document search | |
| βββ schemas/ # Pydantic request/response models | |
| β βββ api/ | |
| β βββ medical/ | |
| β βββ embeddings/ | |
| βββ services/ # Business logic | |
| β βββ agents/ # Your 6 medical agents (KEEP!) | |
| β β βββ biomarker_analyzer.py | |
| β β βββ disease_explainer.py | |
| β β βββ biomarker_linker.py | |
| β β βββ clinical_guidelines.py | |
| β β βββ confidence_assessor.py | |
| β β βββ response_synthesizer.py | |
| β β βββ agentic_rag.py # NEW: LangGraph agentic wrapper | |
| β β βββ nodes/ # NEW: Guardrail, grading, rewriting | |
| β β βββ state.py # Enhanced state | |
| β β βββ context.py # Runtime dependency injection | |
| β β βββ prompts.py # Medical-domain prompts | |
| β βββ opensearch/ # NEW: Search engine client | |
| β βββ embeddings/ # NEW: Production embeddings | |
| β βββ cache/ # NEW: Redis caching | |
| β βββ langfuse/ # NEW: Observability | |
| β βββ ollama/ # NEW: Local LLM client | |
| β βββ indexing/ # NEW: Chunking + indexing | |
| β βββ pdf_parser/ # Enhanced: Use Docling | |
| β βββ telegram/ # NEW: Bot integration | |
| β βββ biomarker/ # Extracted: validation + normalization | |
| βββ evaluation/ # KEEP: 5D evaluation | |
| βββ evolution/ # KEEP: SOP evolution | |
| ``` | |
| **Tasks:** | |
| - [ ] Create the new directory structure | |
| - [ ] Move API from `api/app/` into `src/` (single application) | |
| - [ ] Create `exceptions.py` with medical-domain exception hierarchy | |
| - [ ] Create `dependencies.py` with typed FastAPI dependency injection | |
| - [ ] Create `main.py` with proper lifespan context manager | |
| ### 1.5 Development Tooling | |
| **Tasks:** | |
| - [ ] Create `pyproject.toml` replacing `requirements.txt` (use UV) | |
| - [ ] Create `Makefile` with start/stop/test/lint/format/health commands | |
| - [ ] Add `ruff` for linting and formatting | |
| - [ ] Add `mypy` for type checking | |
| - [ ] Add `.pre-commit-config.yaml` | |
| - [ ] Create `.env.example` and `.env.test` | |
| --- | |
| ## Phase 2: Medical Data Ingestion Pipeline (Week 2 Equivalent) | |
| > **Goal**: Automated ingestion of medical PDFs, clinical guidelines, and reference documents with Airflow orchestration. | |
| ### 2.1 Medical PDF Parser Upgrade | |
| Replace basic PyPDF with Docling for better medical document handling: | |
| **Tasks:** | |
| - [ ] Create `src/services/pdf_parser/` with Docling integration (copy course pattern) | |
| - [ ] Add medical-specific section detection (Abstract, Methods, Results, Discussion, Clinical Guidelines) | |
| - [ ] Add table extraction for lab reference ranges | |
| - [ ] Add validation: file size limits, page limits, PDF header check | |
| - [ ] Add metadata extraction: title, authors, publication date, journal | |
| ### 2.2 Medical Document Sources | |
| Unlike arXiv (single API), medical literature comes from multiple sources: | |
| **Tasks:** | |
| - [ ] Create `src/services/medical_sources/` package | |
| - [ ] Implement PubMed API client (free, rate-limited) for research papers | |
| - [ ] Implement local PDF upload endpoint for clinical guidelines | |
| - [ ] Implement reference document ingestion (WHO, CDC, ADA guidelines) | |
| - [ ] Create document deduplication logic (by title hash + content fingerprint) | |
| - [ ] Add `MedicalDocument` model tracking: source, parse status, indexing status | |
| ### 2.3 Airflow Pipeline for Medical Literature | |
| **Tasks:** | |
| - [ ] Create `airflow/` directory with Dockerfile and entrypoint | |
| - [ ] Create `airflow/dags/medical_ingestion.py` DAG: | |
| - `setup_environment` β `fetch_new_documents` β `parse_pdfs` β `chunk_and_index` β `generate_report` | |
| - [ ] Schedule: Daily at 6 AM for PubMed updates, on-demand for uploaded PDFs | |
| - [ ] Add retry logic with exponential backoff | |
| - [ ] Mount `src/` into Airflow container for shared code | |
| ### 2.4 PostgreSQL Storage for Documents | |
| **Tasks:** | |
| - [ ] Create `MedicalDocument` model: id, title, source, source_type, authors, abstract, raw_text, sections, parse_status, indexed_at | |
| - [ ] Create `PaperRepository` with CRUD + upsert + status tracking | |
| - [ ] Track processing pipeline: `uploaded β parsed β chunked β indexed` | |
| - [ ] Store parsed sections as JSON for re-indexing without re-parsing | |
| --- | |
| ## Phase 3: Production Search Foundation (Week 3 Equivalent) | |
| > **Goal**: Replace FAISS with OpenSearch for production BM25 keyword search with medical-specific optimizations. | |
| ### 3.1 OpenSearch Client | |
| **Tasks:** | |
| - [ ] Create `src/services/opensearch/` package (adapt course pattern) | |
| - [ ] Implement `OpenSearchClient` with: | |
| - Health check, index management, BM25 search, bulk indexing | |
| - **Medical-specific**: Boost clinical term matches, support ICD-10 code filtering | |
| - [ ] Create `QueryBuilder` with medical field boosting: | |
| ``` | |
| fields: ["chunk_text^3", "title^2", "section_title^1.5", "abstract^1"] | |
| ``` | |
| - [ ] Create `index_config_hybrid.py` with medical document mapping: | |
| - Fields: chunk_text, title, authors, abstract, document_type (guideline/research/reference), condition_tags, publication_year | |
| ### 3.2 Medical Document Index Mapping | |
| ```python | |
| MEDICAL_CHUNKS_MAPPING = { | |
| "settings": { | |
| "index.knn": True, | |
| "analysis": { | |
| "analyzer": { | |
| "medical_analyzer": { | |
| "type": "custom", | |
| "tokenizer": "standard", | |
| "filter": ["lowercase", "medical_synonyms", "stop", "snowball"] | |
| } | |
| } | |
| } | |
| }, | |
| "mappings": { | |
| "properties": { | |
| "chunk_text": {"type": "text", "analyzer": "medical_analyzer"}, | |
| "document_type": {"type": "keyword"}, # guideline, research, reference | |
| "condition_tags": {"type": "keyword"}, # diabetes, anemia, etc. | |
| "biomarkers_mentioned": {"type": "keyword"}, # Glucose, HbA1c, etc. | |
| "embedding": {"type": "knn_vector", "dimension": 1024}, | |
| # ... more fields | |
| } | |
| } | |
| } | |
| ``` | |
| **Tasks:** | |
| - [ ] Design medical-optimized OpenSearch mapping | |
| - [ ] Add medical synonym analyzer (e.g., "diabetes mellitus" β "DM", "HbA1c" β "glycated hemoglobin") | |
| - [ ] Create search endpoint `POST /api/v1/search` with filtering by document_type, condition_tags | |
| - [ ] Implement BM25 search with medical field boosting | |
| - [ ] Create index verification in startup lifespan | |
| --- | |
| ## Phase 4: Hybrid Search & Intelligent Chunking (Week 4 Equivalent) | |
| > **Goal**: Section-aware chunking for medical documents + hybrid search (BM25 + semantic) with RRF fusion. | |
| ### 4.1 Medical-Aware Text Chunking | |
| **Tasks:** | |
| - [ ] Create `src/services/indexing/text_chunker.py` adapting course's `TextChunker`: | |
| - Section-aware chunking (detect: Introduction, Methods, Results, Discussion, Guidelines, References) | |
| - Target: 600 words per chunk, 100 word overlap | |
| - Medical metadata: section_title, biomarkers_mentioned, condition_tags | |
| - [ ] Create `MedicalTextChunker` subclass with: | |
| - Biomarker mention detection (scan for any of 24+ biomarker names) | |
| - Condition tag extraction (diabetes, anemia, heart disease, etc.) | |
| - Table-aware chunking (keep tables together) | |
| - Reference section filtering (skip bibliography chunks) | |
| - [ ] Create `HybridIndexingService` for chunk β embed β index pipeline | |
| ### 4.2 Production Embeddings | |
| **Tasks:** | |
| - [ ] Create `src/services/embeddings/` with Jina AI client (1024d, passage/query differentiation) | |
| - [ ] Add fallback chain: Jina β Google β HuggingFace | |
| - [ ] Implement batch embedding for efficient indexing | |
| - [ ] Track embedding model in chunk metadata for versioning | |
| ### 4.3 Hybrid Search with RRF | |
| **Tasks:** | |
| - [ ] Implement `search_unified()` supporting: BM25-only, vector-only, hybrid modes | |
| - [ ] Set up OpenSearch RRF (Reciprocal Rank Fusion) pipeline | |
| - [ ] Create unified search endpoint `POST /api/v1/hybrid-search/` | |
| - [ ] Add min_score filtering and result deduplication | |
| - [ ] Benchmark: BM25 vs. vector vs. hybrid on medical queries | |
| --- | |
| ## Phase 5: Complete RAG Pipeline with Streaming (Week 5 Equivalent) | |
| > **Goal**: Replace synchronous analysis with streaming RAG, add Gradio UI, optimize prompts. | |
| ### 5.1 Ollama Client Upgrade | |
| **Tasks:** | |
| - [ ] Create `src/services/ollama/` package (adapt course pattern) | |
| - [ ] Implement `OllamaClient` with: | |
| - Health check, model listing, generate, streaming generate | |
| - Usage metadata extraction (tokens, latency) | |
| - LangChain integration: `get_langchain_model()` for structured output | |
| - [ ] Create medical-specific RAG prompt templates: | |
| - `rag_medical_system.txt` β optimized for medical explanation generation | |
| - Structured output format for clinical responses | |
| - [ ] Create `OllamaFactory` with `@lru_cache` | |
| ### 5.2 Streaming RAG Endpoints | |
| **Tasks:** | |
| - [ ] Create `POST /api/v1/ask` β standard RAG with medical context retrieval | |
| - [ ] Create `POST /api/v1/stream` β SSE streaming for real-time responses | |
| - [ ] Create `POST /api/v1/analyze/stream` β streaming biomarker analysis | |
| - [ ] Integrate with existing multi-agent pipeline: | |
| ``` | |
| Query β Hybrid Search β Medical Chunks β Agent Pipeline β Streaming Response | |
| ``` | |
| ### 5.3 Gradio Medical Interface | |
| **Tasks:** | |
| - [ ] Create `src/gradio_app.py` for interactive medical RAG: | |
| - Biomarker input form (structured entry) | |
| - Natural language input (free text) | |
| - Streaming response display | |
| - Search mode selector (BM25, hybrid, vector) | |
| - Model selector | |
| - Analysis history display | |
| - [ ] Create `gradio_launcher.py` for easy startup | |
| - [ ] Expose on port 7861 | |
| ### 5.4 Prompt Optimization | |
| **Tasks:** | |
| - [ ] Reduce prompt size by 60-80% (course achieved 80% reduction) | |
| - [ ] Create focused medical prompts (separate: biomarker analysis, disease explanation, guidelines) | |
| - [ ] Test prompt variants using 5D evaluation framework | |
| - [ ] Store best prompts as SOP parameters (tie into evolution engine) | |
| --- | |
| ## Phase 6: Monitoring, Caching & Observability (Week 6 Equivalent) | |
| > **Goal**: Add Langfuse tracing for the entire pipeline, Redis caching, and production monitoring. | |
| ### 6.1 Langfuse Integration | |
| **Tasks:** | |
| - [ ] Create `src/services/langfuse/` package (adapt course pattern): | |
| - `client.py` β LangfuseTracer wrapper with v3 SDK | |
| - `factory.py` β cached tracer factory | |
| - `tracer.py` β medical-specific RAGTracer with named steps | |
| - [ ] Add spans for every pipeline step: | |
| - `biomarker_validation` β `query_embedding` β `search_retrieval` β `agent_execution` β `response_synthesis` | |
| - [ ] Track per-request metrics: | |
| - Total latency, LLM tokens used, search results count, cache hit/miss, agent execution time | |
| - [ ] Add Langfuse Docker services to docker-compose.yml | |
| - [ ] Create trace visualization for medical analysis pipeline | |
| ### 6.2 Redis Caching | |
| **Tasks:** | |
| - [ ] Create `src/services/cache/` package (adapt course pattern): | |
| - Exact-match cache: SHA256(query + model + top_k + biomarkers) β cached response | |
| - TTL: 6 hours for general queries, 1 hour for biomarker analysis (values may change) | |
| - [ ] Add caching to: | |
| - `/api/v1/ask` β cache RAG responses | |
| - `/api/v1/analyze` β cache full analysis results | |
| - Embeddings β cache frequently queried embeddings | |
| - [ ] Add graceful fallback: cache miss β normal pipeline | |
| - [ ] Track cache hit rates in Langfuse | |
| ### 6.3 Production Health Dashboard | |
| **Tasks:** | |
| - [ ] Enhance `/api/v1/health` to check all services: | |
| - PostgreSQL, OpenSearch, Redis, Ollama, Langfuse, Airflow | |
| - [ ] Add `/api/v1/metrics` endpoint for operational metrics | |
| - [ ] Create Langfuse dashboard for: | |
| - Average response time, cache hit rate, error rate, token costs | |
| - Per-agent execution times, search relevance scores | |
| --- | |
| ## Phase 7: Agentic RAG & Messaging Bot (Week 7 Equivalent) | |
| > **Goal**: Wrap your multi-agent pipeline in a LangGraph agentic workflow with guardrails, document grading, and query rewriting. Add Telegram bot for mobile access. | |
| ### 7.1 Agentic RAG Wrapper | |
| This is the most impactful upgrade β it adds **intelligence around your existing agents**: | |
| ``` | |
| User Query | |
| β | |
| [GUARDRAIL] ββββ Is this a medical/biomarker question? βββββ [OUT OF SCOPE] | |
| β yes | |
| [RETRIEVE] ββββ Hybrid search for medical documents βββββ [TOOL: search] | |
| β | |
| [GRADE DOCUMENTS] ββββ Are results relevant? βββββ [REWRITE QUERY] βββ loop | |
| β yes | |
| [CLINICAL ANALYSIS] ββββ Your 6 medical agents βββββ structured analysis | |
| β | |
| [GENERATE RESPONSE] ββββ Synthesize with citations βββββ final answer | |
| ``` | |
| **Tasks:** | |
| - [ ] Create `src/services/agents/agentic_rag.py` β `AgenticRAGService` class | |
| - [ ] Create `src/services/agents/nodes/`: | |
| - `guardrail_node.py` β Medical domain validation (score 0-100) | |
| - In-scope: biomarker questions, disease queries, clinical guidelines | |
| - Out-of-scope: non-medical, general knowledge, harmful content | |
| - `retrieve_node.py` β Creates tool call with `max_retrieval_attempts` | |
| - `grade_documents_node.py` β LLM evaluates medical relevance | |
| - `rewrite_query_node.py` β LLM rewrites for better medical retrieval | |
| - `generate_answer_node.py` β Uses your existing agent pipeline OR direct LLM | |
| - `out_of_scope_node.py` β Polite medical-domain rejection | |
| - [ ] Create `src/services/agents/state.py` β Enhanced state with guardrail_result, routing_decision, grading_results | |
| - [ ] Create `src/services/agents/context.py` β Runtime context for dependency injection | |
| - [ ] Create `src/services/agents/prompts.py` β Medical-specific prompts: | |
| - Guardrail: "Is this about health/biomarkers/medical conditions?" | |
| - Grading: "Does this medical document answer the clinical question?" | |
| - Rewriting: "Improve this medical query for better document retrieval" | |
| - Generation: "Synthesize medical findings with citations and safety caveats" | |
| - [ ] Create `src/services/agents/tools.py` β Medical retriever tool wrapping OpenSearch | |
| - [ ] Create `POST /api/v1/ask-agentic` endpoint | |
| - [ ] Add Langfuse tracing to every node | |
| ### 7.2 Medical Guardrails (Critical for MedTech) | |
| Beyond the course's simple domain check, add medical-specific safety: | |
| **Tasks:** | |
| - [ ] **Input guardrails**: | |
| - Detect harmful queries (self-harm, drug abuse guidance) | |
| - Detect attempts to get diagnosis without proper data | |
| - Validate biomarker values are physiologically plausible | |
| - [ ] **Output guardrails**: | |
| - Always include "consult your healthcare provider" disclaimer | |
| - Never provide definitive diagnosis (always "suggests" / "may indicate") | |
| - Flag critical biomarker values with immediate action advice | |
| - Ensure safety_alerts are present for out-of-range values | |
| - [ ] **Citation guardrails**: | |
| - Ensure all medical claims have document citations | |
| - Flag unsupported claims | |
| ### 7.3 Telegram Bot Integration | |
| **Tasks:** | |
| - [ ] Create `src/services/telegram/` package (adapt course pattern) | |
| - [ ] Implement bot commands: | |
| - `/start` β Welcome with medical assistant introduction | |
| - `/help` β Show capabilities and input format | |
| - `/analyze <biomarker values>` β Quick biomarker analysis | |
| - `/search <medical query>` β Search medical documents | |
| - `/report` β Get last analysis as formatted report | |
| - Free text β Full RAG Q&A about medical topics | |
| - [ ] Add typing indicators and progress messages | |
| - [ ] Integrate caching for repeated queries | |
| - [ ] Add rate limiting (medical queries shouldn't be spammed) | |
| - [ ] Create `TelegramFactory` gated by `TELEGRAM__ENABLED=true` | |
| ### 7.4 Feedback Loop | |
| **Tasks:** | |
| - [ ] Create `POST /api/v1/feedback` endpoint (adapt from course) | |
| - [ ] Integrate with Langfuse scoring | |
| - [ ] Use feedback data to identify weak prompts β feed into SOP evolution engine | |
| --- | |
| ## Phase 8: MedTech-Specific Additions (Beyond Course) | |
| > **Goal**: Things the course doesn't cover but your medical domain demands. | |
| ### 8.1 HIPAA-Awareness Patterns | |
| **Tasks:** | |
| - [ ] Never log patient biomarker values in plain text | |
| - [ ] Add request ID tracking without PII | |
| - [ ] Create data retention policy (auto-delete analysis data after configurable period) | |
| - [ ] Add audit logging for all analysis requests | |
| - [ ] Document HIPAA compliance approach (even if not yet certified) | |
| ### 8.2 Medical Safety Testing | |
| **Tasks:** | |
| - [ ] Create medical-specific test suite: | |
| - Critical value detection tests (every critical biomarker) | |
| - Guardrail rejection tests (non-medical queries) | |
| - Citation completeness tests | |
| - Safety disclaimer presence tests | |
| - Biomarker normalization tests (already have some) | |
| - [ ] Integrate 5D evaluation into CI pipeline | |
| - [ ] Create test fixtures with realistic medical scenarios | |
| ### 8.3 Evolution Engine Integration | |
| **Tasks:** | |
| - [ ] Wire SOP evolution engine to production metrics (Langfuse data) | |
| - [ ] Create Airflow DAG for scheduled evolution cycles | |
| - [ ] Store evolved SOPs in PostgreSQL with version tracking | |
| - [ ] A/B test SOP variants using Langfuse trace comparison | |
| ### 8.4 Multi-condition Support | |
| **Tasks:** | |
| - [ ] Extend condition coverage beyond current 5 diseases | |
| - [ ] Add condition-specific retrieval strategies | |
| - [ ] Create condition-specific chunking filters | |
| - [ ] Support multi-condition analysis (comorbidities) | |
| --- | |
| ## Implementation Priority Matrix | |
| | Priority | Phase | Effort | Impact | Dependencies | | |
| |----------|-------|--------|--------|--------------| | |
| | π΄ P0 | 1.1 Docker Compose | 2 days | Critical | None | | |
| | π΄ P0 | 1.2 Pydantic Settings | 1 day | Critical | None | | |
| | π΄ P0 | 1.4 Project Restructure | 2 days | Critical | None | | |
| | π΄ P0 | 1.5 Dev Tooling | 0.5 day | Critical | 1.4 | | |
| | π΄ P0 | 1.3 PostgreSQL + Models | 2 days | Critical | 1.1, 1.4 | | |
| | π‘ P1 | 3.1 OpenSearch Client | 2 days | High | 1.1, 1.4 | | |
| | π‘ P1 | 3.2 Medical Index Mapping | 1 day | High | 3.1 | | |
| | π‘ P1 | 4.1 Medical Text Chunker | 2 days | High | 3.1 | | |
| | π‘ P1 | 4.2 Production Embeddings | 1 day | High | 4.1 | | |
| | π‘ P1 | 4.3 Hybrid Search + RRF | 1 day | High | 3.1, 4.2 | | |
| | π‘ P1 | 5.1 Ollama Client | 1 day | High | 1.4 | | |
| | π‘ P1 | 5.2 Streaming Endpoints | 1 day | High | 5.1, 4.3 | | |
| | π‘ P1 | 2.1 PDF Parser (Docling) | 1 day | High | 1.4 | | |
| | π‘ P1 | 7.1 Agentic RAG Wrapper | 3 days | High | 5.2, 4.3 | | |
| | π‘ P1 | 7.2 Medical Guardrails | 2 days | High | 7.1 | | |
| | π’ P2 | 2.3 Airflow Pipeline | 2 days | Medium | 1.1, 2.1, 4.1 | | |
| | π’ P2 | 5.3 Gradio Interface | 1 day | Medium | 5.2 | | |
| | π’ P2 | 6.1 Langfuse Tracing | 2 days | Medium | 1.1, 5.2 | | |
| | π’ P2 | 6.2 Redis Caching | 1 day | Medium | 1.1, 5.2 | | |
| | π’ P2 | 6.3 Health Dashboard | 0.5 day | Medium | 6.1 | | |
| | π’ P2 | 7.3 Telegram Bot | 2 days | Medium | 7.1, 6.2 | | |
| | π’ P2 | 7.4 Feedback Loop | 0.5 day | Medium | 6.1 | | |
| | π΅ P3 | 2.2 Medical Sources | 2 days | Low | 2.1 | | |
| | π΅ P3 | 8.1 HIPAA Patterns | 1 day | Low | 1.3 | | |
| | π΅ P3 | 8.2 Safety Testing | 2 days | Low | 7.2 | | |
| | π΅ P3 | 8.3 Evolution Integration | 2 days | Low | 6.1, 2.3 | | |
| | π΅ P3 | 8.4 Multi-condition | 3 days | Low | 4.1 | | |
| **Estimated Total: ~40 days of focused work** | |
| --- | |
| ## Migration Strategy | |
| ### Step 1: Foundation (Week 1-2 of work) | |
| 1. Restructure project layout β Phase 1.4 | |
| 2. Create Pydantic Settings β Phase 1.2 | |
| 3. Set up Docker Compose β Phase 1.1 | |
| 4. Add PostgreSQL with models β Phase 1.3 | |
| 5. Add dev tooling β Phase 1.5 | |
| ### Step 2: Search Engine (Week 2-3) | |
| 6. Create OpenSearch client + medical mapping β Phase 3.1, 3.2 | |
| 7. Build medical text chunker β Phase 4.1 | |
| 8. Add production embeddings (Jina) β Phase 4.2 | |
| 9. Implement hybrid search + RRF β Phase 4.3 | |
| 10. Upgrade PDF parser to Docling β Phase 2.1 | |
| ### Step 3: RAG Pipeline (Week 3-4) | |
| 11. Create Ollama client β Phase 5.1 | |
| 12. Add streaming endpoints β Phase 5.2 | |
| 13. Build agentic RAG wrapper β Phase 7.1 | |
| 14. Add medical guardrails β Phase 7.2 | |
| 15. Create Gradio interface β Phase 5.3 | |
| ### Step 4: Production Hardening (Week 4-5) | |
| 16. Add Langfuse observability β Phase 6.1 | |
| 17. Add Redis caching β Phase 6.2 | |
| 18. Set up Airflow pipeline β Phase 2.3 | |
| 19. Build Telegram bot β Phase 7.3 | |
| 20. Add feedback loop β Phase 7.4 | |
| ### Step 5: Polish (Week 5-6) | |
| 21. Health dashboard β Phase 6.3 | |
| 22. Medical safety testing β Phase 8.2 | |
| 23. HIPAA patterns β Phase 8.1 | |
| 24. Evolution engine integration β Phase 8.3 | |
| ### Key Migration Rules | |
| - **Never break what works**: Keep all existing agents functional throughout | |
| - **Test at every step**: Run existing tests after each phase | |
| - **Incremental Docker**: Start with API + PostgreSQL, add services one at a time | |
| - **Feature flags**: Gate new features (Telegram, Langfuse, Redis) behind settings | |
| - **Backward compatibility**: Keep CLI chatbot working alongside new API | |
| --- | |
| ## Architecture Target State | |
| ``` | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β Docker Compose Orchestration β | |
| β β | |
| β ββββββββββββ βββββββββββββ βββββββββββββ ββββββββββ βββββββββββ β | |
| β β FastAPI β βPostgreSQL β β OpenSearch β β Ollama β β Airflow β β | |
| β β + Gradio β β (reports, β β (hybrid β β (local β β (daily β β | |
| β β (8000, β β docs, β β medical β β LLM) β β ingest) β β | |
| β β 7861) β β history) β β search) β β β β β β | |
| β ββββββ¬ββββββ βββββββ¬ββββββ βββββββ¬ββββββ βββββ¬βββββ ββββββ¬βββββ β | |
| β β β β β β β | |
| β ββββββ΄ββββββ βββββββ΄ββββββ ββββββ΄βββββββββββββ΄βββββββββββββ΄βββ β | |
| β β Redis β β Langfuse β β mediguard-network β β | |
| β β (cache) β β (observe) β ββββββββββββββββββββββββββββββββββββ β | |
| β ββββββββββββ βββββββββββββ β | |
| β β | |
| β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β | |
| β β Agentic RAG Pipeline β β | |
| β β β β | |
| β β Query β [Guardrail] β [Retrieve] β [Grade] β [6 Medical Agents] β β | |
| β β β β β β β β | |
| β β [Out of Scope] [Rewrite] [Generate] β Final Response β β | |
| β β β β | |
| β β Agents: Biomarker Analyzer β Disease Explainer β Linker β β | |
| β β Clinical Guidelines β Confidence β Synthesizer β β | |
| β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β | |
| β β | |
| β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββββββ β | |
| β β Telegram Bot β β Gradio UI β β 5D Eval + SOP Evolution β β | |
| β β (mobile) β β (desktop) β β (self-improvement loop) β β | |
| β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββββββ β | |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| ``` | |
| --- | |
| ## Files to Create (Summary) | |
| | New File | Source of Inspiration | | |
| |----------|----------------------| | |
| | `docker-compose.yml` | Course `compose.yml` (adapted) | | |
| | `Dockerfile` | Course `Dockerfile` (multi-stage UV) | | |
| | `Makefile` | Course `Makefile` | | |
| | `pyproject.toml` | Course `pyproject.toml` | | |
| | `.pre-commit-config.yaml` | Course `.pre-commit-config.yaml` | | |
| | `.env.example` | Course `.env.example` | | |
| | `src/main.py` | Course `src/main.py` (lifespan pattern) | | |
| | `src/config.py` | Course `src/config.py` + existing SOP config | | |
| | `src/dependencies.py` | Course `src/dependencies.py` | | |
| | `src/exceptions.py` | Course `src/exceptions.py` (medical exceptions) | | |
| | `src/database.py` | Course `src/database.py` | | |
| | `src/db/*` | Course `src/db/*` | | |
| | `src/models/analysis.py` | New (medical domain) | | |
| | `src/models/document.py` | Course `src/models/paper.py` (adapted) | | |
| | `src/repositories/*` | Course `src/repositories/*` (adapted) | | |
| | `src/routers/ask.py` | Course `src/routers/ask.py` | | |
| | `src/routers/search.py` | Course `src/routers/hybrid_search.py` | | |
| | `src/routers/health.py` | Course `src/routers/ping.py` (enhanced) | | |
| | `src/schemas/*` | Course `src/schemas/*` (medical schemas) | | |
| | `src/services/opensearch/*` | Course `src/services/opensearch/*` | | |
| | `src/services/embeddings/*` | Course `src/services/embeddings/*` | | |
| | `src/services/ollama/*` | Course `src/services/ollama/*` | | |
| | `src/services/cache/*` | Course `src/services/cache/*` | | |
| | `src/services/langfuse/*` | Course `src/services/langfuse/*` | | |
| | `src/services/indexing/*` | Course `src/services/indexing/*` (medical chunks) | | |
| | `src/services/pdf_parser/*` | Course `src/services/pdf_parser/*` | | |
| | `src/services/telegram/*` | Course `src/services/telegram/*` | | |
| | `src/services/agents/agentic_rag.py` | Course (adapted for medical agents) | | |
| | `src/services/agents/nodes/*` | Course (medical guardrails) | | |
| | `src/services/agents/context.py` | Course | | |
| | `src/services/agents/prompts.py` | Course (medical prompts) | | |
| | `src/gradio_app.py` | Course `src/gradio_app.py` (medical UI) | | |
| | `airflow/dags/medical_ingestion.py` | Course `airflow/dags/arxiv_paper_ingestion.py` | | |
| ## Files to Keep & Enhance | |
| | Existing File | Action | | |
| |---------------|--------| | |
| | `src/agents/biomarker_analyzer.py` | Keep, move to `src/services/agents/medical/` | | |
| | `src/agents/disease_explainer.py` | Keep, move, add OpenSearch retriever | | |
| | `src/agents/biomarker_linker.py` | Keep, move, add OpenSearch retriever | | |
| | `src/agents/clinical_guidelines.py` | Keep, move, add OpenSearch retriever | | |
| | `src/agents/confidence_assessor.py` | Keep, move | | |
| | `src/agents/response_synthesizer.py` | Keep, move | | |
| | `src/biomarker_validator.py` | Keep, move to `src/services/biomarker/` | | |
| | `src/biomarker_normalization.py` | Keep, move to `src/services/biomarker/` | | |
| | `src/evaluation/` | Keep, enhance with Langfuse integration | | |
| | `src/evolution/` | Keep, wire to production metrics | | |
| | `config/biomarker_references.json` | Keep as seed data, migrate to DB | | |
| | `scripts/chat.py` | Keep, update imports | | |
| | `tests/*` | Keep, add production test fixtures | | |
| --- | |
| *This plan transforms MediGuard AI from a working prototype into a production-grade medical RAG system, applying every infrastructure lesson from the arXiv Paper Curator course while preserving and enhancing your unique medical domain logic.* | |