
MediGuard AI β€” Production Upgrade Plan

From Prototype to Production-Grade MedTech RAG System

Generated: 2026-02-23
Based on: Deep review of production-agentic-rag-course (Weeks 1–7) + existing RagBot codebase
Goal: Take the existing MediGuard AI (clinical biomarker analysis + RAG explanation system) to full production quality, applying every lesson from the arXiv Paper Curator course β€” adapted for the MedTech domain.


Table of Contents

  1. Executive Summary
  2. Deep Review: Course vs. Your Codebase
  3. Architecture Gap Analysis
  4. Phase 1: Infrastructure Foundation
  5. Phase 2: Medical Data Ingestion Pipeline
  6. Phase 3: Production Search Foundation
  7. Phase 4: Hybrid Search & Intelligent Chunking
  8. Phase 5: Complete RAG Pipeline with Streaming
  9. Phase 6: Monitoring, Caching & Observability
  10. Phase 7: Agentic RAG & Messaging Bot
  11. Phase 8: MedTech-Specific Additions
  12. Implementation Priority Matrix
  13. Migration Strategy

1. Executive Summary

Your RagBot is a working prototype with strong domain logic (biomarker validation, multi-agent clinical analysis, 5D evaluation, SOP evolution). The course teaches production infrastructure (Docker orchestration, OpenSearch hybrid search, Airflow pipelines, Redis caching, Langfuse observability, LangGraph agentic workflows, Telegram bot).

The strategy: Keep your excellent medical domain logic and multi-agent architecture, but rebuild the infrastructure layer to match production standards. Your domain is harder than arXiv papers β€” medical data demands stricter validation, HIPAA-aware patterns, and safety guardrails.

What You Have (Strengths)

  • βœ… 6 specialized medical agents (Biomarker Analyzer, Disease Explainer, Biomarker-Disease Linker, Clinical Guidelines, Confidence Assessor, Response Synthesizer)
  • βœ… LangGraph orchestration with parallel execution
  • βœ… Robust biomarker validation with 24 biomarkers, reference ranges, critical values
  • βœ… 5D evaluation framework (Clinical Accuracy, Evidence Grounding, Actionability, Clarity, Safety)
  • βœ… SOP evolution engine (Outer Loop optimization)
  • βœ… Multi-provider LLM support (Groq, Gemini, Ollama)
  • βœ… Basic FastAPI with analysis endpoints
  • βœ… CLI chatbot with natural language biomarker extraction

What You're Missing (Gaps)

  • ❌ No Docker Compose orchestration (only minimal single-service Dockerfile)
  • ❌ No production database (PostgreSQL) β€” no patient/report persistence
  • ❌ No production search engine β€” using FAISS (in-memory, single-file, no filtering)
  • ❌ No chunking strategy β€” basic RecursiveCharacterTextSplitter only
  • ❌ No hybrid search (BM25 + vector) β€” vector-only retrieval
  • ❌ No production embeddings β€” using local HuggingFace MiniLM (384d) or Google free tier
  • ❌ No data ingestion pipeline (Airflow) β€” manual PDF loading
  • ❌ No caching layer (Redis) β€” every query hits LLM
  • ❌ No observability (Langfuse) β€” no tracing, no cost tracking
  • ❌ No streaming responses β€” synchronous only
  • ❌ No Gradio interface β€” CLI only (besides basic API)
  • ❌ No messaging bot (Telegram/WhatsApp) β€” no mobile access
  • ❌ No agentic RAG with guardrails, document grading, query rewriting
  • ❌ No proper dependency injection pattern (FastAPI Depends())
  • ❌ No Pydantic Settings with env-nested config
  • ❌ No factory pattern for service initialization
  • ❌ No proper exception hierarchy
  • ❌ No health checks for all services
  • ❌ No Makefile / dev tooling (ruff, mypy, pre-commit)
  • ❌ No proper test infrastructure (pytest fixtures, test containers)

2. Deep Review: Course vs. Your Codebase

Course Architecture (What Production Looks Like)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  Docker Compose Orchestration                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  FastAPI   β”‚ PostgreSQL β”‚ OpenSearch β”‚   Ollama   β”‚  Airflow   β”‚
β”‚   (8000)   β”‚   (5432)   β”‚   (9200)   β”‚  (11434)   β”‚   (8080)   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Redis    β”‚  Langfuse  β”‚ ClickHouse β”‚   MinIO    β”‚ Langfuse-PGβ”‚
β”‚   (6379)   β”‚   (3001)   β”‚            β”‚            β”‚   (5433)   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚               Gradio UI (7861)  β”‚  Telegram Bot                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Patterns from Course:

  • Pydantic Settings with env_nested_delimiter="__" for hierarchical config
  • Factory pattern (make_* functions) for every service
  • Dependency injection via FastAPI Depends() with typed annotations
  • Lifespan context for startup/shutdown with proper resource management
  • Service layer separation: routers/ β†’ services/ β†’ clients/
  • Schema-driven: Separate Pydantic schemas for API, database, embeddings, indexing
  • Exception hierarchy: Domain-specific exceptions (PDFParsingException, OllamaException, etc.)
  • Context dataclass for LangGraph runtime dependency injection
  • Structured LLM output via .with_structured_output(PydanticModel)
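
A minimal sketch of how the factory and dependency-injection patterns compose (the `OllamaClient` stub and the endpoint are illustrative, not taken from the course repo):

```python
from functools import lru_cache
from typing import Annotated

from fastapi import Depends, FastAPI


class OllamaClient:
    """Stand-in for a real service client."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


@lru_cache
def make_ollama_client() -> OllamaClient:
    # Factory pattern: build once per process, reuse everywhere
    return OllamaClient(base_url="http://localhost:11434")


# Typed dependency alias keeps endpoint signatures clean
OllamaDep = Annotated[OllamaClient, Depends(make_ollama_client)]

app = FastAPI()


@app.get("/llm-info")
def llm_info(client: OllamaDep) -> dict:
    return {"base_url": client.base_url}
```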

Your Codebase Architecture (Current State)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           Basic FastAPI (api/app/)           β”‚
β”‚     Single Dockerfile, no orchestration      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚        src/ (Core Domain Logic)              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ workflow.py (LangGraph StateGraph)   β”‚    β”‚
β”‚  β”‚ 6 agents/ (parallel execution)       β”‚    β”‚
β”‚  β”‚ biomarker_validator.py (24 markers)  β”‚    β”‚
β”‚  β”‚ pdf_processor.py (FAISS + PyPDF)     β”‚    β”‚
β”‚  β”‚ evaluation/ (5D framework)           β”‚    β”‚
β”‚  β”‚ evolution/ (SOP optimization)        β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   FAISS vector store (single file)           β”‚
β”‚   No PostgreSQL, No Redis, No OpenSearch     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3. Architecture Gap Analysis

| Dimension | Course (Production) | Your Codebase (Prototype) | Gap Severity |
| --- | --- | --- | --- |
| Container Orchestration | Docker Compose with 12+ services, health checks, networks | Single Dockerfile, manual startup | πŸ”΄ Critical |
| Database | PostgreSQL 16 with SQLAlchemy models, repositories | None (in-memory only) | πŸ”΄ Critical |
| Search Engine | OpenSearch 2.19 with BM25 + KNN hybrid, RRF fusion | FAISS (vector-only, no filtering) | πŸ”΄ Critical |
| Chunking | Section-aware chunking (600w, 100w overlap, metadata) | Basic RecursiveCharacterTextSplitter (1000 char) | 🟑 Major |
| Embeddings | Jina AI v3 (1024d, passage/query differentiation) | HuggingFace MiniLM (384d) or Google free tier | 🟑 Major |
| Data Pipeline | Airflow DAGs (daily schedule, fetchβ†’parseβ†’chunkβ†’index) | Manual PDF loading, one-time setup | 🟑 Major |
| Caching | Redis with TTL, exact-match, SHA256 keys | None | 🟑 Major |
| Observability | Langfuse v3 (traces, spans, generations, cost tracking) | None (print statements only) | 🟑 Major |
| Streaming | SSE streaming with Gradio UI | None (synchronous responses) | 🟑 Major |
| Agentic RAG | LangGraph with guardrails, grading, rewriting, context_schema | Basic LangGraph (no guardrails, no grading) | 🟑 Major |
| Bot Integration | Telegram bot with /search, Q&A, caching | None | 🟒 Enhancement |
| Config Management | Pydantic Settings, hierarchical env vars, frozen models | Basic os.getenv, dotenv | 🟑 Major |
| Dependency Injection | FastAPI Depends() with typed annotations | Manual global singletons | 🟑 Major |
| Error Handling | Domain exception hierarchy, graceful fallbacks | Basic try/except with prints | 🟑 Major |
| Code Quality | Ruff, MyPy, pre-commit, pytest with fixtures | Minimal pytest, no linting | 🟒 Enhancement |
| API Design | Versioned (/api/v1/), health checks for all services | Basic routes, minimal health check | 🟑 Major |

Phase 1: Infrastructure Foundation (Week 1 Equivalent)

Goal: Containerize everything, add PostgreSQL for persistence, set up OpenSearch, establish professional development environment.

1.1 Docker Compose Orchestration

Create a production docker-compose.yml with all services:

# Target services for MediGuard AI:
services:
  api:           # FastAPI application (port 8000)
  postgres:      # Patient reports, analysis history (port 5432)
  opensearch:    # Medical document search engine (port 9200)
  opensearch-dashboards:  # Search UI (port 5601)
  redis:         # Response caching (port 6379)
  ollama:        # Local LLM for privacy-sensitive medical data (port 11434)
  airflow:       # Medical literature pipeline (port 8080)
  langfuse-web:  # Observability dashboard (port 3001)
  langfuse-worker/postgres/redis/clickhouse/minio:  # Langfuse infra

Tasks:

  • Create root docker-compose.yml adapting course pattern to MedTech services
  • Create multi-stage Dockerfile using UV package manager (copy course pattern)
  • Add health checks for every service (PostgreSQL, OpenSearch, Redis, Ollama)
  • Set up Docker network mediguard-network with proper service dependencies
  • Configure volume persistence for all data stores
  • Create .env.example with all configuration variables documented

1.2 Pydantic Settings Configuration

Replace scattered os.getenv() calls with hierarchical Pydantic Settings:

# New: src/config.py (course-inspired)
class MedicalPDFSettings(BaseConfigSettings):    # PDF parser config
class ChunkingSettings(BaseConfigSettings):       # Chunking parameters  
class OpenSearchSettings(BaseConfigSettings):     # Search engine config
class LangfuseSettings(BaseConfigSettings):       # Observability config
class RedisSettings(BaseConfigSettings):          # Cache config
class TelegramSettings(BaseConfigSettings):       # Bot config
class BiomarkerSettings(BaseConfigSettings):      # Biomarker thresholds
class Settings(BaseConfigSettings):               # Root settings

Tasks:

  • Rewrite src/config.py β€” keep ExplanationSOP but add infrastructure settings classes
  • Use env_nested_delimiter="__" for hierarchical environment variables
  • Add frozen=True for immutable configuration
  • Move all hardcoded values to environment variables with sensible defaults
  • Create get_settings() factory with @lru_cache
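
A sketch of the settings hierarchy under these rules; the field names and defaults are placeholders:

```python
from functools import lru_cache

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class OpenSearchSettings(BaseSettings):
    host: str = "http://localhost:9200"
    index_name: str = "medical_chunks"


class RedisSettings(BaseSettings):
    url: str = "redis://localhost:6379/0"
    ttl_seconds: int = 21600  # 6 hours


class Settings(BaseSettings):
    # OPENSEARCH__HOST and REDIS__TTL_SECONDS resolve into the nested models
    model_config = SettingsConfigDict(
        env_file=".env", env_nested_delimiter="__", frozen=True
    )

    opensearch: OpenSearchSettings = Field(default_factory=OpenSearchSettings)
    redis: RedisSettings = Field(default_factory=RedisSettings)


@lru_cache
def get_settings() -> Settings:
    # Cached factory: one immutable Settings object per process
    return Settings()
```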

1.3 PostgreSQL Database Setup

Add persistent storage for analysis history β€” critical for a medical audit trail:

# New models:
class PatientAnalysis(Base):      # Store each analysis run
class AnalysisReport(Base):       # Store final reports
class MedicalDocument(Base):      # Track ingested medical PDFs
class BiomarkerReference(Base):   # Biomarker reference ranges (currently JSON file)

Tasks:

  • Create src/db/ package mirroring course pattern (factory, interfaces, postgresql)
  • Define SQLAlchemy models for analysis history and medical documents
  • Create repository pattern for data access
  • Set up Alembic for database migrations
  • Migrate biomarker_references.json to database (keep JSON as seed data)
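
A sketch of the model + repository split using SQLAlchemy 2.0 typing; the columns shown are a subset of what Phase 2.4 specifies:

```python
from datetime import datetime, timezone

from sqlalchemy import DateTime, String
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class MedicalDocument(Base):
    __tablename__ = "medical_documents"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(512), index=True)
    source_type: Mapped[str] = mapped_column(String(32))  # guideline/research/reference
    parse_status: Mapped[str] = mapped_column(String(32), default="uploaded")
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
    )


class DocumentRepository:
    """Repository pattern: all MedicalDocument data access in one place."""

    def __init__(self, session: Session) -> None:
        self.session = session

    def add(self, doc: MedicalDocument) -> MedicalDocument:
        self.session.add(doc)
        self.session.commit()
        self.session.refresh(doc)
        return doc

    def mark_indexed(self, doc_id: int) -> None:
        doc = self.session.get(MedicalDocument, doc_id)
        if doc is not None:
            doc.parse_status = "indexed"
            self.session.commit()
```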

1.4 Project Structure Refactor

Reorganize to match production patterns:

src/
β”œβ”€β”€ config.py                    # Pydantic Settings (hierarchical)
β”œβ”€β”€ main.py                      # FastAPI app with lifespan
β”œβ”€β”€ database.py                  # Database utilities
β”œβ”€β”€ dependencies.py              # FastAPI dependency injection
β”œβ”€β”€ exceptions.py                # Domain exception hierarchy
β”œβ”€β”€ middlewares.py               # Request logging, timing
β”œβ”€β”€ db/                          # Database layer
β”‚   β”œβ”€β”€ factory.py
β”‚   └── interfaces/
β”œβ”€β”€ models/                      # SQLAlchemy models
β”‚   β”œβ”€β”€ analysis.py
β”‚   └── document.py  
β”œβ”€β”€ repositories/                # Data access
β”‚   β”œβ”€β”€ analysis.py
β”‚   └── document.py
β”œβ”€β”€ routers/                     # API endpoints
β”‚   β”œβ”€β”€ analyze.py               # Biomarker analysis
β”‚   β”œβ”€β”€ ask.py                   # RAG Q&A (streaming + standard)
β”‚   β”œβ”€β”€ health.py                # Comprehensive health checks
β”‚   └── search.py                # Medical document search
β”œβ”€β”€ schemas/                     # Pydantic request/response models
β”‚   β”œβ”€β”€ api/
β”‚   β”œβ”€β”€ medical/
β”‚   └── embeddings/
β”œβ”€β”€ services/                    # Business logic
β”‚   β”œβ”€β”€ agents/                  # Your 6 medical agents (KEEP!)
β”‚   β”‚   β”œβ”€β”€ biomarker_analyzer.py
β”‚   β”‚   β”œβ”€β”€ disease_explainer.py
β”‚   β”‚   β”œβ”€β”€ biomarker_linker.py
β”‚   β”‚   β”œβ”€β”€ clinical_guidelines.py
β”‚   β”‚   β”œβ”€β”€ confidence_assessor.py
β”‚   β”‚   β”œβ”€β”€ response_synthesizer.py
β”‚   β”‚   β”œβ”€β”€ agentic_rag.py       # NEW: LangGraph agentic wrapper
β”‚   β”‚   β”œβ”€β”€ nodes/               # NEW: Guardrail, grading, rewriting
β”‚   β”‚   β”œβ”€β”€ state.py             # Enhanced state
β”‚   β”‚   β”œβ”€β”€ context.py           # Runtime dependency injection
β”‚   β”‚   └── prompts.py           # Medical-domain prompts
β”‚   β”œβ”€β”€ opensearch/              # NEW: Search engine client
β”‚   β”œβ”€β”€ embeddings/              # NEW: Production embeddings
β”‚   β”œβ”€β”€ cache/                   # NEW: Redis caching
β”‚   β”œβ”€β”€ langfuse/                # NEW: Observability
β”‚   β”œβ”€β”€ ollama/                  # NEW: Local LLM client
β”‚   β”œβ”€β”€ indexing/                # NEW: Chunking + indexing
β”‚   β”œβ”€β”€ pdf_parser/              # Enhanced: Use Docling
β”‚   β”œβ”€β”€ telegram/                # NEW: Bot integration
β”‚   └── biomarker/               # Extracted: validation + normalization
β”œβ”€β”€ evaluation/                  # KEEP: 5D evaluation
└── evolution/                   # KEEP: SOP evolution

Tasks:

  • Create the new directory structure
  • Move API from api/app/ into src/ (single application)
  • Create exceptions.py with medical-domain exception hierarchy
  • Create dependencies.py with typed FastAPI dependency injection
  • Create main.py with proper lifespan context manager
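
A sketch of the exception hierarchy and lifespan wiring (resource setup appears as comments, since the concrete factories land in later phases):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


class MediGuardException(Exception):
    """Base class: routes catch this and map subclasses to HTTP errors."""


class PDFParsingException(MediGuardException):
    pass


class BiomarkerValidationException(MediGuardException):
    pass


class SearchBackendException(MediGuardException):
    pass


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: open shared resources once, e.g.
    #   app.state.engine = make_engine(); app.state.search = make_search_client()
    app.state.ready = True
    yield
    # Shutdown: release resources in reverse order
    app.state.ready = False


app = FastAPI(lifespan=lifespan)
```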

1.5 Development Tooling

Tasks:

  • Create pyproject.toml replacing requirements.txt (use UV)
  • Create Makefile with start/stop/test/lint/format/health commands
  • Add ruff for linting and formatting
  • Add mypy for type checking
  • Add .pre-commit-config.yaml
  • Create .env.example and .env.test

Phase 2: Medical Data Ingestion Pipeline (Week 2 Equivalent)

Goal: Automated ingestion of medical PDFs, clinical guidelines, and reference documents with Airflow orchestration.

2.1 Medical PDF Parser Upgrade

Replace basic PyPDF with Docling for better medical document handling:

Tasks:

  • Create src/services/pdf_parser/ with Docling integration (copy course pattern)
  • Add medical-specific section detection (Abstract, Methods, Results, Discussion, Clinical Guidelines)
  • Add table extraction for lab reference ranges
  • Add validation: file size limits, page limits, PDF header check
  • Add metadata extraction: title, authors, publication date, journal

2.2 Medical Document Sources

Unlike arXiv (single API), medical literature comes from multiple sources:

Tasks:

  • Create src/services/medical_sources/ package
  • Implement PubMed API client (free, rate-limited) for research papers
  • Implement local PDF upload endpoint for clinical guidelines
  • Implement reference document ingestion (WHO, CDC, ADA guidelines)
  • Create document deduplication logic (by title hash + content fingerprint)
  • Add MedicalDocument model tracking: source, parse status, indexing status

2.3 Airflow Pipeline for Medical Literature

Tasks:

  • Create airflow/ directory with Dockerfile and entrypoint
  • Create airflow/dags/medical_ingestion.py DAG:
    • setup_environment β†’ fetch_new_documents β†’ parse_pdfs β†’ chunk_and_index β†’ generate_report
  • Schedule: Daily at 6 AM for PubMed updates, on-demand for uploaded PDFs
  • Add retry logic with exponential backoff
  • Mount src/ into Airflow container for shared code
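
A TaskFlow-style sketch of the DAG skeleton; the task bodies are placeholders for the real fetch/parse/index services:

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    dag_id="medical_ingestion",
    schedule="0 6 * * *",  # daily at 6 AM
    start_date=datetime(2026, 1, 1),
    catchup=False,
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "retry_exponential_backoff": True,
    },
)
def medical_ingestion():
    @task
    def fetch_new_documents() -> list[str]:
        return ["doc-1", "doc-2"]  # PubMed fetch + uploaded-PDF discovery

    @task
    def parse_pdfs(doc_ids: list[str]) -> list[str]:
        return doc_ids  # Docling parsing, status updates in PostgreSQL

    @task
    def chunk_and_index(doc_ids: list[str]) -> int:
        return len(doc_ids)  # chunk β†’ embed β†’ bulk index into OpenSearch

    @task
    def generate_report(indexed: int) -> None:
        print(f"Indexed {indexed} documents")

    generate_report(chunk_and_index(parse_pdfs(fetch_new_documents())))


medical_ingestion()
```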

2.4 PostgreSQL Storage for Documents

Tasks:

  • Create MedicalDocument model: id, title, source, source_type, authors, abstract, raw_text, sections, parse_status, indexed_at
  • Create PaperRepository with CRUD + upsert + status tracking
  • Track processing pipeline: uploaded β†’ parsed β†’ chunked β†’ indexed
  • Store parsed sections as JSON for re-indexing without re-parsing

Phase 3: Production Search Foundation (Week 3 Equivalent)

Goal: Replace FAISS with OpenSearch for production BM25 keyword search with medical-specific optimizations.

3.1 OpenSearch Client

Tasks:

  • Create src/services/opensearch/ package (adapt course pattern)
  • Implement OpenSearchClient with:
    • Health check, index management, BM25 search, bulk indexing
    • Medical-specific: Boost clinical term matches, support ICD-10 code filtering
  • Create QueryBuilder with medical field boosting:
    fields: ["chunk_text^3", "title^2", "section_title^1.5", "abstract^1"]
    
  • Create index_config_hybrid.py with medical document mapping:
    • Fields: chunk_text, title, authors, abstract, document_type (guideline/research/reference), condition_tags, publication_year
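
A sketch of the query builder with the boosts above; the index name and filter field follow the 3.2 mapping:

```python
from opensearchpy import OpenSearch


def build_bm25_query(query: str, document_type: str | None = None) -> dict:
    """BM25 multi_match with medical field boosting and optional filters."""
    body: dict = {
        "query": {
            "bool": {
                "must": [{
                    "multi_match": {
                        "query": query,
                        "fields": [
                            "chunk_text^3", "title^2",
                            "section_title^1.5", "abstract^1",
                        ],
                    }
                }],
                "filter": [],
            }
        }
    }
    if document_type:
        body["query"]["bool"]["filter"].append(
            {"term": {"document_type": document_type}}
        )
    return body


client = OpenSearch(hosts=["http://localhost:9200"])
results = client.search(
    index="medical_chunks",
    body=build_bm25_query("elevated HbA1c management", document_type="guideline"),
    size=5,
)
```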

3.2 Medical Document Index Mapping

MEDICAL_CHUNKS_MAPPING = {
    "settings": {
        "index.knn": True,
        "analysis": {
            "analyzer": {
                "medical_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "medical_synonyms", "stop", "snowball"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "chunk_text": {"type": "text", "analyzer": "medical_analyzer"},
            "document_type": {"type": "keyword"},  # guideline, research, reference
            "condition_tags": {"type": "keyword"},  # diabetes, anemia, etc.
            "biomarkers_mentioned": {"type": "keyword"},  # Glucose, HbA1c, etc.
            "embedding": {"type": "knn_vector", "dimension": 1024},
            # ... more fields
        }
    }
}

Tasks:

  • Design medical-optimized OpenSearch mapping
  • Add medical synonym analyzer (e.g., "diabetes mellitus" ↔ "DM", "HbA1c" ↔ "glycated hemoglobin")
  • Create search endpoint POST /api/v1/search with filtering by document_type, condition_tags
  • Implement BM25 search with medical field boosting
  • Create index verification in startup lifespan
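
The mapping above references a `medical_synonyms` filter without defining it; a minimal version looks like the following (a production deployment would load a curated synonym file rather than an inline list):

```python
# Goes under settings.analysis.filter in the index configuration
MEDICAL_SYNONYM_FILTER = {
    "medical_synonyms": {
        "type": "synonym",
        "synonyms": [
            "diabetes mellitus, DM",
            "glycated hemoglobin, hba1c, a1c",
            "myocardial infarction, heart attack, MI",
        ],
    }
}
```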

Phase 4: Hybrid Search & Intelligent Chunking (Week 4 Equivalent)

Goal: Section-aware chunking for medical documents + hybrid search (BM25 + semantic) with RRF fusion.

4.1 Medical-Aware Text Chunking

Tasks:

  • Create src/services/indexing/text_chunker.py adapting course's TextChunker:
    • Section-aware chunking (detect: Introduction, Methods, Results, Discussion, Guidelines, References)
    • Target: 600 words per chunk, 100 word overlap
    • Medical metadata: section_title, biomarkers_mentioned, condition_tags
  • Create MedicalTextChunker subclass with:
    • Biomarker mention detection (scan for any of 24+ biomarker names)
    • Condition tag extraction (diabetes, anemia, heart disease, etc.)
    • Table-aware chunking (keep tables together)
    • Reference section filtering (skip bibliography chunks)
  • Create HybridIndexingService for chunk β†’ embed β†’ index pipeline
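
A sketch of the word-window chunker plus biomarker tagging; a real chunker would additionally respect sentence boundaries and keep tables intact:

```python
def chunk_section(
    section_title: str,
    text: str,
    target_words: int = 600,
    overlap_words: int = 100,
) -> list[dict]:
    """Split one section into overlapping word windows with metadata."""
    words = text.split()
    step = target_words - overlap_words
    chunks = []
    for start in range(0, max(len(words), 1), step):
        window = words[start : start + target_words]
        if not window:
            break
        chunks.append({
            "chunk_text": " ".join(window),
            "section_title": section_title,
            "word_count": len(window),
        })
        if start + target_words >= len(words):
            break  # last window already covers the tail
    return chunks


BIOMARKER_NAMES = {"glucose", "hba1c", "hemoglobin", "ldl", "tsh"}  # subset of the 24


def tag_biomarkers(chunk: dict) -> dict:
    """Attach biomarker mentions as keyword metadata for filtered search."""
    text = chunk["chunk_text"].lower()
    chunk["biomarkers_mentioned"] = sorted(b for b in BIOMARKER_NAMES if b in text)
    return chunk
```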

4.2 Production Embeddings

Tasks:

  • Create src/services/embeddings/ with Jina AI client (1024d, passage/query differentiation)
  • Add fallback chain: Jina β†’ Google β†’ HuggingFace
  • Implement batch embedding for efficient indexing
  • Track embedding model in chunk metadata for versioning
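
A sketch of the fallback chain; the `.embed(texts)` provider interface is an assumption standing in for the real Jina/Google/HuggingFace clients:

```python
class EmbeddingError(Exception):
    pass


def embed_with_fallback(texts: list[str], providers: list) -> list[list[float]]:
    """Try providers in priority order (Jina β†’ Google β†’ HuggingFace)."""
    last_error: Exception | None = None
    for provider in providers:
        try:
            return provider.embed(texts)  # assumed common interface
        except Exception as exc:  # network errors, quota exhaustion, etc.
            last_error = exc
    raise EmbeddingError("All embedding providers failed") from last_error
```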

4.3 Hybrid Search with RRF

Tasks:

  • Implement search_unified() supporting: BM25-only, vector-only, hybrid modes
  • Set up OpenSearch RRF (Reciprocal Rank Fusion) pipeline
  • Create unified search endpoint POST /api/v1/hybrid-search/
  • Add min_score filtering and result deduplication
  • Benchmark: BM25 vs. vector vs. hybrid on medical queries
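
A sketch of the RRF setup and a hybrid query; the `score-ranker-processor` shape follows OpenSearch 2.19 documentation and should be verified against the deployed version:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["http://localhost:9200"])

# One-time setup: a search pipeline that fuses BM25 + kNN ranks with RRF
client.transport.perform_request(
    "PUT",
    "/_search/pipeline/medical-rrf",
    body={
        "phase_results_processors": [
            {"score-ranker-processor": {"combination": {"technique": "rrf"}}}
        ]
    },
)

query_vector = [0.0] * 1024  # replace with a real query embedding

hybrid_body = {
    "query": {
        "hybrid": {
            "queries": [
                {"multi_match": {
                    "query": "low ferritin anemia",
                    "fields": ["chunk_text^3", "title^2"],
                }},
                {"knn": {"embedding": {"vector": query_vector, "k": 20}}},
            ]
        }
    }
}

results = client.search(
    index="medical_chunks",
    body=hybrid_body,
    params={"search_pipeline": "medical-rrf"},
)
```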

Phase 5: Complete RAG Pipeline with Streaming (Week 5 Equivalent)

Goal: Replace synchronous analysis with streaming RAG, add Gradio UI, optimize prompts.

5.1 Ollama Client Upgrade

Tasks:

  • Create src/services/ollama/ package (adapt course pattern)
  • Implement OllamaClient with:
    • Health check, model listing, generate, streaming generate
    • Usage metadata extraction (tokens, latency)
    • LangChain integration: get_langchain_model() for structured output
  • Create medical-specific RAG prompt templates:
    • rag_medical_system.txt β€” optimized for medical explanation generation
    • Structured output format for clinical responses
  • Create OllamaFactory with @lru_cache
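
A sketch of streaming generation against Ollama's REST API (`/api/generate` emits NDJSON lines with a `response` token and a final `done` record carrying usage metadata):

```python
import json

import httpx


def ollama_generate_stream(prompt: str, model: str = "llama3.1"):
    """Yield response tokens from a local Ollama instance."""
    with httpx.stream(
        "POST",
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        timeout=120.0,
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if not line:
                continue
            payload = json.loads(line)
            if payload.get("done"):
                break  # final record carries eval_count, durations, etc.
            yield payload.get("response", "")


for token in ollama_generate_stream("Explain what HbA1c measures."):
    print(token, end="", flush=True)
```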

5.2 Streaming RAG Endpoints

Tasks:

  • Create POST /api/v1/ask β€” standard RAG with medical context retrieval
  • Create POST /api/v1/stream β€” SSE streaming for real-time responses
  • Create POST /api/v1/analyze/stream β€” streaming biomarker analysis
  • Integrate with existing multi-agent pipeline:
    Query β†’ Hybrid Search β†’ Medical Chunks β†’ Agent Pipeline β†’ Streaming Response
    

5.3 Gradio Medical Interface

Tasks:

  • Create src/gradio_app.py for interactive medical RAG:
    • Biomarker input form (structured entry)
    • Natural language input (free text)
    • Streaming response display
    • Search mode selector (BM25, hybrid, vector)
    • Model selector
    • Analysis history display
  • Create gradio_launcher.py for easy startup
  • Expose on port 7861

5.4 Prompt Optimization

Tasks:

  • Reduce prompt size by 60-80% (course achieved 80% reduction)
  • Create focused medical prompts (separate: biomarker analysis, disease explanation, guidelines)
  • Test prompt variants using 5D evaluation framework
  • Store best prompts as SOP parameters (tie into evolution engine)

Phase 6: Monitoring, Caching & Observability (Week 6 Equivalent)

Goal: Add Langfuse tracing for the entire pipeline, Redis caching, and production monitoring.

6.1 Langfuse Integration

Tasks:

  • Create src/services/langfuse/ package (adapt course pattern):
    • client.py β€” LangfuseTracer wrapper with v3 SDK
    • factory.py β€” cached tracer factory
    • tracer.py β€” medical-specific RAGTracer with named steps
  • Add spans for every pipeline step:
    • biomarker_validation β†’ query_embedding β†’ search_retrieval β†’ agent_execution β†’ response_synthesis
  • Track per-request metrics:
    • Total latency, LLM tokens used, search results count, cache hit/miss, agent execution time
  • Add Langfuse Docker services to docker-compose.yml
  • Create trace visualization for medical analysis pipeline
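
A sketch using the Langfuse v3 Python SDK's `@observe` decorator, where nested calls become spans under one trace; the exact SDK surface should be checked against the pinned version:

```python
from langfuse import get_client, observe


@observe(name="search_retrieval")
def search_retrieval(query: str) -> list[str]:
    # Appears as a child span of the calling trace
    return ["chunk-1", "chunk-2"]


@observe(name="medical_analysis")
def analyze(query: str) -> str:
    chunks = search_retrieval(query)
    # Attach per-request metrics to the current trace
    get_client().update_current_trace(metadata={"retrieved": len(chunks)})
    return "synthesized answer"
```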

6.2 Redis Caching

Tasks:

  • Create src/services/cache/ package (adapt course pattern):
    • Exact-match cache: SHA256(query + model + top_k + biomarkers) β†’ cached response
    • TTL: 6 hours for general queries, 1 hour for biomarker analysis (values may change)
  • Add caching to:
    • /api/v1/ask β€” cache RAG responses
    • /api/v1/analyze β€” cache full analysis results
    • Embeddings β€” cache frequently queried embeddings
  • Add graceful fallback: cache miss β†’ normal pipeline
  • Track cache hit rates in Langfuse
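
A sketch of the exact-match cache; `run_rag_pipeline` is a hypothetical stand-in for the real pipeline call:

```python
import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def cache_key(query: str, model: str, top_k: int, biomarkers: dict) -> str:
    """Deterministic SHA256 key over everything that affects the answer."""
    raw = json.dumps(
        {"q": query, "m": model, "k": top_k, "b": biomarkers}, sort_keys=True
    )
    return "mediguard:ask:" + hashlib.sha256(raw.encode()).hexdigest()


def run_rag_pipeline(query: str, model: str, top_k: int, biomarkers: dict) -> str:
    return "answer"  # hypothetical: full retrieval + agent pipeline


def cached_ask(query: str, model: str, top_k: int, biomarkers: dict) -> str:
    key = cache_key(query, model, top_k, biomarkers)
    hit = r.get(key)
    if hit is not None:
        return hit  # cache hit: skip the entire pipeline
    answer = run_rag_pipeline(query, model, top_k, biomarkers)
    r.setex(key, 6 * 3600, answer)  # 6 h TTL; use 1 h for biomarker analyses
    return answer
```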

6.3 Production Health Dashboard

Tasks:

  • Enhance /api/v1/health to check all services:
    • PostgreSQL, OpenSearch, Redis, Ollama, Langfuse, Airflow
  • Add /api/v1/metrics endpoint for operational metrics
  • Create Langfuse dashboard for:
    • Average response time, cache hit rate, error rate, token costs
    • Per-agent execution times, search relevance scores

Phase 7: Agentic RAG & Messaging Bot (Week 7 Equivalent)

Goal: Wrap your multi-agent pipeline in a LangGraph agentic workflow with guardrails, document grading, and query rewriting. Add Telegram bot for mobile access.

7.1 Agentic RAG Wrapper

This is the most impactful upgrade β€” it adds intelligence around your existing agents:

User Query
    ↓
[GUARDRAIL] ──── Is this a medical/biomarker question? ────→ [OUT OF SCOPE]
    ↓ yes
[RETRIEVE] ──── Hybrid search for medical documents ────→ [TOOL: search]
    ↓
[GRADE DOCUMENTS] ──── Are results relevant? ────→ [REWRITE QUERY] ──→ loop
    ↓ yes
[CLINICAL ANALYSIS] ──── Your 6 medical agents ────→ structured analysis
    ↓
[GENERATE RESPONSE] ──── Synthesize with citations ────→ final answer

Tasks:

  • Create src/services/agents/agentic_rag.py β€” AgenticRAGService class
  • Create src/services/agents/nodes/:
    • guardrail_node.py β€” Medical domain validation (score 0-100)
      • In-scope: biomarker questions, disease queries, clinical guidelines
      • Out-of-scope: non-medical, general knowledge, harmful content
    • retrieve_node.py β€” Creates tool call with max_retrieval_attempts
    • grade_documents_node.py β€” LLM evaluates medical relevance
    • rewrite_query_node.py β€” LLM rewrites for better medical retrieval
    • generate_answer_node.py β€” Uses your existing agent pipeline OR direct LLM
    • out_of_scope_node.py β€” Polite medical-domain rejection
  • Create src/services/agents/state.py β€” Enhanced state with guardrail_result, routing_decision, grading_results
  • Create src/services/agents/context.py β€” Runtime context for dependency injection
  • Create src/services/agents/prompts.py β€” Medical-specific prompts:
    • Guardrail: "Is this about health/biomarkers/medical conditions?"
    • Grading: "Does this medical document answer the clinical question?"
    • Rewriting: "Improve this medical query for better document retrieval"
    • Generation: "Synthesize medical findings with citations and safety caveats"
  • Create src/services/agents/tools.py β€” Medical retriever tool wrapping OpenSearch
  • Create POST /api/v1/ask-agentic endpoint
  • Add Langfuse tracing to every node
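
A compressed sketch of the graph wiring; every node body is a stub standing in for the LLM-backed implementations listed above:

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph


class AgenticState(TypedDict, total=False):
    question: str
    in_scope: bool
    documents: list[str]
    relevant: bool
    answer: str


def guardrail(state: AgenticState) -> AgenticState:
    # Real node: LLM scores medical-domain relevance 0-100
    return {"in_scope": "biomarker" in state["question"].lower()}


def retrieve(state: AgenticState) -> AgenticState:
    return {"documents": ["chunk about HbA1c"]}  # real node: hybrid search


def grade(state: AgenticState) -> AgenticState:
    return {"relevant": bool(state.get("documents"))}


def rewrite(state: AgenticState) -> AgenticState:
    return {"question": state["question"] + " (clinical biomarker context)"}


def generate(state: AgenticState) -> AgenticState:
    return {"answer": "synthesized, cited answer"}  # real node: 6-agent pipeline


def out_of_scope(state: AgenticState) -> AgenticState:
    return {"answer": "I can only help with medical and biomarker questions."}


builder = StateGraph(AgenticState)
for name, fn in [("guardrail", guardrail), ("retrieve", retrieve),
                 ("grade", grade), ("rewrite", rewrite),
                 ("generate", generate), ("out_of_scope", out_of_scope)]:
    builder.add_node(name, fn)

builder.add_edge(START, "guardrail")
builder.add_conditional_edges(
    "guardrail", lambda s: "retrieve" if s["in_scope"] else "out_of_scope"
)
builder.add_edge("retrieve", "grade")
builder.add_conditional_edges(
    "grade", lambda s: "generate" if s["relevant"] else "rewrite"
)
builder.add_edge("rewrite", "retrieve")  # rewrite loops back into retrieval
builder.add_edge("generate", END)
builder.add_edge("out_of_scope", END)

graph = builder.compile()
print(graph.invoke({"question": "What does a high HbA1c biomarker mean?"}))
```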

7.2 Medical Guardrails (Critical for MedTech)

Beyond the course's simple domain check, add medical-specific safety:

Tasks:

  • Input guardrails:
    • Detect harmful queries (self-harm, drug abuse guidance)
    • Detect attempts to get diagnosis without proper data
    • Validate biomarker values are physiologically plausible
  • Output guardrails:
    • Always include "consult your healthcare provider" disclaimer
    • Never provide definitive diagnosis (always "suggests" / "may indicate")
    • Flag critical biomarker values with immediate action advice
    • Ensure safety_alerts are present for out-of-range values
  • Citation guardrails:
    • Ensure all medical claims have document citations
    • Flag unsupported claims
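
A sketch of the output-guardrail pass; the patterns and disclaimer text are illustrative:

```python
import re

DISCLAIMER = (
    "This is educational information, not a diagnosis. "
    "Please consult your healthcare provider."
)

# Definitive diagnostic phrasing the output guardrail should flag
DEFINITIVE_PATTERNS = [r"\byou have\b", r"\bis diagnosed\b", r"\bdefinitely\b"]


def check_output(answer: str, has_citations: bool) -> dict:
    """Return the guarded answer plus flags for review and telemetry."""
    flags = [p for p in DEFINITIVE_PATTERNS if re.search(p, answer, re.IGNORECASE)]
    if not has_citations:
        flags.append("missing_citations")
    guarded = answer
    if DISCLAIMER not in guarded:
        guarded += f"\n\n{DISCLAIMER}"
    return {"answer": guarded, "flags": flags}
```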

7.3 Telegram Bot Integration

Tasks:

  • Create src/services/telegram/ package (adapt course pattern)
  • Implement bot commands:
    • /start β€” Welcome with medical assistant introduction
    • /help β€” Show capabilities and input format
    • /analyze <biomarker values> β€” Quick biomarker analysis
    • /search <medical query> β€” Search medical documents
    • /report β€” Get last analysis as formatted report
    • Free text β€” Full RAG Q&A about medical topics
  • Add typing indicators and progress messages
  • Integrate caching for repeated queries
  • Add rate limiting (medical queries shouldn't be spammed)
  • Create TelegramFactory gated by TELEGRAM__ENABLED=true
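
A sketch using python-telegram-bot v20+; the `TELEGRAM__BOT_TOKEN` variable name assumes the nested-env convention from Phase 1.2, and the pipeline calls are omitted:

```python
import os

from telegram import Update
from telegram.ext import (
    Application, CommandHandler, ContextTypes, MessageHandler, filters,
)


async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text(
        "MediGuard AI: send biomarker values or ask a medical question."
    )


async def analyze(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    values = " ".join(context.args)  # e.g. /analyze glucose=180 hba1c=8.2
    # Call POST /api/v1/analyze here and format the result (omitted)
    await update.message.reply_text(f"Analyzing: {values or 'no values given'}")


async def free_text(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # Route to the agentic RAG endpoint (omitted)
    await update.message.reply_text("Looking that up in the medical corpus...")


def main() -> None:
    app = Application.builder().token(os.environ["TELEGRAM__BOT_TOKEN"]).build()
    app.add_handler(CommandHandler("start", start))
    app.add_handler(CommandHandler("analyze", analyze))
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, free_text))
    app.run_polling()


if __name__ == "__main__":
    main()
```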

7.4 Feedback Loop

Tasks:

  • Create POST /api/v1/feedback endpoint (adapt from course)
  • Integrate with Langfuse scoring
  • Use feedback data to identify weak prompts β†’ feed into SOP evolution engine

Phase 8: MedTech-Specific Additions (Beyond Course)

Goal: Things the course doesn't cover but your medical domain demands.

8.1 HIPAA-Awareness Patterns

Tasks:

  • Never log patient biomarker values in plain text
  • Add request ID tracking without PII
  • Create data retention policy (auto-delete analysis data after configurable period)
  • Add audit logging for all analysis requests
  • Document HIPAA compliance approach (even if not yet certified)

8.2 Medical Safety Testing

Tasks:

  • Create medical-specific test suite:
    • Critical value detection tests (every critical biomarker)
    • Guardrail rejection tests (non-medical queries)
    • Citation completeness tests
    • Safety disclaimer presence tests
    • Biomarker normalization tests (already have some)
  • Integrate 5D evaluation into CI pipeline
  • Create test fixtures with realistic medical scenarios

8.3 Evolution Engine Integration

Tasks:

  • Wire SOP evolution engine to production metrics (Langfuse data)
  • Create Airflow DAG for scheduled evolution cycles
  • Store evolved SOPs in PostgreSQL with version tracking
  • A/B test SOP variants using Langfuse trace comparison

8.4 Multi-condition Support

Tasks:

  • Extend condition coverage beyond current 5 diseases
  • Add condition-specific retrieval strategies
  • Create condition-specific chunking filters
  • Support multi-condition analysis (comorbidities)

Implementation Priority Matrix

| Priority | Phase | Effort | Impact | Dependencies |
| --- | --- | --- | --- | --- |
| πŸ”΄ P0 | 1.1 Docker Compose | 2 days | Critical | None |
| πŸ”΄ P0 | 1.2 Pydantic Settings | 1 day | Critical | None |
| πŸ”΄ P0 | 1.4 Project Restructure | 2 days | Critical | None |
| πŸ”΄ P0 | 1.5 Dev Tooling | 0.5 day | Critical | 1.4 |
| πŸ”΄ P0 | 1.3 PostgreSQL + Models | 2 days | Critical | 1.1, 1.4 |
| 🟑 P1 | 3.1 OpenSearch Client | 2 days | High | 1.1, 1.4 |
| 🟑 P1 | 3.2 Medical Index Mapping | 1 day | High | 3.1 |
| 🟑 P1 | 4.1 Medical Text Chunker | 2 days | High | 3.1 |
| 🟑 P1 | 4.2 Production Embeddings | 1 day | High | 4.1 |
| 🟑 P1 | 4.3 Hybrid Search + RRF | 1 day | High | 3.1, 4.2 |
| 🟑 P1 | 5.1 Ollama Client | 1 day | High | 1.4 |
| 🟑 P1 | 5.2 Streaming Endpoints | 1 day | High | 5.1, 4.3 |
| 🟑 P1 | 2.1 PDF Parser (Docling) | 1 day | High | 1.4 |
| 🟑 P1 | 7.1 Agentic RAG Wrapper | 3 days | High | 5.2, 4.3 |
| 🟑 P1 | 7.2 Medical Guardrails | 2 days | High | 7.1 |
| 🟒 P2 | 2.3 Airflow Pipeline | 2 days | Medium | 1.1, 2.1, 4.1 |
| 🟒 P2 | 5.3 Gradio Interface | 1 day | Medium | 5.2 |
| 🟒 P2 | 6.1 Langfuse Tracing | 2 days | Medium | 1.1, 5.2 |
| 🟒 P2 | 6.2 Redis Caching | 1 day | Medium | 1.1, 5.2 |
| 🟒 P2 | 6.3 Health Dashboard | 0.5 day | Medium | 6.1 |
| 🟒 P2 | 7.3 Telegram Bot | 2 days | Medium | 7.1, 6.2 |
| 🟒 P2 | 7.4 Feedback Loop | 0.5 day | Medium | 6.1 |
| πŸ”΅ P3 | 2.2 Medical Sources | 2 days | Low | 2.1 |
| πŸ”΅ P3 | 8.1 HIPAA Patterns | 1 day | Low | 1.3 |
| πŸ”΅ P3 | 8.2 Safety Testing | 2 days | Low | 7.2 |
| πŸ”΅ P3 | 8.3 Evolution Integration | 2 days | Low | 6.1, 2.3 |
| πŸ”΅ P3 | 8.4 Multi-condition | 3 days | Low | 4.1 |

Estimated Total: ~40 days of focused work


Migration Strategy

Step 1: Foundation (Week 1-2 of work)

  1. Restructure project layout β†’ Phase 1.4
  2. Create Pydantic Settings β†’ Phase 1.2
  3. Set up Docker Compose β†’ Phase 1.1
  4. Add PostgreSQL with models β†’ Phase 1.3
  5. Add dev tooling β†’ Phase 1.5

Step 2: Search Engine (Week 2-3)

  1. Create OpenSearch client + medical mapping β†’ Phase 3.1, 3.2
  2. Build medical text chunker β†’ Phase 4.1
  3. Add production embeddings (Jina) β†’ Phase 4.2
  4. Implement hybrid search + RRF β†’ Phase 4.3
  5. Upgrade PDF parser to Docling β†’ Phase 2.1

Step 3: RAG Pipeline (Week 3-4)

  1. Create Ollama client β†’ Phase 5.1
  2. Add streaming endpoints β†’ Phase 5.2
  3. Build agentic RAG wrapper β†’ Phase 7.1
  4. Add medical guardrails β†’ Phase 7.2
  5. Create Gradio interface β†’ Phase 5.3

Step 4: Production Hardening (Week 4-5)

  1. Add Langfuse observability β†’ Phase 6.1
  2. Add Redis caching β†’ Phase 6.2
  3. Set up Airflow pipeline β†’ Phase 2.3
  4. Build Telegram bot β†’ Phase 7.3
  5. Add feedback loop β†’ Phase 7.4

Step 5: Polish (Week 5-6)

  1. Health dashboard β†’ Phase 6.3
  2. Medical safety testing β†’ Phase 8.2
  3. HIPAA patterns β†’ Phase 8.1
  4. Evolution engine integration β†’ Phase 8.3

Key Migration Rules

  • Never break what works: Keep all existing agents functional throughout
  • Test at every step: Run existing tests after each phase
  • Incremental Docker: Start with API + PostgreSQL, add services one at a time
  • Feature flags: Gate new features (Telegram, Langfuse, Redis) behind settings
  • Backward compatibility: Keep CLI chatbot working alongside new API

Architecture Target State

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Docker Compose Orchestration                         β”‚
β”‚                                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ FastAPI   β”‚  β”‚PostgreSQL β”‚  β”‚ OpenSearch β”‚  β”‚ Ollama β”‚  β”‚ Airflow β”‚  β”‚
β”‚  β”‚ + Gradio  β”‚  β”‚ (reports, β”‚  β”‚ (hybrid   β”‚  β”‚ (local β”‚  β”‚ (daily  β”‚  β”‚
β”‚  β”‚ (8000,    β”‚  β”‚  docs,    β”‚  β”‚  medical  β”‚  β”‚  LLM)  β”‚  β”‚ ingest) β”‚  β”‚
β”‚  β”‚  7861)    β”‚  β”‚  history) β”‚  β”‚  search)  β”‚  β”‚        β”‚  β”‚         β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β”‚
β”‚       β”‚              β”‚              β”‚             β”‚            β”‚        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”    β”‚
β”‚  β”‚  Redis   β”‚  β”‚ Langfuse  β”‚  β”‚        mediguard-network         β”‚    β”‚
β”‚  β”‚ (cache)  β”‚  β”‚ (observe) β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                          β”‚
β”‚                                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚                    Agentic RAG Pipeline                            β”‚  β”‚
β”‚  β”‚                                                                    β”‚  β”‚
β”‚  β”‚  Query β†’ [Guardrail] β†’ [Retrieve] β†’ [Grade] β†’ [6 Medical Agents] β”‚  β”‚
β”‚  β”‚              ↓              ↑          ↓              ↓            β”‚  β”‚
β”‚  β”‚        [Out of Scope]  [Rewrite]  [Generate]  β†’ Final Response    β”‚  β”‚
β”‚  β”‚                                                                    β”‚  β”‚
β”‚  β”‚  Agents: Biomarker Analyzer β”‚ Disease Explainer β”‚ Linker          β”‚  β”‚
β”‚  β”‚          Clinical Guidelines β”‚ Confidence β”‚ Synthesizer           β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Telegram Bot β”‚  β”‚  Gradio UI   β”‚  β”‚  5D Eval + SOP Evolution     β”‚  β”‚
β”‚  β”‚ (mobile)     β”‚  β”‚  (desktop)   β”‚  β”‚  (self-improvement loop)     β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Files to Create (Summary)

| New File | Source of Inspiration |
| --- | --- |
| docker-compose.yml | Course compose.yml (adapted) |
| Dockerfile | Course Dockerfile (multi-stage UV) |
| Makefile | Course Makefile |
| pyproject.toml | Course pyproject.toml |
| .pre-commit-config.yaml | Course .pre-commit-config.yaml |
| .env.example | Course .env.example |
| src/main.py | Course src/main.py (lifespan pattern) |
| src/config.py | Course src/config.py + existing SOP config |
| src/dependencies.py | Course src/dependencies.py |
| src/exceptions.py | Course src/exceptions.py (medical exceptions) |
| src/database.py | Course src/database.py |
| src/db/* | Course src/db/* |
| src/models/analysis.py | New (medical domain) |
| src/models/document.py | Course src/models/paper.py (adapted) |
| src/repositories/* | Course src/repositories/* (adapted) |
| src/routers/ask.py | Course src/routers/ask.py |
| src/routers/search.py | Course src/routers/hybrid_search.py |
| src/routers/health.py | Course src/routers/ping.py (enhanced) |
| src/schemas/* | Course src/schemas/* (medical schemas) |
| src/services/opensearch/* | Course src/services/opensearch/* |
| src/services/embeddings/* | Course src/services/embeddings/* |
| src/services/ollama/* | Course src/services/ollama/* |
| src/services/cache/* | Course src/services/cache/* |
| src/services/langfuse/* | Course src/services/langfuse/* |
| src/services/indexing/* | Course src/services/indexing/* (medical chunks) |
| src/services/pdf_parser/* | Course src/services/pdf_parser/* |
| src/services/telegram/* | Course src/services/telegram/* |
| src/services/agents/agentic_rag.py | Course (adapted for medical agents) |
| src/services/agents/nodes/* | Course (medical guardrails) |
| src/services/agents/context.py | Course |
| src/services/agents/prompts.py | Course (medical prompts) |
| src/gradio_app.py | Course src/gradio_app.py (medical UI) |
| airflow/dags/medical_ingestion.py | Course airflow/dags/arxiv_paper_ingestion.py |

Files to Keep & Enhance

| Existing File | Action |
| --- | --- |
| src/agents/biomarker_analyzer.py | Keep, move to src/services/agents/medical/ |
| src/agents/disease_explainer.py | Keep, move, add OpenSearch retriever |
| src/agents/biomarker_linker.py | Keep, move, add OpenSearch retriever |
| src/agents/clinical_guidelines.py | Keep, move, add OpenSearch retriever |
| src/agents/confidence_assessor.py | Keep, move |
| src/agents/response_synthesizer.py | Keep, move |
| src/biomarker_validator.py | Keep, move to src/services/biomarker/ |
| src/biomarker_normalization.py | Keep, move to src/services/biomarker/ |
| src/evaluation/ | Keep, enhance with Langfuse integration |
| src/evolution/ | Keep, wire to production metrics |
| config/biomarker_references.json | Keep as seed data, migrate to DB |
| scripts/chat.py | Keep, update imports |
| tests/* | Keep, add production test fixtures |

This plan transforms MediGuard AI from a working prototype into a production-grade medical RAG system, applying every infrastructure lesson from the arXiv Paper Curator course while preserving and enhancing your unique medical domain logic.