
Nikhil Pravin Pise

Refactor: Improve code quality, security, and configuration

ad2e847 19 days ago


	# MediGuard AI — Production Upgrade Plan

	## From Prototype to Production-Grade MedTech RAG System

	> Generated: 2026-02-23
	> Based on: Deep review of production-agentic-rag-course (Weeks 1–7) + existing RagBot codebase
	> Goal: Take the existing MediGuard AI (clinical biomarker analysis + RAG explanation system) to full production quality, applying every lesson from the arXiv Paper Curator course — adapted for the MedTech domain.

	---

	## Table of Contents

	1. [Executive Summary](#1-executive-summary)
	2. [Deep Review: Course vs. Your Codebase](#2-deep-review-course-vs-your-codebase)
	3. [Architecture Gap Analysis](#3-architecture-gap-analysis)
	4. [Phase 1: Infrastructure Foundation](#phase-1-infrastructure-foundation-week-1-equivalent)
	5. [Phase 2: Medical Data Ingestion Pipeline](#phase-2-medical-data-ingestion-pipeline-week-2-equivalent)
	6. [Phase 3: Production Search Foundation](#phase-3-production-search-foundation-week-3-equivalent)
	7. [Phase 4: Hybrid Search & Intelligent Chunking](#phase-4-hybrid-search--intelligent-chunking-week-4-equivalent)
	8. [Phase 5: Complete RAG Pipeline with Streaming](#phase-5-complete-rag-pipeline-with-streaming-week-5-equivalent)
	9. [Phase 6: Monitoring, Caching & Observability](#phase-6-monitoring-caching--observability-week-6-equivalent)
	10. [Phase 7: Agentic RAG & Messaging Bot](#phase-7-agentic-rag--messaging-bot-week-7-equivalent)
	11. [Phase 8: MedTech-Specific Additions](#phase-8-medtech-specific-additions-beyond-course)
	12. [Implementation Priority Matrix](#implementation-priority-matrix)
	13. [Migration Strategy](#migration-strategy)

	---

	## 1. Executive Summary

	Your RagBot is a working prototype with strong domain logic (biomarker validation, multi-agent clinical analysis, 5D evaluation, SOP evolution). The course teaches production infrastructure (Docker orchestration, OpenSearch hybrid search, Airflow pipelines, Redis caching, Langfuse observability, LangGraph agentic workflows, Telegram bot).

The strategy: keep your excellent medical domain logic and multi-agent architecture, but rebuild the infrastructure layer to match production standards. Your domain is also more demanding than arXiv papers: medical data requires stricter validation, HIPAA-aware patterns, and safety guardrails.

	### What You Have (Strengths)
	- ✅ 6 specialized medical agents (Biomarker Analyzer, Disease Explainer, Biomarker-Disease Linker, Clinical Guidelines, Confidence Assessor, Response Synthesizer)
	- ✅ LangGraph orchestration with parallel execution
	- ✅ Robust biomarker validation with 24 biomarkers, reference ranges, critical values
	- ✅ 5D evaluation framework (Clinical Accuracy, Evidence Grounding, Actionability, Clarity, Safety)
	- ✅ SOP evolution engine (Outer Loop optimization)
	- ✅ Multi-provider LLM support (Groq, Gemini, Ollama)
	- ✅ Basic FastAPI with analysis endpoints
	- ✅ CLI chatbot with natural language biomarker extraction

	### What You're Missing (Gaps)
	- ❌ No Docker Compose orchestration (only minimal single-service Dockerfile)
	- ❌ No production database (PostgreSQL) — no patient/report persistence
	- ❌ No production search engine — using FAISS (in-memory, single-file, no filtering)
	- ❌ No chunking strategy — basic RecursiveCharacterTextSplitter only
	- ❌ No hybrid search (BM25 + vector) — vector-only retrieval
	- ❌ No production embeddings — using local HuggingFace MiniLM (384d) or Google free tier
	- ❌ No data ingestion pipeline (Airflow) — manual PDF loading
	- ❌ No caching layer (Redis) — every query hits LLM
	- ❌ No observability (Langfuse) — no tracing, no cost tracking
	- ❌ No streaming responses — synchronous only
	- ❌ No Gradio interface — CLI only (besides basic API)
	- ❌ No messaging bot (Telegram/WhatsApp) — no mobile access
	- ❌ No agentic RAG with guardrails, document grading, query rewriting
	- ❌ No proper dependency injection pattern (FastAPI `Depends()`)
	- ❌ No Pydantic Settings with env-nested config
	- ❌ No factory pattern for service initialization
	- ❌ No proper exception hierarchy
	- ❌ No health checks for all services
	- ❌ No Makefile / dev tooling (ruff, mypy, pre-commit)
	- ❌ No proper test infrastructure (pytest fixtures, test containers)

	---

	## 2. Deep Review: Course vs. Your Codebase

	### Course Architecture (What Production Looks Like)

	```
┌────────────────────────────────────────────────────────────────┐
│                  Docker Compose Orchestration                  │
├────────────┬────────────┬────────────┬────────────┬────────────┤
│ FastAPI    │ PostgreSQL │ OpenSearch │ Ollama     │ Airflow    │
│ (8000)     │ (5432)     │ (9200)     │ (11434)    │ (8080)     │
├────────────┼────────────┼────────────┼────────────┼────────────┤
│ Redis      │ Langfuse   │ ClickHouse │ MinIO      │ Langfuse-PG│
│ (6379)     │ (3001)     │            │            │ (5433)     │
├────────────┴────────────┼────────────┴────────────┴────────────┤
│ Gradio UI (7861)        │ Telegram Bot                         │
└─────────────────────────┴──────────────────────────────────────┘
	```

	Key Patterns from Course:
	- Pydantic Settings with `env_nested_delimiter="__"` for hierarchical config
	- Factory pattern (`make_*` functions) for every service
	- Dependency injection via FastAPI `Depends()` with typed annotations
	- Lifespan context for startup/shutdown with proper resource management
	- Service layer separation: `routers/` → `services/` → `clients/`
	- Schema-driven: Separate Pydantic schemas for API, database, embeddings, indexing
	- Exception hierarchy: Domain-specific exceptions (`PDFParsingException`, `OllamaException`, etc.)
	- Context dataclass for LangGraph runtime dependency injection
	- Structured LLM output via `.with_structured_output(PydanticModel)`
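Two of these patterns, cached `make_*` factories and a lifespan that builds and tears down shared clients, can be sketched without any framework. `CacheClient`, `Services`, and `make_cache_client` are hypothetical names; FastAPI's real hook is an `@asynccontextmanager` function `lifespan(app)` passed as `FastAPI(lifespan=...)`, which this synchronous version only imitates.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from functools import lru_cache
from typing import Iterator


class CacheClient:
    """Hypothetical stand-in for a Redis client wrapper."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.closed = False

    def close(self) -> None:
        self.closed = True


@lru_cache(maxsize=None)
def make_cache_client(url: str = "redis://localhost:6379/0") -> CacheClient:
    # Factory: the first call builds the client, later calls reuse the same instance
    return CacheClient(url)


@dataclass
class Services:
    """Everything routers need, injected rather than imported as globals."""
    cache: CacheClient


@contextmanager
def lifespan() -> Iterator[Services]:
    # Startup: build shared clients once
    services = Services(cache=make_cache_client())
    try:
        yield services
    finally:
        # Shutdown: release resources and drop the cached singleton
        services.cache.close()
        make_cache_client.cache_clear()
```

In the FastAPI version, the lifespan would put `services` on `app.state`, and a small dependency returning `request.app.state.services` would be injected into routers with `Depends()` instead of routers importing global singletons.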

	### Your Codebase Architecture (Current State)

	```
┌─────────────────────────────────────────────┐
│ Basic FastAPI (api/app/)                    │
│ Single Dockerfile, no orchestration         │
├─────────────────────────────────────────────┤
│ src/ (Core Domain Logic)                    │
│ ┌─────────────────────────────────────────┐ │
│ │ workflow.py (LangGraph StateGraph)      │ │
│ │ 6 agents/ (parallel execution)          │ │
│ │ biomarker_validator.py (24 markers)     │ │
│ │ pdf_processor.py (FAISS + PyPDF)        │ │
│ │ evaluation/ (5D framework)              │ │
│ │ evolution/ (SOP optimization)           │ │
│ └─────────────────────────────────────────┘ │
├─────────────────────────────────────────────┤
│ FAISS vector store (single file)            │
│ No PostgreSQL, No Redis, No OpenSearch      │
└─────────────────────────────────────────────┘
	```

	---

	## 3. Architecture Gap Analysis

| Dimension | Course (Production) | Your Codebase (Prototype) | Gap Severity |
|-----------|---------------------|---------------------------|--------------|
| Container Orchestration | Docker Compose with 12+ services, health checks, networks | Single Dockerfile, manual startup | 🔴 Critical |
| Database | PostgreSQL 16 with SQLAlchemy models, repositories | None (in-memory only) | 🔴 Critical |
| Search Engine | OpenSearch 2.19 with BM25 + KNN hybrid, RRF fusion | FAISS (vector-only, no filtering) | 🔴 Critical |
| Chunking | Section-aware chunking (600w, 100w overlap, metadata) | Basic RecursiveCharacterTextSplitter (1000 char) | 🟡 Major |
| Embeddings | Jina AI v3 (1024d, passage/query differentiation) | HuggingFace MiniLM (384d) or Google free tier | 🟡 Major |
| Data Pipeline | Airflow DAGs (daily schedule, fetch→parse→chunk→index) | Manual PDF loading, one-time setup | 🟡 Major |
| Caching | Redis with TTL, exact-match, SHA256 keys | None | 🟡 Major |
| Observability | Langfuse v3 (traces, spans, generations, cost tracking) | None (print statements only) | 🟡 Major |
| Streaming | SSE streaming with Gradio UI | None (synchronous responses) | 🟡 Major |
| Agentic RAG | LangGraph with guardrails, grading, rewriting, context_schema | Basic LangGraph (no guardrails, no grading) | 🟡 Major |
| Bot Integration | Telegram bot with /search, Q&A, caching | None | 🟢 Enhancement |
| Config Management | Pydantic Settings, hierarchical env vars, frozen models | Basic os.getenv, dotenv | 🟡 Major |
| Dependency Injection | FastAPI Depends() with typed annotations | Manual global singletons | 🟡 Major |
| Error Handling | Domain exception hierarchy, graceful fallbacks | Basic try/except with prints | 🟡 Major |
| Code Quality | Ruff, MyPy, pre-commit, pytest with fixtures | Minimal pytest, no linting | 🟢 Enhancement |
| API Design | Versioned (/api/v1/), health checks for all services | Basic routes, minimal health check | 🟡 Major |

	---

	## Phase 1: Infrastructure Foundation (Week 1 Equivalent)

	> Goal: Containerize everything, add PostgreSQL for persistence, set up OpenSearch, establish professional development environment.

	### 1.1 Docker Compose Orchestration

	Create a production `docker-compose.yml` with all services:

	```yaml
# Target services for MediGuard AI (outline only: images, health checks, volumes omitted)
services:
  api:                    # FastAPI application (port 8000)
  postgres:               # Patient reports, analysis history (port 5432)
  opensearch:             # Medical document search engine (port 9200)
  opensearch-dashboards:  # Search UI (port 5601)
  redis:                  # Response caching (port 6379)
  ollama:                 # Local LLM for privacy-sensitive medical data (port 11434)
  airflow:                # Medical literature pipeline (port 8080)
  langfuse-web:           # Observability dashboard (port 3001)
  # Langfuse infrastructure: worker, Postgres, Redis, ClickHouse, MinIO
  langfuse-worker:
  langfuse-postgres:
  langfuse-redis:
  langfuse-clickhouse:
  langfuse-minio:
	```

	Tasks:
	- [ ] Create root `docker-compose.yml` adapting course pattern to MedTech services
	- [ ] Create multi-stage `Dockerfile` using UV package manager (copy course pattern)
	- [ ] Add health checks for every service (PostgreSQL, OpenSearch, Redis, Ollama)
	- [ ] Set up Docker network `mediguard-network` with proper service dependencies
	- [ ] Configure volume persistence for all data stores
	- [ ] Create `.env.example` with all configuration variables documented

	### 1.2 Pydantic Settings Configuration

	Replace scattered `os.getenv()` calls with hierarchical Pydantic Settings:

	```python
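# New: src/config.py (course-inspired outline; fields omitted)
from pydantic_settings import BaseSettings, SettingsConfigDict


class BaseConfigSettings(BaseSettings):
    # Assumed shared base: reads .env, splits nested env vars on "__"
    # (e.g. REDIS__PORT sets `port` on a `redis` sub-settings field), freezes values
    model_config = SettingsConfigDict(
        env_file=".env", env_nested_delimiter="__", frozen=True, extra="ignore"
    )


class MedicalPDFSettings(BaseConfigSettings): ...  # PDF parser config
class ChunkingSettings(BaseConfigSettings): ...  # Chunking parameters
class OpenSearchSettings(BaseConfigSettings): ...  # Search engine config
class LangfuseSettings(BaseConfigSettings): ...  # Observability config
class RedisSettings(BaseConfigSettings): ...  # Cache config
class TelegramSettings(BaseConfigSettings): ...  # Bot config
class BiomarkerSettings(BaseConfigSettings): ...  # Biomarker thresholds
class Settings(BaseConfigSettings): ...  # Root settings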
	```

	Tasks:
	- [ ] Rewrite `src/config.py` — keep `ExplanationSOP` but add infrastructure settings classes
	- [ ] Use `env_nested_delimiter="__"` for hierarchical environment variables
	- [ ] Add `frozen=True` for immutable configuration
	- [ ] Move all hardcoded values to environment variables with sensible defaults
	- [ ] Create `get_settings()` factory with `@lru_cache`
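To show what `env_nested_delimiter="__"` and `frozen=True` buy, here is a stdlib-only sketch of the same behavior using frozen dataclasses: `REDIS__PORT=6380` overrides `settings.redis.port`. This is not pydantic-settings (which also validates types and reads `.env` files); the section classes, field names, and defaults are hypothetical.

```python
import os
from dataclasses import dataclass, field, fields, replace
from functools import lru_cache
from typing import Mapping


@dataclass(frozen=True)
class RedisSettings:
    host: str = "localhost"
    port: int = 6379
    ttl_hours: int = 6


@dataclass(frozen=True)
class OpenSearchSettings:
    host: str = "http://localhost:9200"
    index_name: str = "medical-chunks"


@dataclass(frozen=True)
class Settings:
    redis: RedisSettings = field(default_factory=RedisSettings)
    opensearch: OpenSearchSettings = field(default_factory=OpenSearchSettings)


def load_settings(env: Mapping[str, str]) -> Settings:
    """Apply SECTION__FIELD environment variables on top of the defaults."""
    sections = {}
    for section in fields(Settings):
        defaults = section.default_factory()
        overrides = {}
        for item in fields(defaults):
            key = f"{section.name}__{item.name}".upper()
            if key in env:
                # Cast with the type of the default value (only str or int here)
                caster = type(getattr(defaults, item.name))
                overrides[item.name] = caster(env[key])
        sections[section.name] = replace(defaults, **overrides)
    return Settings(**sections)


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Parsed once per process; frozen dataclasses make the shared instance safe
    return load_settings(os.environ)
```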

	### 1.3 PostgreSQL Database Setup

	Add persistent storage for analysis history — critical for medical audit trail:

	```python
# New models (outline; columns, relationships, and the declarative Base omitted):
class PatientAnalysis(Base): ...  # Store each analysis run
class AnalysisReport(Base): ...  # Store final reports
class MedicalDocument(Base): ...  # Track ingested medical PDFs
class BiomarkerReference(Base): ...  # Biomarker reference ranges (currently JSON file)
	```

	Tasks:
	- [ ] Create `src/db/` package mirroring course pattern (factory, interfaces, postgresql)
	- [ ] Define SQLAlchemy models for analysis history and medical documents
	- [ ] Create repository pattern for data access
	- [ ] Set up Alembic for database migrations
	- [ ] Migrate `biomarker_references.json` to database (keep JSON as seed data)
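A sketch of the repository pattern for analysis history, using the stdlib `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere. The table name, columns, and `AnalysisRepository` API are hypothetical; the production version would sit on SQLAlchemy models with Alembic migrations, but the shape (routers and services call the repository, never raw SQL) stays the same.

```python
import json
import sqlite3
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AnalysisRecord:
    id: str
    created_at: str
    status: str
    report: dict


class AnalysisRepository:
    """Data access for analysis history; callers never write SQL directly."""

    _COLUMNS = "id, created_at, status, report_json"

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS patient_analysis ("
            "id TEXT PRIMARY KEY, created_at TEXT NOT NULL, "
            "status TEXT NOT NULL, report_json TEXT NOT NULL)"
        )

    @staticmethod
    def _to_record(row: tuple) -> AnalysisRecord:
        return AnalysisRecord(row[0], row[1], row[2], json.loads(row[3]))

    def create(self, report: dict, status: str = "completed") -> AnalysisRecord:
        record = AnalysisRecord(
            id=str(uuid.uuid4()),
            created_at=datetime.now(timezone.utc).isoformat(),
            status=status,
            report=report,
        )
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                f"INSERT INTO patient_analysis ({self._COLUMNS}) VALUES (?, ?, ?, ?)",
                (record.id, record.created_at, record.status, json.dumps(report)),
            )
        return record

    def get(self, analysis_id: str) -> Optional[AnalysisRecord]:
        row = self.conn.execute(
            f"SELECT {self._COLUMNS} FROM patient_analysis WHERE id = ?", (analysis_id,)
        ).fetchone()
        return None if row is None else self._to_record(row)

    def list_recent(self, limit: int = 10) -> list[AnalysisRecord]:
        rows = self.conn.execute(
            f"SELECT {self._COLUMNS} FROM patient_analysis ORDER BY created_at DESC LIMIT ?",
            (limit,),
        ).fetchall()
        return [self._to_record(r) for r in rows]
```

The f-strings only interpolate the class constant `_COLUMNS`; every user-supplied value goes through `?` placeholders.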

	### 1.4 Project Structure Refactor

	Reorganize to match production patterns:

	```
src/
├── config.py               # Pydantic Settings (hierarchical)
├── main.py                 # FastAPI app with lifespan
├── database.py             # Database utilities
├── dependencies.py         # FastAPI dependency injection
├── exceptions.py           # Domain exception hierarchy
├── middlewares.py          # Request logging, timing
├── db/                     # Database layer
│   ├── factory.py
│   └── interfaces/
├── models/                 # SQLAlchemy models
│   ├── analysis.py
│   └── document.py
├── repositories/           # Data access
│   ├── analysis.py
│   └── document.py
├── routers/                # API endpoints
│   ├── analyze.py          # Biomarker analysis
│   ├── ask.py              # RAG Q&A (streaming + standard)
│   ├── health.py           # Comprehensive health checks
│   └── search.py           # Medical document search
├── schemas/                # Pydantic request/response models
│   ├── api/
│   ├── medical/
│   └── embeddings/
├── services/               # Business logic
│   ├── agents/             # Your 6 medical agents (KEEP!)
│   │   ├── biomarker_analyzer.py
│   │   ├── disease_explainer.py
│   │   ├── biomarker_linker.py
│   │   ├── clinical_guidelines.py
│   │   ├── confidence_assessor.py
│   │   ├── response_synthesizer.py
│   │   ├── agentic_rag.py          # NEW: LangGraph agentic wrapper
│   │   ├── nodes/                  # NEW: Guardrail, grading, rewriting
│   │   ├── state.py                # Enhanced state
│   │   ├── context.py              # Runtime dependency injection
│   │   └── prompts.py              # Medical-domain prompts
│   ├── opensearch/         # NEW: Search engine client
│   ├── embeddings/         # NEW: Production embeddings
│   ├── cache/              # NEW: Redis caching
│   ├── langfuse/           # NEW: Observability
│   ├── ollama/             # NEW: Local LLM client
│   ├── indexing/           # NEW: Chunking + indexing
│   ├── pdf_parser/         # Enhanced: Use Docling
│   ├── telegram/           # NEW: Bot integration
│   └── biomarker/          # Extracted: validation + normalization
├── evaluation/             # KEEP: 5D evaluation
└── evolution/              # KEEP: SOP evolution
	```

	Tasks:
	- [ ] Create the new directory structure
	- [ ] Move API from `api/app/` into `src/` (single application)
	- [ ] Create `exceptions.py` with medical-domain exception hierarchy
	- [ ] Create `dependencies.py` with typed FastAPI dependency injection
	- [ ] Create `main.py` with proper lifespan context manager
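One possible shape for `exceptions.py`: a single base class lets the API layer catch every domain error in one place and map it to an HTTP status. The class names and status codes are illustrative assumptions; a handler registered with FastAPI's `@app.exception_handler(MediGuardError)` decorator is where `to_http_error` would typically be called.

```python
class MediGuardError(Exception):
    """Base for all domain errors; subclasses choose their HTTP status."""
    status_code = 500


class BiomarkerValidationError(MediGuardError):
    status_code = 422  # input was understood but clinically invalid


class PDFParsingError(MediGuardError):
    status_code = 422


class GuardrailViolationError(MediGuardError):
    status_code = 400  # out-of-scope or unsafe request


class LLMProviderError(MediGuardError):
    status_code = 502  # upstream model (Ollama, Groq, Gemini) failed


class SearchUnavailableError(MediGuardError):
    status_code = 503  # OpenSearch down or index missing


def to_http_error(exc: Exception) -> tuple[int, str]:
    """Map an exception to (status, message) without leaking unexpected details."""
    if isinstance(exc, MediGuardError):
        return exc.status_code, str(exc)
    # Unknown errors may carry patient data in their message, so hide it
    return 500, "Internal server error"
```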

	### 1.5 Development Tooling

	Tasks:
	- [ ] Create `pyproject.toml` replacing `requirements.txt` (use UV)
	- [ ] Create `Makefile` with start/stop/test/lint/format/health commands
	- [ ] Add `ruff` for linting and formatting
	- [ ] Add `mypy` for type checking
	- [ ] Add `.pre-commit-config.yaml`
	- [ ] Create `.env.example` and `.env.test`

	---

	## Phase 2: Medical Data Ingestion Pipeline (Week 2 Equivalent)

	> Goal: Automated ingestion of medical PDFs, clinical guidelines, and reference documents with Airflow orchestration.

	### 2.1 Medical PDF Parser Upgrade

	Replace basic PyPDF with Docling for better medical document handling:

	Tasks:
	- [ ] Create `src/services/pdf_parser/` with Docling integration (copy course pattern)
	- [ ] Add medical-specific section detection (Abstract, Methods, Results, Discussion, Clinical Guidelines)
	- [ ] Add table extraction for lab reference ranges
	- [ ] Add validation: file size limits, page limits, PDF header check
	- [ ] Add metadata extraction: title, authors, publication date, journal
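The cheap validation checks can run before Docling ever sees the file. A minimal sketch, assuming a hypothetical 50 MB limit; the page-count limit is left out because it needs a parser to open the document, so it belongs right after parsing.

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # hypothetical upload limit


def pdf_problems(data: bytes, max_bytes: int = MAX_PDF_BYTES) -> list[str]:
    """Return validation problems; an empty list means the file may be parsed."""
    if not data:
        return ["empty file"]
    problems = []
    if len(data) > max_bytes:
        problems.append(f"file too large: {len(data)} bytes > {max_bytes}")
    # PDFs start with a "%PDF-" header; this check is strict (no leading bytes allowed)
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    return problems
```

Collecting every problem instead of raising on the first lets the upload endpoint return one complete 422 response.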

	### 2.2 Medical Document Sources

	Unlike arXiv (single API), medical literature comes from multiple sources:

	Tasks:
	- [ ] Create `src/services/medical_sources/` package
	- [ ] Implement PubMed API client (free, rate-limited) for research papers
	- [ ] Implement local PDF upload endpoint for clinical guidelines
	- [ ] Implement reference document ingestion (WHO, CDC, ADA guidelines)
	- [ ] Create document deduplication logic (by title hash + content fingerprint)
	- [ ] Add `MedicalDocument` model tracking: source, parse status, indexing status
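A sketch of the title-hash plus content-fingerprint idea under one interpretation: identical normalized title and text is a duplicate, while the same title with different text is treated as a new version (for example, a revised guideline). `DocumentRegistry` and its status strings are hypothetical, and exact hashing only catches duplicates that are identical after normalization; near-duplicates would need something like MinHash.

```python
import hashlib
import re
import unicodedata


def _normalize(text: str) -> str:
    # Unicode-normalize, lowercase, and collapse all whitespace runs
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def _sha256(text: str) -> str:
    return hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()


class DocumentRegistry:
    """In-memory stand-in for a lookup against the MedicalDocument table."""

    def __init__(self) -> None:
        self._fingerprints_by_title: dict[str, set[str]] = {}

    def register(self, title: str, text: str) -> str:
        title_hash, content_fp = _sha256(title), _sha256(text)
        seen = self._fingerprints_by_title.setdefault(title_hash, set())
        if content_fp in seen:
            return "duplicate"
        status = "new_version" if seen else "new"
        seen.add(content_fp)
        return status
```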

	### 2.3 Airflow Pipeline for Medical Literature

	Tasks:
	- [ ] Create `airflow/` directory with Dockerfile and entrypoint
- [ ] Create `airflow/dags/medical_ingestion.py` DAG:
  - `setup_environment` → `fetch_new_documents` → `parse_pdfs` → `chunk_and_index` → `generate_report`
	- [ ] Schedule: Daily at 6 AM for PubMed updates, on-demand for uploaded PDFs
	- [ ] Add retry logic with exponential backoff
	- [ ] Mount `src/` into Airflow container for shared code
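Airflow operators accept retry settings directly, so the retry-with-backoff task can be expressed in the DAG's `default_args`. The keys below are real `BaseOperator` arguments; the values and the cron string for "daily at 6 AM" are assumptions (Airflow reads schedules in its configured default timezone, UTC unless changed). For code that runs outside Airflow, such as a PubMed client, `backoff_delays` is a hypothetical helper showing a plain doubling schedule; it is not Airflow's internal formula.

```python
from datetime import timedelta

# Daily at 06:00 for PubMed updates (passed as the DAG's schedule)
PUBMED_SCHEDULE = "0 6 * * *"

# BaseOperator retry arguments, applied to every task via default_args
DEFAULT_ARGS = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
    "max_retry_delay": timedelta(hours=1),
}


def backoff_delays(base: float, retries: int, factor: float = 2.0,
                   max_delay: float = 60.0) -> list[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ..., capped at max_delay."""
    return [min(base * factor ** attempt, max_delay) for attempt in range(retries)]
```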

	### 2.4 PostgreSQL Storage for Documents

	Tasks:
	- [ ] Create `MedicalDocument` model: id, title, source, source_type, authors, abstract, raw_text, sections, parse_status, indexed_at
- [ ] Create the document repository (`repositories/document.py`, the counterpart of the course's `PaperRepository`) with CRUD + upsert + status tracking
	- [ ] Track processing pipeline: `uploaded → parsed → chunked → indexed`
	- [ ] Store parsed sections as JSON for re-indexing without re-parsing
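The `uploaded → parsed → chunked → indexed` pipeline is easiest to keep honest as an explicit transition table. A sketch; the `failed` state, the retry edge, and the `indexed → parsed` edge (re-index from stored sections without re-parsing) are assumptions layered on the plan.

```python
from enum import Enum


class ProcessingStatus(str, Enum):
    UPLOADED = "uploaded"
    PARSED = "parsed"
    CHUNKED = "chunked"
    INDEXED = "indexed"
    FAILED = "failed"


S = ProcessingStatus
ALLOWED_TRANSITIONS = {
    S.UPLOADED: {S.PARSED, S.FAILED},
    S.PARSED: {S.CHUNKED, S.FAILED},
    S.CHUNKED: {S.INDEXED, S.FAILED},
    S.INDEXED: {S.PARSED},   # re-chunk/re-index from stored sections, skip parsing
    S.FAILED: {S.UPLOADED},  # retry from the beginning
}


def advance(current: ProcessingStatus, target: ProcessingStatus) -> ProcessingStatus:
    """Return the new status, or raise if the pipeline would skip or reverse a step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Because the enum subclasses `str`, its values can be stored directly in a status column and read back with `ProcessingStatus(value)`.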

	---

	## Phase 3: Production Search Foundation (Week 3 Equivalent)

	> Goal: Replace FAISS with OpenSearch for production BM25 keyword search with medical-specific optimizations.

	### 3.1 OpenSearch Client

	Tasks:
	- [ ] Create `src/services/opensearch/` package (adapt course pattern)
- [ ] Implement `OpenSearchClient` with:
  - Health check, index management, BM25 search, bulk indexing
  - Medical-specific: Boost clinical term matches, support ICD-10 code filtering
- [ ] Create `QueryBuilder` with medical field boosting:
  ```
  fields: ["chunk_text^3", "title^2", "section_title^1.5", "abstract^1"]
  ```
- [ ] Create `index_config_hybrid.py` with medical document mapping:
  - Fields: chunk_text, title, authors, abstract, document_type (guideline/research/reference), condition_tags, publication_year
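A sketch of what the `QueryBuilder` might emit: a standard OpenSearch `bool` query with a `multi_match` over the boosted fields above, plus `terms` filters on the keyword fields. The function name and parameters are hypothetical; the resulting dict is what would be passed as `body=` to opensearch-py's `client.search(index=..., body=...)`.

```python
from typing import Optional, Sequence

# Boosts from the plan: chunk text matters most, then title, section title, abstract
MEDICAL_FIELDS = ["chunk_text^3", "title^2", "section_title^1.5", "abstract^1"]


def build_bm25_query(
    query: str,
    size: int = 10,
    document_types: Optional[Sequence[str]] = None,
    condition_tags: Optional[Sequence[str]] = None,
) -> dict:
    # Filters do not affect scoring; they only restrict which chunks can match
    filters = []
    if document_types:
        filters.append({"terms": {"document_type": list(document_types)}})
    if condition_tags:
        filters.append({"terms": {"condition_tags": list(condition_tags)}})
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": query, "fields": MEDICAL_FIELDS, "type": "best_fields"}}
                ],
                "filter": filters,
            }
        },
    }
```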

	### 3.2 Medical Document Index Mapping

	```python
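MEDICAL_CHUNKS_MAPPING = {
    "settings": {
        "index.knn": True,  # enable k-NN vector search on this index
        "analysis": {
            "filter": {
                # Custom filter referenced by medical_analyzer below; it must be
                # defined here or index creation fails (entries are illustrative)
                "medical_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "diabetes mellitus, dm",
                        "hba1c, glycated hemoglobin",
                    ],
                },
            },
            "analyzer": {
                "medical_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "medical_synonyms", "stop", "snowball"],
                }
            },
        },
    },
    "mappings": {
        "properties": {
            "chunk_text": {"type": "text", "analyzer": "medical_analyzer"},
            "document_type": {"type": "keyword"},  # guideline, research, reference
            "condition_tags": {"type": "keyword"},  # diabetes, anemia, etc.
            "biomarkers_mentioned": {"type": "keyword"},  # Glucose, HbA1c, etc.
            "embedding": {"type": "knn_vector", "dimension": 1024},
            # ... more fields
        }
    },
}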
	```

	Tasks:
	- [ ] Design medical-optimized OpenSearch mapping
	- [ ] Add medical synonym analyzer (e.g., "diabetes mellitus" ↔ "DM", "HbA1c" ↔ "glycated hemoglobin")
	- [ ] Create search endpoint `POST /api/v1/search` with filtering by document_type, condition_tags
	- [ ] Implement BM25 search with medical field boosting
	- [ ] Create index verification in startup lifespan

	---

	## Phase 4: Hybrid Search & Intelligent Chunking (Week 4 Equivalent)

	> Goal: Section-aware chunking for medical documents + hybrid search (BM25 + semantic) with RRF fusion.

	### 4.1 Medical-Aware Text Chunking

	Tasks:
- [ ] Create `src/services/indexing/text_chunker.py` adapting the course's `TextChunker`:
  - Section-aware chunking (detect: Introduction, Methods, Results, Discussion, Guidelines, References)
  - Target: 600 words per chunk, 100-word overlap
  - Medical metadata: section_title, biomarkers_mentioned, condition_tags
- [ ] Create `MedicalTextChunker` subclass with:
  - Biomarker mention detection (scan for any of 24+ biomarker names)
  - Condition tag extraction (diabetes, anemia, heart disease, etc.)
  - Table-aware chunking (keep tables together)
  - Reference section filtering (skip bibliography chunks)
	- [ ] Create `HybridIndexingService` for chunk → embed → index pipeline
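A simplified sketch of the section-aware chunker: split on known headings, drop the References section, cut each section into 600-word windows with a 100-word overlap, and tag each chunk with the biomarkers it mentions. Heading detection here only recognizes a heading alone on its line, `BIOMARKERS` is a hypothetical five-name subset of the 24 the validator knows, and table-aware chunking is not shown.

```python
import re
from dataclasses import dataclass, field

SECTION_HEADINGS = {"introduction", "methods", "results", "discussion", "guidelines", "references"}
BIOMARKERS = ["Glucose", "HbA1c", "Hemoglobin", "Cholesterol", "Platelets"]  # hypothetical subset


@dataclass
class Chunk:
    text: str
    section_title: str
    biomarkers_mentioned: list[str] = field(default_factory=list)


def split_sections(text: str) -> list[tuple[str, str]]:
    """Split text into (heading, body) pairs; a heading must sit alone on its line."""
    sections, title, lines = [], "Preamble", []
    for line in text.splitlines():
        if line.strip().lower() in SECTION_HEADINGS:
            if lines:
                sections.append((title, "\n".join(lines)))
            title, lines = line.strip().title(), []
        else:
            lines.append(line)
    if lines:
        sections.append((title, "\n".join(lines)))
    return sections


def chunk_words(text: str, chunk_size: int = 600, overlap: int = 100) -> list[str]:
    """Sliding word windows: each chunk repeats the last `overlap` words of the previous one."""
    words, step, chunks = text.split(), chunk_size - overlap, []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


def detect_biomarkers(text: str) -> list[str]:
    # Whole-word, case-insensitive match against the known biomarker names
    return [b for b in BIOMARKERS if re.search(rf"\b{re.escape(b)}\b", text, re.IGNORECASE)]


def chunk_document(text: str, chunk_size: int = 600, overlap: int = 100) -> list[Chunk]:
    chunks = []
    for title, body in split_sections(text):
        if title == "References":
            continue  # bibliography chunks add noise to retrieval
        for piece in chunk_words(body, chunk_size, overlap):
            chunks.append(Chunk(piece, title, detect_biomarkers(piece)))
    return chunks
```

With the defaults, a 1,300-word section yields windows starting at words 0, 500, and 1,000, so the chunks hold 600, 600, and 300 words.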

	### 4.2 Production Embeddings

	Tasks:
	- [ ] Create `src/services/embeddings/` with Jina AI client (1024d, passage/query differentiation)
	- [ ] Add fallback chain: Jina → Google → HuggingFace
	- [ ] Implement batch embedding for efficient indexing
	- [ ] Track embedding model in chunk metadata for versioning
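One caution when wiring the fallback chain: a `knn_vector` field has a fixed `dimension` (1024 in the medical mapping), so a 384-dimension MiniLM fallback cannot write into the same index as Jina v3. The sketch below enforces that by rejecting providers whose dimension differs, embeds in batches, restarts on the next provider if any batch fails (so one document never mixes models), and returns the model name for chunk metadata. `Provider` and `FallbackEmbedder` are hypothetical; the real Jina, Google, or HuggingFace calls would be wrapped in the `embed` callables.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

EmbedFn = Callable[[list[str]], list[list[float]]]


@dataclass(frozen=True)
class Provider:
    name: str       # model id, stored in chunk metadata for versioning
    dimension: int
    embed: EmbedFn  # wraps the real API call; takes one batch of texts


class FallbackEmbedder:
    def __init__(self, providers: Sequence[Provider], expected_dim: int, batch_size: int = 32):
        mismatched = [p.name for p in providers if p.dimension != expected_dim]
        if mismatched:
            raise ValueError(f"providers {mismatched} do not produce {expected_dim}-d vectors")
        self.providers = list(providers)
        self.batch_size = batch_size

    def embed(self, texts: list[str]) -> tuple[str, list[list[float]]]:
        """Return (model_name, vectors) from the first provider that embeds every batch."""
        errors = []
        for provider in self.providers:
            try:
                vectors: list[list[float]] = []
                for i in range(0, len(texts), self.batch_size):
                    vectors.extend(provider.embed(texts[i:i + self.batch_size]))
                return provider.name, vectors
            except Exception as exc:  # discard partial output and try the next provider
                errors.append(f"{provider.name}: {exc}")
        raise RuntimeError("all embedding providers failed: " + "; ".join(errors))
```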

	### 4.3 Hybrid Search with RRF

	Tasks:
	- [ ] Implement `search_unified()` supporting: BM25-only, vector-only, hybrid modes
	- [ ] Set up OpenSearch RRF (Reciprocal Rank Fusion) pipeline
	- [ ] Create unified search endpoint `POST /api/v1/hybrid-search/`
	- [ ] Add min_score filtering and result deduplication
	- [ ] Benchmark: BM25 vs. vector vs. hybrid on medical queries
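Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over the result lists it appears in, so documents ranked well by both BM25 and vector search rise to the top without comparing their incompatible raw scores. A client-side sketch, useful for the benchmarking task; k = 60 is the commonly used default, and in the plan the OpenSearch pipeline would apply the fusion server-side.

```python
from collections import defaultdict
from typing import Optional, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]], k: int = 60, top_n: Optional[int] = None
) -> list[str]:
    """Fuse ranked lists of document ids; each list is assumed to contain unique ids."""
    scores: dict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # A document found by several retrievers appears once (deduplication for free)
    fused = sorted(scores, key=scores.__getitem__, reverse=True)
    return fused if top_n is None else fused[:top_n]
```

For example, with k = 60 a document ranked 1st by BM25 and 2nd by vector search scores 1/61 + 1/62 ≈ 0.0325, ahead of a document ranked 1st by BM25 alone (1/61 ≈ 0.0164).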

	---

	## Phase 5: Complete RAG Pipeline with Streaming (Week 5 Equivalent)

	> Goal: Replace synchronous analysis with streaming RAG, add Gradio UI, optimize prompts.

	### 5.1 Ollama Client Upgrade

	Tasks:
	- [ ] Create `src/services/ollama/` package (adapt course pattern)
- [ ] Implement `OllamaClient` with:
  - Health check, model listing, generate, streaming generate
  - Usage metadata extraction (tokens, latency)
  - LangChain integration: `get_langchain_model()` for structured output
- [ ] Create medical-specific RAG prompt templates:
  - `rag_medical_system.txt` — optimized for medical explanation generation
  - Structured output format for clinical responses
	- [ ] Create `OllamaFactory` with `@lru_cache`
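A dependency-free sketch of the streaming half of `OllamaClient`, against Ollama's `/api/generate` endpoint: with `"stream": true` the server returns one JSON object per line, each carrying a `response` text fragment, and the final object has `"done": true` plus usage fields such as `prompt_eval_count`, `eval_count`, and `total_duration` (nanoseconds). The function names are hypothetical, and only `iter_tokens` can be exercised without a running server.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Union


def open_generate_stream(base_url: str, model: str, prompt: str):
    """POST to /api/generate; iterating the HTTP response yields NDJSON lines."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(request)


def iter_tokens(lines: Iterable[Union[bytes, str]], usage: dict) -> Iterator[str]:
    """Yield text fragments; on the final line, copy usage metadata into `usage`."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.strip():
            continue
        chunk = json.loads(line)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            for key in ("prompt_eval_count", "eval_count", "total_duration"):
                if key in chunk:
                    usage[key] = chunk[key]
            break
```

Against a local server this would be used as `with open_generate_stream("http://localhost:11434", model, prompt) as resp:` followed by `for token in iter_tokens(resp, usage): ...`, where the model name is whatever has been pulled locally.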

	### 5.2 Streaming RAG Endpoints

	Tasks:
	- [ ] Create `POST /api/v1/ask` — standard RAG with medical context retrieval
	- [ ] Create `POST /api/v1/stream` — SSE streaming for real-time responses
	- [ ] Create `POST /api/v1/analyze/stream` — streaming biomarker analysis
- [ ] Integrate with existing multi-agent pipeline:
  ```
  Query → Hybrid Search → Medical Chunks → Agent Pipeline → Streaming Response
  ```
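Server-Sent Events are plain text: each event is one or more `data:` lines followed by a blank line. A sketch of the generator a streaming endpoint could return; JSON-encoding each token keeps embedded newlines from breaking the framing. The `done` event name and payload shape are assumptions; in FastAPI the generator would be wrapped as `StreamingResponse(sse_events(tokens), media_type="text/event-stream")`.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token as an SSE `data:` event, then signal completion."""
    for token in tokens:
        # json.dumps escapes "\n", so every event stays on a single data: line
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "event: done\ndata: {}\n\n"
```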

	### 5.3 Gradio Medical Interface

	Tasks:
- [ ] Create `src/gradio_app.py` for interactive medical RAG:
  - Biomarker input form (structured entry)
  - Natural language input (free text)
  - Streaming response display
  - Search mode selector (BM25, hybrid, vector)
  - Model selector
  - Analysis history display
	- [ ] Create `gradio_launcher.py` for easy startup
	- [ ] Expose on port 7861

	### 5.4 Prompt Optimization

	Tasks:
	- [ ] Reduce prompt size by 60-80% (course achieved 80% reduction)
	- [ ] Create focused medical prompts (separate: biomarker analysis, disease explanation, guidelines)
	- [ ] Test prompt variants using 5D evaluation framework
	- [ ] Store best prompts as SOP parameters (tie into evolution engine)

	---

	## Phase 6: Monitoring, Caching & Observability (Week 6 Equivalent)

	> Goal: Add Langfuse tracing for the entire pipeline, Redis caching, and production monitoring.

	### 6.1 Langfuse Integration

	Tasks:
- [ ] Create `src/services/langfuse/` package (adapt course pattern):
  - `client.py` — LangfuseTracer wrapper with v3 SDK
  - `factory.py` — cached tracer factory
  - `tracer.py` — medical-specific RAGTracer with named steps
- [ ] Add spans for every pipeline step:
  - `biomarker_validation` → `query_embedding` → `search_retrieval` → `agent_execution` → `response_synthesis`
- [ ] Track per-request metrics:
  - Total latency, LLM tokens used, search results count, cache hit/miss, agent execution time
	- [ ] Add Langfuse Docker services to docker-compose.yml
	- [ ] Create trace visualization for medical analysis pipeline
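The span structure is independent of the SDK, so here is a stand-in `RAGTracer` that records one named, timed span per pipeline step with arbitrary metadata; a Langfuse-backed version would forward the same names and metadata to Langfuse spans instead of appending to a list. The class and method names are hypothetical and are not the Langfuse API.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class SpanRecord:
    name: str
    duration_ms: float
    metadata: dict[str, Any]


@dataclass
class RAGTracer:
    spans: list[SpanRecord] = field(default_factory=list)

    @contextmanager
    def span(self, name: str, **metadata: Any) -> Iterator[dict[str, Any]]:
        """Time one pipeline step; the yielded dict can be filled in during the step."""
        start = time.perf_counter()
        try:
            yield metadata
        except Exception as exc:
            # Record only the class name: exception messages may contain patient data
            metadata["error"] = type(exc).__name__
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append(SpanRecord(name, elapsed_ms, metadata))

    def total_ms(self) -> float:
        return sum(s.duration_ms for s in self.spans)
```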

	### 6.2 Redis Caching

	Tasks:
- [ ] Create `src/services/cache/` package (adapt course pattern):
  - Exact-match cache: SHA256(query + model + top_k + biomarkers) → cached response
  - TTL: 6 hours for general queries, 1 hour for biomarker analysis (values may change)
- [ ] Add caching to:
  - `/api/v1/ask` — cache RAG responses
  - `/api/v1/analyze` — cache full analysis results
  - Embeddings — cache frequently queried embeddings
	- [ ] Add graceful fallback: cache miss → normal pipeline
	- [ ] Track cache hit rates in Langfuse
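A sketch of the exact-match key and the miss-on-failure fallback. Canonical JSON (`sort_keys=True`) makes the key independent of the order in which biomarkers arrive, and hashing keeps raw lab values out of Redis key names. `InMemoryTTLCache` is a test stand-in; with redis-py the write would be `client.setex(key, ttl, value)`. The function names and key prefix are hypothetical.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional

TTL_SECONDS = {"ask": 6 * 3600, "analyze": 1 * 3600}  # 6 h general Q&A, 1 h analyses


def cache_key(endpoint: str, query: str, model: str, top_k: int,
              biomarkers: Optional[dict] = None) -> str:
    payload = json.dumps(
        {"endpoint": endpoint, "query": query.strip(), "model": model,
         "top_k": top_k, "biomarkers": biomarkers or {}},
        sort_keys=True, separators=(",", ":"),
    )
    return f"mediguard:{endpoint}:{hashlib.sha256(payload.encode()).hexdigest()}"


class InMemoryTTLCache:
    """Test stand-in for Redis with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        item = self._data.get(key)
        if item is None or self._clock() >= item[1]:
            self._data.pop(key, None)  # expired or absent
            return None
        return item[0]

    def set(self, key: str, value: Any, ttl: int) -> None:
        self._data[key] = (value, self._clock() + ttl)


def cached_call(cache, key: str, ttl: int, compute: Callable[[], Any]) -> tuple[Any, bool]:
    """Return (value, cache_hit); any cache error degrades to a normal pipeline run."""
    try:
        hit = cache.get(key)
    except Exception:
        hit = None
    if hit is not None:
        return hit, True
    value = compute()
    try:
        cache.set(key, value, ttl)
    except Exception:
        pass  # a failed write must not fail the request
    return value, False
```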

	### 6.3 Production Health Dashboard

	Tasks:
- [ ] Enhance `/api/v1/health` to check all services:
  - PostgreSQL, OpenSearch, Redis, Ollama, Langfuse, Airflow
- [ ] Add `/api/v1/metrics` endpoint for operational metrics
- [ ] Create Langfuse dashboard for:
  - Average response time, cache hit rate, error rate, token costs
  - Per-agent execution times, search relevance scores
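A sketch of how `/api/v1/health` could aggregate per-service checks: each check is a zero-argument callable, an exception counts as down, and the overall status distinguishes losing a core dependency (`unhealthy`) from losing an optional, feature-flagged one such as Langfuse (`degraded`). The status vocabulary and which services count as critical are assumptions.

```python
from typing import Callable, Iterable


def aggregate_health(checks: dict[str, Callable[[], bool]], critical: Iterable[str]) -> dict:
    services = {}
    for name, check in checks.items():
        try:
            services[name] = "up" if check() else "down"
        except Exception as exc:  # a check that raises is reported, not propagated
            services[name] = f"down ({type(exc).__name__})"
    down = {name for name, state in services.items() if state != "up"}
    if down & set(critical):
        status = "unhealthy"
    elif down:
        status = "degraded"
    else:
        status = "healthy"
    return {"status": status, "services": services}
```

A router would typically return HTTP 503 when the status is `unhealthy`, so Docker Compose health checks fail fast.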

	---

	## Phase 7: Agentic RAG & Messaging Bot (Week 7 Equivalent)

	> Goal: Wrap your multi-agent pipeline in a LangGraph agentic workflow with guardrails, document grading, and query rewriting. Add Telegram bot for mobile access.

	### 7.1 Agentic RAG Wrapper

	This is the most impactful upgrade — it adds intelligence around your existing agents:

	```
	User Query
	↓
	[GUARDRAIL] ──── Is this a medical/biomarker question? ────→ [OUT OF SCOPE]
	↓ yes
	[RETRIEVE] ──── Hybrid search for medical documents ────→ [TOOL: search]
	↓
	[GRADE DOCUMENTS] ──── Are results relevant? ────→ [REWRITE QUERY] ──→ loop
	↓ yes
	[CLINICAL ANALYSIS] ──── Your 6 medical agents ────→ structured analysis
	↓
	[GENERATE RESPONSE] ──── Synthesize with citations ────→ final answer
	```

	Tasks:
	- [ ] Create `src/services/agents/agentic_rag.py` — `AgenticRAGService` class
- [ ] Create `src/services/agents/nodes/`:
  - `guardrail_node.py` — Medical domain validation (score 0-100)
    - In-scope: biomarker questions, disease queries, clinical guidelines
    - Out-of-scope: non-medical, general knowledge, harmful content
  - `retrieve_node.py` — Creates tool call with `max_retrieval_attempts`
  - `grade_documents_node.py` — LLM evaluates medical relevance
  - `rewrite_query_node.py` — LLM rewrites for better medical retrieval
  - `generate_answer_node.py` — Uses your existing agent pipeline OR direct LLM
  - `out_of_scope_node.py` — Polite medical-domain rejection
- [ ] Create `src/services/agents/state.py` — Enhanced state with guardrail_result, routing_decision, grading_results
- [ ] Create `src/services/agents/context.py` — Runtime context for dependency injection
- [ ] Create `src/services/agents/prompts.py` — Medical-specific prompts:
  - Guardrail: "Is this about health/biomarkers/medical conditions?"
  - Grading: "Does this medical document answer the clinical question?"
  - Rewriting: "Improve this medical query for better document retrieval"
  - Generation: "Synthesize medical findings with citations and safety caveats"
	- [ ] Create `src/services/agents/tools.py` — Medical retriever tool wrapping OpenSearch
	- [ ] Create `POST /api/v1/ask-agentic` endpoint
	- [ ] Add Langfuse tracing to every node
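The control flow in the diagram can be sketched without LangGraph to make the routing explicit: a guardrail score below a threshold short-circuits to the out-of-scope reply, and if no retrieved document passes grading, the query is rewritten and retrieval repeats until `max_retrieval_attempts` is reached. Every node is an injected callable standing in for the LLM and OpenSearch calls; the threshold of 60, the reply text, and the state fields are hypothetical. In the real service each step would be a LangGraph node with conditional edges.

```python
from dataclasses import dataclass, field
from typing import Callable

OUT_OF_SCOPE_REPLY = "I can only help with biomarker, lab-result, and other medical questions."


@dataclass
class AgentState:
    query: str
    guardrail_score: int = 0
    documents: list[str] = field(default_factory=list)
    attempts: int = 0
    answer: str = ""
    route: list[str] = field(default_factory=list)  # node names visited, for tracing


def run_agentic_rag(
    query: str,
    *,
    guardrail: Callable[[str], int],        # 0-100 medical-relevance score
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], bool],      # is this document relevant to the query?
    rewrite: Callable[[str], str],
    generate: Callable[[str, list[str]], str],
    threshold: int = 60,
    max_retrieval_attempts: int = 2,
) -> AgentState:
    state = AgentState(query=query)
    state.guardrail_score = guardrail(query)
    state.route.append("guardrail")
    if state.guardrail_score < threshold:
        state.route.append("out_of_scope")
        state.answer = OUT_OF_SCOPE_REPLY
        return state
    while True:
        state.attempts += 1
        candidates = retrieve(state.query)
        state.route.append("retrieve")
        state.documents = [d for d in candidates if grade(state.query, d)]
        state.route.append("grade")
        if state.documents or state.attempts >= max_retrieval_attempts:
            break  # relevant evidence found, or retry budget spent
        state.query = rewrite(state.query)
        state.route.append("rewrite")
    # With no relevant documents, generate() should say the evidence is insufficient
    state.answer = generate(state.query, state.documents)
    state.route.append("generate")
    return state
```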

	### 7.2 Medical Guardrails (Critical for MedTech)

	Beyond the course's simple domain check, add medical-specific safety:

	Tasks:
- [ ] Input guardrails:
  - Detect harmful queries (self-harm, drug abuse guidance)
  - Detect attempts to get a diagnosis without proper data
  - Validate that biomarker values are physiologically plausible
- [ ] Output guardrails:
  - Always include a "consult your healthcare provider" disclaimer
  - Never provide a definitive diagnosis (always "suggests" / "may indicate")
  - Flag critical biomarker values with immediate-action advice
  - Ensure safety_alerts are present for out-of-range values
- [ ] Citation guardrails:
  - Ensure all medical claims have document citations
  - Flag unsupported claims
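A rule-based sketch of the output checks: flag definitive-diagnosis phrasing, confirm every critical biomarker is mentioned (as a proxy for a safety alert), and append the provider disclaimer when it is missing. The phrase list, disclaimer text, and function name are hypothetical, and string rules like these would complement, not replace, an LLM-based check.

```python
import re
from typing import Iterable

DISCLAIMER = "This is not a diagnosis. Please consult your healthcare provider."
DISCLAIMER_MARKER = "consult your healthcare provider"
# Phrases that state a diagnosis as fact; the plan wants "suggests" / "may indicate" instead
DEFINITIVE_PATTERNS = [r"\byou have\b", r"\byou are diagnosed with\b", r"\bthis confirms\b"]


def check_output(text: str, critical_biomarkers: Iterable[str]) -> tuple[str, list[str]]:
    """Return (possibly amended text, violations for logging or regeneration)."""
    violations = []
    for pattern in DEFINITIVE_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            violations.append(f"definitive language: {match.group(0)}")
    for name in critical_biomarkers:
        if name.lower() not in text.lower():
            violations.append(f"missing alert: {name}")
    if DISCLAIMER_MARKER not in text.lower():
        text = f"{text}\n\n{DISCLAIMER}"
    return text, violations
```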

	### 7.3 Telegram Bot Integration

	Tasks:
	- [ ] Create `src/services/telegram/` package (adapt course pattern)
- [ ] Implement bot commands:
  - `/start` — Welcome with medical assistant introduction
  - `/help` — Show capabilities and input format
  - `/analyze <biomarker values>` — Quick biomarker analysis
  - `/search <medical query>` — Search medical documents
  - `/report` — Get last analysis as formatted report
  - Free text — Full RAG Q&A about medical topics
	- [ ] Add typing indicators and progress messages
	- [ ] Integrate caching for repeated queries
	- [ ] Add rate limiting (medical queries shouldn't be spammed)
	- [ ] Create `TelegramFactory` gated by `TELEGRAM__ENABLED=true`
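For the rate-limiting task, a per-user sliding-window limiter is enough for a single bot process. A sketch with an injectable clock for testing; the class name and limits are hypothetical, and a multi-instance deployment would keep the counters in Redis instead of process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per user within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self._hits: dict[int, deque] = defaultdict(deque)

    def allow(self, user_id: int) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that have left the window
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In a handler, a `False` result would short-circuit with a polite "please wait" reply before any LLM call is made.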

	### 7.4 Feedback Loop

	Tasks:
	- [ ] Create `POST /api/v1/feedback` endpoint (adapt from course)
	- [ ] Integrate with Langfuse scoring
	- [ ] Use feedback data to identify weak prompts → feed into SOP evolution engine

	---

	## Phase 8: MedTech-Specific Additions (Beyond Course)

	> Goal: Things the course doesn't cover but your medical domain demands.

	### 8.1 HIPAA-Awareness Patterns

	Tasks:
	- [ ] Never log patient biomarker values in plain text
	- [ ] Add request ID tracking without PII
	- [ ] Create data retention policy (auto-delete analysis data after configurable period)
	- [ ] Add audit logging for all analysis requests
	- [ ] Document HIPAA compliance approach (even if not yet certified)
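A sketch of the first two tasks: a `logging.Filter` that masks numeric values following known biomarker names before any handler writes the record, plus an opaque request ID that carries no PII. The regex covers only a hypothetical five-name subset and a simple `name: value` shape, so it is a safety net rather than a substitute for not logging values in the first place. Attaching it to each handler (`handler.addFilter(...)`) also covers records propagated from child loggers, which a logger-level filter would miss.

```python
import logging
import re
import uuid

_BIOMARKER_VALUE = re.compile(
    r"\b(glucose|hba1c|hemoglobin|cholesterol|platelets)\b\s*[:=]?\s*\d+(?:\.\d+)?",
    re.IGNORECASE,
)


def redact(message: str) -> str:
    """Replace "Glucose: 142" style pairs with "Glucose=[REDACTED]"."""
    return _BIOMARKER_VALUE.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)


class RedactBiomarkersFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Merge args into the message first so values passed as %s args are caught too
        record.msg = redact(record.getMessage())
        record.args = ()
        return True


def new_request_id() -> str:
    # Random, so it cannot be linked back to a patient without the audit log
    return uuid.uuid4().hex
```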

	### 8.2 Medical Safety Testing

	Tasks:
- [ ] Create medical-specific test suite:
  - Critical value detection tests (every critical biomarker)
  - Guardrail rejection tests (non-medical queries)
  - Citation completeness tests
  - Safety disclaimer presence tests
  - Biomarker normalization tests (already have some)
	- [ ] Integrate 5D evaluation into CI pipeline
	- [ ] Create test fixtures with realistic medical scenarios

	### 8.3 Evolution Engine Integration

	Tasks:
	- [ ] Wire SOP evolution engine to production metrics (Langfuse data)
	- [ ] Create Airflow DAG for scheduled evolution cycles
	- [ ] Store evolved SOPs in PostgreSQL with version tracking
	- [ ] A/B test SOP variants using Langfuse trace comparison
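For A/B testing SOP variants, deterministic hash-based assignment keeps each user or session on the same variant across requests without storing an assignment table, and the variant name can be attached to the Langfuse trace for comparison. A sketch; the salt, weights, and variant names are hypothetical, changing the salt reshuffles all assignments (a fresh experiment), and the key should be a pseudonymous ID rather than anything patient-identifying.

```python
import hashlib


def assign_variant(key: str, weights: dict[str, float], salt: str = "sop-ab-v1") -> str:
    """Map a stable key to a variant in proportion to `weights`, identically on every call."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:6], "big") / 2**48  # exact float in [0, 1)
    total = sum(weights.values())
    cumulative = 0.0
    chosen = ""
    for name, weight in weights.items():
        chosen = name
        cumulative += weight / total
        if point < cumulative:
            break
    return chosen  # falls back to the last variant if float rounding leaves a gap
```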

	### 8.4 Multi-condition Support

	Tasks:
	- [ ] Extend condition coverage beyond current 5 diseases
	- [ ] Add condition-specific retrieval strategies
	- [ ] Create condition-specific chunking filters
	- [ ] Support multi-condition analysis (comorbidities)

	---

	## Implementation Priority Matrix

| Priority | Phase | Effort | Impact | Dependencies |
|----------|-------|--------|--------|--------------|
| 🔴 P0 | 1.1 Docker Compose | 2 days | Critical | None |
| 🔴 P0 | 1.2 Pydantic Settings | 1 day | Critical | None |
| 🔴 P0 | 1.4 Project Restructure | 2 days | Critical | None |
| 🔴 P0 | 1.5 Dev Tooling | 0.5 day | Critical | 1.4 |
| 🔴 P0 | 1.3 PostgreSQL + Models | 2 days | Critical | 1.1, 1.4 |
| 🟡 P1 | 3.1 OpenSearch Client | 2 days | High | 1.1, 1.4 |
| 🟡 P1 | 3.2 Medical Index Mapping | 1 day | High | 3.1 |
| 🟡 P1 | 4.1 Medical Text Chunker | 2 days | High | 3.1 |
| 🟡 P1 | 4.2 Production Embeddings | 1 day | High | 4.1 |
| 🟡 P1 | 4.3 Hybrid Search + RRF | 1 day | High | 3.1, 4.2 |
| 🟡 P1 | 5.1 Ollama Client | 1 day | High | 1.4 |
| 🟡 P1 | 5.2 Streaming Endpoints | 1 day | High | 5.1, 4.3 |
| 🟡 P1 | 2.1 PDF Parser (Docling) | 1 day | High | 1.4 |
| 🟡 P1 | 7.1 Agentic RAG Wrapper | 3 days | High | 5.2, 4.3 |
| 🟡 P1 | 7.2 Medical Guardrails | 2 days | High | 7.1 |
| 🟢 P2 | 2.3 Airflow Pipeline | 2 days | Medium | 1.1, 2.1, 4.1 |
| 🟢 P2 | 5.3 Gradio Interface | 1 day | Medium | 5.2 |
| 🟢 P2 | 6.1 Langfuse Tracing | 2 days | Medium | 1.1, 5.2 |
| 🟢 P2 | 6.2 Redis Caching | 1 day | Medium | 1.1, 5.2 |
| 🟢 P2 | 6.3 Health Dashboard | 0.5 day | Medium | 6.1 |
| 🟢 P2 | 7.3 Telegram Bot | 2 days | Medium | 7.1, 6.2 |
| 🟢 P2 | 7.4 Feedback Loop | 0.5 day | Medium | 6.1 |
| 🔵 P3 | 2.2 Medical Sources | 2 days | Low | 2.1 |
| 🔵 P3 | 8.1 HIPAA Patterns | 1 day | Low | 1.3 |
| 🔵 P3 | 8.2 Safety Testing | 2 days | Low | 7.2 |
| 🔵 P3 | 8.3 Evolution Integration | 2 days | Low | 6.1, 2.3 |
| 🔵 P3 | 8.4 Multi-condition | 3 days | Low | 4.1 |

Estimated total: ~41.5 days of focused work (the sum of the Effort column)
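The dependency column can be checked mechanically: feeding it to the stdlib `graphlib.TopologicalSorter` (Python 3.9+) yields an order that respects every dependency and raises `CycleError` if the table ever gains a cycle, and summing the effort column gives the total. The data below is transcribed from the matrix; `implementation_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# phase -> (effort in days, dependencies), transcribed from the priority matrix
TASKS = {
    "1.1": (2, []), "1.2": (1, []), "1.4": (2, []), "1.5": (0.5, ["1.4"]),
    "1.3": (2, ["1.1", "1.4"]), "3.1": (2, ["1.1", "1.4"]), "3.2": (1, ["3.1"]),
    "4.1": (2, ["3.1"]), "4.2": (1, ["4.1"]), "4.3": (1, ["3.1", "4.2"]),
    "5.1": (1, ["1.4"]), "5.2": (1, ["5.1", "4.3"]), "2.1": (1, ["1.4"]),
    "7.1": (3, ["5.2", "4.3"]), "7.2": (2, ["7.1"]), "2.3": (2, ["1.1", "2.1", "4.1"]),
    "5.3": (1, ["5.2"]), "6.1": (2, ["1.1", "5.2"]), "6.2": (1, ["1.1", "5.2"]),
    "6.3": (0.5, ["6.1"]), "7.3": (2, ["7.1", "6.2"]), "7.4": (0.5, ["6.1"]),
    "2.2": (2, ["2.1"]), "8.1": (1, ["1.3"]), "8.2": (2, ["7.2"]),
    "8.3": (2, ["6.1", "2.3"]), "8.4": (3, ["4.1"]),
}


def implementation_order(tasks: dict) -> list[str]:
    """Return phases so that every phase comes after all of its dependencies."""
    graph = {phase: deps for phase, (_, deps) in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


total_days = sum(effort for effort, _ in TASKS.values())  # 41.5
```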

	---

	## Migration Strategy

	### Step 1: Foundation (Week 1-2 of work)
	1. Restructure project layout → Phase 1.4
	2. Create Pydantic Settings → Phase 1.2
	3. Set up Docker Compose → Phase 1.1
	4. Add PostgreSQL with models → Phase 1.3
	5. Add dev tooling → Phase 1.5

	### Step 2: Search Engine (Week 2-3)
	6. Create OpenSearch client + medical mapping → Phase 3.1, 3.2
	7. Build medical text chunker → Phase 4.1
	8. Add production embeddings (Jina) → Phase 4.2
	9. Implement hybrid search + RRF → Phase 4.3
	10. Upgrade PDF parser to Docling → Phase 2.1

	### Step 3: RAG Pipeline (Week 3-4)
	11. Create Ollama client → Phase 5.1
	12. Add streaming endpoints → Phase 5.2
	13. Build agentic RAG wrapper → Phase 7.1
	14. Add medical guardrails → Phase 7.2
	15. Create Gradio interface → Phase 5.3

	### Step 4: Production Hardening (Week 4-5)
	16. Add Langfuse observability → Phase 6.1
	17. Add Redis caching → Phase 6.2
	18. Set up Airflow pipeline → Phase 2.3
	19. Build Telegram bot → Phase 7.3
	20. Add feedback loop → Phase 7.4

	### Step 5: Polish (Week 5-6)
	21. Health dashboard → Phase 6.3
	22. Medical safety testing → Phase 8.2
	23. HIPAA patterns → Phase 8.1
	24. Evolution engine integration → Phase 8.3

	### Key Migration Rules
	- Never break what works: Keep all existing agents functional throughout
	- Test at every step: Run existing tests after each phase
	- Incremental Docker: Start with API + PostgreSQL, add services one at a time
	- Feature flags: Gate new features (Telegram, Langfuse, Redis) behind settings
	- Backward compatibility: Keep CLI chatbot working alongside new API
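The feature-flag and incremental-Docker rules can share one mechanism: optional services start only when their flag is on. Below is a minimal standard-library sketch. In `src/config.py`, these fields would sit on the Pydantic Settings class instead. The environment variable names `TELEGRAM_ENABLED`, `LANGFUSE_ENABLED` and `REDIS_ENABLED` are assumptions:

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable such as REDIS_ENABLED=true."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class FeatureFlags:
    """Optional integrations; every flag defaults to off."""

    telegram_enabled: bool = False
    langfuse_enabled: bool = False
    redis_enabled: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "FeatureFlags":
        return cls(
            telegram_enabled=_flag(env, "TELEGRAM_ENABLED"),
            langfuse_enabled=_flag(env, "LANGFUSE_ENABLED"),
            redis_enabled=_flag(env, "REDIS_ENABLED"),
        )


def enabled_services(flags: FeatureFlags) -> list[str]:
    """API and PostgreSQL always run; the rest are added one flag at a time."""
    services = ["api", "postgres"]
    if flags.redis_enabled:
        services.append("redis")
    if flags.langfuse_enabled:
        services.append("langfuse")
    if flags.telegram_enabled:
        services.append("telegram")
    return services
```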

	---

	## Architecture Target State

	```
	┌─────────────────────────────────────────────────────────────────────────┐
	│  Docker Compose Orchestration                                           │
	│                                                                         │
	│  ┌──────────┐  ┌────────────┐  ┌────────────┐  ┌────────┐  ┌─────────┐  │
	│  │ FastAPI  │  │ PostgreSQL │  │ OpenSearch │  │ Ollama │  │ Airflow │  │
	│  │ + Gradio │  │ (reports,  │  │ (hybrid    │  │ (local │  │ (daily  │  │
	│  │ (8000,   │  │ docs,      │  │ medical    │  │ LLM)   │  │ ingest) │  │
	│  │ 7861)    │  │ history)   │  │ search)    │  │        │  │         │  │
	│  └────┬─────┘  └─────┬──────┘  └─────┬──────┘  └───┬────┘  └────┬────┘  │
	│       │              │               │             │            │       │
	│  ┌────┴─────┐  ┌─────┴──────┐  ┌─────┴─────────────┴────────────┴────┐  │
	│  │ Redis    │  │ Langfuse   │  │ mediguard-network                   │  │
	│  │ (cache)  │  │ (observe)  │  └─────────────────────────────────────┘  │
	│  └──────────┘  └────────────┘                                           │
	│                                                                         │
	│  ┌───────────────────────────────────────────────────────────────────┐  │
	│  │ Agentic RAG Pipeline                                              │  │
	│  │                                                                   │  │
	│  │ Query → [Guardrail] → [Retrieve] → [Grade] → [6 Medical Agents]   │  │
	│  │              ↓             ↑       ↓          ↓                   │  │
	│  │        [Out of Scope]      [Rewrite]  [Generate] → Final Response │  │
	│  │                                                                   │  │
	│  │ Agents: Biomarker Analyzer · Disease Explainer · Linker           │  │
	│  │         Clinical Guidelines · Confidence · Synthesizer            │  │
	│  └───────────────────────────────────────────────────────────────────┘  │
	│                                                                         │
	│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────────────┐  │
	│  │ Telegram Bot │  │ Gradio UI    │  │ 5D Eval + SOP Evolution       │  │
	│  │ (mobile)     │  │ (desktop)    │  │ (self-improvement loop)       │  │
	│  └──────────────┘  └──────────────┘  └───────────────────────────────┘  │
	└─────────────────────────────────────────────────────────────────────────┘
	```
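You can read the routing in the pipeline box as a small state machine. Below is a framework-free sketch in which each stage is passed in as a plain callable. In the planned `agentic_rag.py`, these stages would presumably be LangGraph nodes. `MAX_REWRITES` and the out-of-scope message are illustrative values, not settings from the plan:

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed cap on query rewrites so a hopeless query cannot loop forever.
MAX_REWRITES = 2


@dataclass
class RagState:
    """Mutable state passed between the hypothetical pipeline stages."""

    query: str
    trace: list[str] = field(default_factory=list)
    rewrites: int = 0


def run_pipeline(
    state: RagState,
    in_scope: Callable[[str], bool],
    retrieve: Callable[[str], list[str]],
    is_relevant: Callable[[list[str]], bool],
    rewrite: Callable[[str], str],
    run_agents: Callable[[str, list[str]], str],
    generate: Callable[[str], str],
) -> str:
    """Guardrail -> Retrieve -> Grade -> (Rewrite loop | 6 agents -> Generate)."""
    state.trace.append("guardrail")
    if not in_scope(state.query):
        state.trace.append("out_of_scope")
        return "Out of scope: please ask about lab results or biomarkers."
    while True:
        state.trace.append("retrieve")
        docs = retrieve(state.query)
        state.trace.append("grade")
        # Stop looping once documents pass grading or the rewrite budget is spent.
        if is_relevant(docs) or state.rewrites >= MAX_REWRITES:
            break
        state.trace.append("rewrite")
        state.query = rewrite(state.query)
        state.rewrites += 1
    state.trace.append("agents")
    findings = run_agents(state.query, docs)
    state.trace.append("generate")
    return generate(findings)
```

This sketch continues to the agents after the rewrite budget is spent rather than refusing, which is one possible design. A stricter medical guardrail might instead return a low-confidence notice at that point.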

	---

	## Files to Create (Summary)

	\| New File \| Source of Inspiration \|
	\|----------\|----------------------\|
	\| `docker-compose.yml` \| Course `compose.yml` (adapted) \|
	\| `Dockerfile` \| Course `Dockerfile` (multi-stage UV) \|
	\| `Makefile` \| Course `Makefile` \|
	\| `pyproject.toml` \| Course `pyproject.toml` \|
	\| `.pre-commit-config.yaml` \| Course `.pre-commit-config.yaml` \|
	\| `.env.example` \| Course `.env.example` \|
	\| `src/main.py` \| Course `src/main.py` (lifespan pattern) \|
	\| `src/config.py` \| Course `src/config.py` + existing SOP config \|
	\| `src/dependencies.py` \| Course `src/dependencies.py` \|
	\| `src/exceptions.py` \| Course `src/exceptions.py` (medical exceptions) \|
	\| `src/database.py` \| Course `src/database.py` \|
	\| `src/db/` \| Course `src/db/` \|
	\| `src/models/analysis.py` \| New (medical domain) \|
	\| `src/models/document.py` \| Course `src/models/paper.py` (adapted) \|
	\| `src/repositories/` \| Course `src/repositories/` (adapted) \|
	\| `src/routers/ask.py` \| Course `src/routers/ask.py` \|
	\| `src/routers/search.py` \| Course `src/routers/hybrid_search.py` \|
	\| `src/routers/health.py` \| Course `src/routers/ping.py` (enhanced) \|
	\| `src/schemas/` \| Course `src/schemas/` (medical schemas) \|
	\| `src/services/opensearch/` \| Course `src/services/opensearch/` \|
	\| `src/services/embeddings/` \| Course `src/services/embeddings/` \|
	\| `src/services/ollama/` \| Course `src/services/ollama/` \|
	\| `src/services/cache/` \| Course `src/services/cache/` \|
	\| `src/services/langfuse/` \| Course `src/services/langfuse/` \|
	\| `src/services/indexing/` \| Course `src/services/indexing/` (medical chunks) \|
	\| `src/services/pdf_parser/` \| Course `src/services/pdf_parser/` \|
	\| `src/services/telegram/` \| Course `src/services/telegram/` \|
	\| `src/services/agents/agentic_rag.py` \| Course (adapted for medical agents) \|
	\| `src/services/agents/nodes/*` \| Course (medical guardrails) \|
	\| `src/services/agents/context.py` \| Course \|
	\| `src/services/agents/prompts.py` \| Course (medical prompts) \|
	\| `src/gradio_app.py` \| Course `src/gradio_app.py` (medical UI) \|
	\| `airflow/dags/medical_ingestion.py` \| Course `airflow/dags/arxiv_paper_ingestion.py` \|
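The `src/main.py` row borrows the course's lifespan pattern: shared clients are created once at startup and closed at shutdown. Below is a minimal runnable sketch. A `SimpleNamespace` stands in for the FastAPI app, and a hypothetical `FakeClient` stands in for the real OpenSearch, Redis or Ollama clients:

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakeClient:
    """Hypothetical stand-in for an OpenSearch/Redis/Ollama client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.open = False

    async def connect(self) -> None:
        self.open = True

    async def close(self) -> None:
        self.open = False


@asynccontextmanager
async def lifespan(app):
    # Startup: create shared clients once and attach them to app.state.
    app.state.search = FakeClient("opensearch")
    await app.state.search.connect()
    try:
        yield
    finally:
        # Shutdown: release connections even if the app exits with an error.
        await app.state.search.close()


async def demo() -> bool:
    app = SimpleNamespace(state=SimpleNamespace())  # stand-in for FastAPI()
    async with lifespan(app):
        assert app.state.search.open  # what request handlers would see
    return app.state.search.open


if __name__ == "__main__":
    print(asyncio.run(demo()))  # False: the client was closed on shutdown
```

With FastAPI installed, you pass the same function as `FastAPI(lifespan=lifespan)`. Handlers or dependencies in `src/dependencies.py` can then reach the client through `request.app.state`.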

	## Files to Keep & Enhance

	\| Existing File \| Action \|
	\|---------------\|--------\|
	\| `src/agents/biomarker_analyzer.py` \| Keep, move to `src/services/agents/medical/` \|
	\| `src/agents/disease_explainer.py` \| Keep, move to `src/services/agents/medical/`, add OpenSearch retriever \|
	\| `src/agents/biomarker_linker.py` \| Keep, move to `src/services/agents/medical/`, add OpenSearch retriever \|
	\| `src/agents/clinical_guidelines.py` \| Keep, move to `src/services/agents/medical/`, add OpenSearch retriever \|
	\| `src/agents/confidence_assessor.py` \| Keep, move to `src/services/agents/medical/` \|
	\| `src/agents/response_synthesizer.py` \| Keep, move to `src/services/agents/medical/` \|
	\| `src/biomarker_validator.py` \| Keep, move to `src/services/biomarker/` \|
	\| `src/biomarker_normalization.py` \| Keep, move to `src/services/biomarker/` \|
	\| `src/evaluation/` \| Keep, enhance with Langfuse integration \|
	\| `src/evolution/` \| Keep, wire to production metrics \|
	\| `config/biomarker_references.json` \| Keep as seed data, migrate to DB \|
	\| `scripts/chat.py` \| Keep, update imports \|
	\| `tests/*` \| Keep, add production test fixtures \|
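For the `config/biomarker_references.json` row, you can make "migrate to DB" idempotent so the seed runs safely on every deploy. Below is a minimal sketch that uses the standard-library `sqlite3` as a stand-in for PostgreSQL. The JSON shape, the table name `biomarker_reference`, the column names and the sample ranges are all assumptions for illustration:

```python
import json
import sqlite3

# Hypothetical excerpt with illustrative values; the real
# config/biomarker_references.json may use a different shape.
SAMPLE_JSON = """
{
  "Glucose": {"unit": "mg/dL", "min": 70, "max": 100},
  "HbA1c": {"unit": "%", "min": 4.0, "max": 5.6}
}
"""


def seed_reference_ranges(conn: sqlite3.Connection, raw_json: str) -> int:
    """Upsert reference ranges so the seed can be re-run without duplicates."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS biomarker_reference (
               name TEXT PRIMARY KEY,
               unit TEXT NOT NULL,
               ref_min REAL NOT NULL,
               ref_max REAL NOT NULL)"""
    )
    rows = [
        (name, spec["unit"], spec["min"], spec["max"])
        for name, spec in json.loads(raw_json).items()
    ]
    # The ON CONFLICT ... DO UPDATE clause is valid in SQLite 3.24+ and
    # PostgreSQL; only the ? placeholders would change for a Postgres driver.
    conn.executemany(
        """INSERT INTO biomarker_reference (name, unit, ref_min, ref_max)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(name) DO UPDATE SET
               unit = excluded.unit,
               ref_min = excluded.ref_min,
               ref_max = excluded.ref_max""",
        rows,
    )
    conn.commit()
    return len(rows)
```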

	---

	This plan transforms MediGuard AI from a working prototype into a production-grade medical RAG system, applying every infrastructure lesson from the arXiv Paper Curator course while preserving and enhancing your unique medical domain logic.