MediGuard AI: Production Upgrade Plan
From Prototype to Production-Grade MedTech RAG System
Generated: 2026-02-23
Based on: Deep review of production-agentic-rag-course (Weeks 1–7) + existing RagBot codebase
Goal: Take the existing MediGuard AI (clinical biomarker analysis + RAG explanation system) to full production quality, applying every lesson from the arXiv Paper Curator course, adapted for the MedTech domain.
Table of Contents
- Executive Summary
- Deep Review: Course vs. Your Codebase
- Architecture Gap Analysis
- Phase 1: Infrastructure Foundation
- Phase 2: Medical Data Ingestion Pipeline
- Phase 3: Production Search Foundation
- Phase 4: Hybrid Search & Intelligent Chunking
- Phase 5: Complete RAG Pipeline with Streaming
- Phase 6: Monitoring, Caching & Observability
- Phase 7: Agentic RAG & Messaging Bot
- Phase 8: MedTech-Specific Additions
- Implementation Priority Matrix
- Migration Strategy
1. Executive Summary
Your RagBot is a working prototype with strong domain logic (biomarker validation, multi-agent clinical analysis, 5D evaluation, SOP evolution). The course teaches production infrastructure (Docker orchestration, OpenSearch hybrid search, Airflow pipelines, Redis caching, Langfuse observability, LangGraph agentic workflows, Telegram bot).
The strategy: Keep your excellent medical domain logic and multi-agent architecture, but rebuild the infrastructure layer to match production standards. Your domain is harder than arXiv papers: medical data demands stricter validation, HIPAA-aware patterns, and safety guardrails.
What You Have (Strengths)
- ✅ 6 specialized medical agents (Biomarker Analyzer, Disease Explainer, Biomarker-Disease Linker, Clinical Guidelines, Confidence Assessor, Response Synthesizer)
- ✅ LangGraph orchestration with parallel execution
- ✅ Robust biomarker validation with 24 biomarkers, reference ranges, critical values
- ✅ 5D evaluation framework (Clinical Accuracy, Evidence Grounding, Actionability, Clarity, Safety)
- ✅ SOP evolution engine (Outer Loop optimization)
- ✅ Multi-provider LLM support (Groq, Gemini, Ollama)
- ✅ Basic FastAPI with analysis endpoints
- ✅ CLI chatbot with natural language biomarker extraction
What You're Missing (Gaps)
- ❌ No Docker Compose orchestration (only a minimal single-service Dockerfile)
- ❌ No production database (PostgreSQL) – no patient/report persistence
- ❌ No production search engine – using FAISS (in-memory, single-file, no filtering)
- ❌ No chunking strategy – basic RecursiveCharacterTextSplitter only
- ❌ No hybrid search (BM25 + vector) – vector-only retrieval
- ❌ No production embeddings – using local HuggingFace MiniLM (384d) or Google free tier
- ❌ No data ingestion pipeline (Airflow) – manual PDF loading
- ❌ No caching layer (Redis) – every query hits the LLM
- ❌ No observability (Langfuse) – no tracing, no cost tracking
- ❌ No streaming responses – synchronous only
- ❌ No Gradio interface – CLI only (besides the basic API)
- ❌ No messaging bot (Telegram/WhatsApp) – no mobile access
- ❌ No agentic RAG with guardrails, document grading, query rewriting
- ❌ No proper dependency injection pattern (FastAPI `Depends()`)
- ❌ No Pydantic Settings with env-nested config
- ❌ No factory pattern for service initialization
- ❌ No proper exception hierarchy
- ❌ No health checks for all services
- ❌ No Makefile / dev tooling (ruff, mypy, pre-commit)
- ❌ No proper test infrastructure (pytest fixtures, test containers)
2. Deep Review: Course vs. Your Codebase
Course Architecture (What Production Looks Like)
┌─────────────────────────────────────────────────────────────┐
│                Docker Compose Orchestration                 │
├──────────┬──────────┬──────────┬──────────┬─────────────────┤
│ FastAPI  │PostgreSQL│OpenSearch│  Ollama  │     Airflow     │
│  (8000)  │  (5432)  │  (9200)  │ (11434)  │     (8080)      │
├──────────┼──────────┼──────────┼──────────┼─────────────────┤
│  Redis   │ Langfuse │ClickHouse│  MinIO   │   Langfuse-PG   │
│  (6379)  │  (3001)  │          │          │     (5433)      │
├──────────┴──────────┴──────────┴──────────┴─────────────────┤
│             Gradio UI (7861)   │   Telegram Bot             │
└─────────────────────────────────────────────────────────────┘
Key Patterns from Course:
- Pydantic Settings with `env_nested_delimiter="__"` for hierarchical config
- Factory pattern (`make_*` functions) for every service
- Dependency injection via FastAPI `Depends()` with typed annotations
- Lifespan context for startup/shutdown with proper resource management
- Service layer separation: `routers/` → `services/` → `clients/`
- Schema-driven: separate Pydantic schemas for API, database, embeddings, indexing
- Exception hierarchy: domain-specific exceptions (`PDFParsingException`, `OllamaException`, etc.)
- Context dataclass for LangGraph runtime dependency injection
- Structured LLM output via `.with_structured_output(PydanticModel)`
Your Codebase Architecture (Current State)
┌──────────────────────────────────────────────┐
│          Basic FastAPI (api/app/)            │
│     Single Dockerfile, no orchestration      │
├──────────────────────────────────────────────┤
│          src/ (Core Domain Logic)            │
│   ┌───────────────────────────────────────┐  │
│   │ workflow.py (LangGraph StateGraph)    │  │
│   │ 6 agents/ (parallel execution)        │  │
│   │ biomarker_validator.py (24 markers)   │  │
│   │ pdf_processor.py (FAISS + PyPDF)      │  │
│   │ evaluation/ (5D framework)            │  │
│   │ evolution/ (SOP optimization)         │  │
│   └───────────────────────────────────────┘  │
├──────────────────────────────────────────────┤
│     FAISS vector store (single file)         │
│  No PostgreSQL, No Redis, No OpenSearch      │
└──────────────────────────────────────────────┘
3. Architecture Gap Analysis
| Dimension | Course (Production) | Your Codebase (Prototype) | Gap Severity |
|---|---|---|---|
| Container Orchestration | Docker Compose with 12+ services, health checks, networks | Single Dockerfile, manual startup | 🔴 Critical |
| Database | PostgreSQL 16 with SQLAlchemy models, repositories | None (in-memory only) | 🔴 Critical |
| Search Engine | OpenSearch 2.19 with BM25 + KNN hybrid, RRF fusion | FAISS (vector-only, no filtering) | 🔴 Critical |
| Chunking | Section-aware chunking (600w, 100w overlap, metadata) | Basic RecursiveCharacterTextSplitter (1000 char) | 🟡 Major |
| Embeddings | Jina AI v3 (1024d, passage/query differentiation) | HuggingFace MiniLM (384d) or Google free tier | 🟡 Major |
| Data Pipeline | Airflow DAGs (daily schedule, fetch → parse → chunk → index) | Manual PDF loading, one-time setup | 🟡 Major |
| Caching | Redis with TTL, exact-match, SHA256 keys | None | 🟡 Major |
| Observability | Langfuse v3 (traces, spans, generations, cost tracking) | None (print statements only) | 🟡 Major |
| Streaming | SSE streaming with Gradio UI | None (synchronous responses) | 🟡 Major |
| Agentic RAG | LangGraph with guardrails, grading, rewriting, context_schema | Basic LangGraph (no guardrails, no grading) | 🟡 Major |
| Bot Integration | Telegram bot with /search, Q&A, caching | None | 🟢 Enhancement |
| Config Management | Pydantic Settings, hierarchical env vars, frozen models | Basic os.getenv, dotenv | 🟡 Major |
| Dependency Injection | FastAPI Depends() with typed annotations | Manual global singletons | 🟡 Major |
| Error Handling | Domain exception hierarchy, graceful fallbacks | Basic try/except with prints | 🟡 Major |
| Code Quality | Ruff, MyPy, pre-commit, pytest with fixtures | Minimal pytest, no linting | 🟢 Enhancement |
| API Design | Versioned (/api/v1/), health checks for all services | Basic routes, minimal health check | 🟡 Major |
Phase 1: Infrastructure Foundation (Week 1 Equivalent)
Goal: Containerize everything, add PostgreSQL for persistence, set up OpenSearch, establish professional development environment.
1.1 Docker Compose Orchestration
Create a production docker-compose.yml with all services:
# Target services for MediGuard AI:
services:
api: # FastAPI application (port 8000)
postgres: # Patient reports, analysis history (port 5432)
opensearch: # Medical document search engine (port 9200)
opensearch-dashboards: # Search UI (port 5601)
redis: # Response caching (port 6379)
ollama: # Local LLM for privacy-sensitive medical data (port 11434)
airflow: # Medical literature pipeline (port 8080)
langfuse-web: # Observability dashboard (port 3001)
langfuse-worker/postgres/redis/clickhouse/minio: # Langfuse infra
Tasks:
- Create a root `docker-compose.yml` adapting the course pattern to MedTech services
- Create a multi-stage `Dockerfile` using the UV package manager (copy course pattern)
- Add health checks for every service (PostgreSQL, OpenSearch, Redis, Ollama)
- Set up Docker network `mediguard-network` with proper service dependencies
- Configure volume persistence for all data stores
- Create `.env.example` with all configuration variables documented
1.2 Pydantic Settings Configuration
Replace scattered os.getenv() calls with hierarchical Pydantic Settings:
# New: src/config.py (course-inspired)
class MedicalPDFSettings(BaseConfigSettings): # PDF parser config
class ChunkingSettings(BaseConfigSettings): # Chunking parameters
class OpenSearchSettings(BaseConfigSettings): # Search engine config
class LangfuseSettings(BaseConfigSettings): # Observability config
class RedisSettings(BaseConfigSettings): # Cache config
class TelegramSettings(BaseConfigSettings): # Bot config
class BiomarkerSettings(BaseConfigSettings): # Biomarker thresholds
class Settings(BaseConfigSettings): # Root settings
Tasks:
- Rewrite `src/config.py` – keep `ExplanationSOP` but add infrastructure settings classes
- Use `env_nested_delimiter="__"` for hierarchical environment variables
- Add `frozen=True` for immutable configuration
- Move all hardcoded values to environment variables with sensible defaults
- Create a `get_settings()` factory with `@lru_cache`
1.3 PostgreSQL Database Setup
Add persistent storage for analysis history, which is critical for a medical audit trail:
# New models:
class PatientAnalysis(Base): # Store each analysis run
class AnalysisReport(Base): # Store final reports
class MedicalDocument(Base): # Track ingested medical PDFs
class BiomarkerReference(Base): # Biomarker reference ranges (currently JSON file)
Tasks:
- Create a `src/db/` package mirroring the course pattern (factory, interfaces, postgresql)
- Define SQLAlchemy models for analysis history and medical documents
- Create a repository pattern for data access
- Set up Alembic for database migrations
- Migrate `biomarker_references.json` to the database (keep the JSON as seed data)
1.4 Project Structure Refactor
Reorganize to match production patterns:
src/
├── config.py          # Pydantic Settings (hierarchical)
├── main.py            # FastAPI app with lifespan
├── database.py        # Database utilities
├── dependencies.py    # FastAPI dependency injection
├── exceptions.py      # Domain exception hierarchy
├── middlewares.py     # Request logging, timing
├── db/                # Database layer
│   ├── factory.py
│   └── interfaces/
├── models/            # SQLAlchemy models
│   ├── analysis.py
│   └── document.py
├── repositories/      # Data access
│   ├── analysis.py
│   └── document.py
├── routers/           # API endpoints
│   ├── analyze.py     # Biomarker analysis
│   ├── ask.py         # RAG Q&A (streaming + standard)
│   ├── health.py      # Comprehensive health checks
│   └── search.py      # Medical document search
├── schemas/           # Pydantic request/response models
│   ├── api/
│   ├── medical/
│   └── embeddings/
├── services/          # Business logic
│   ├── agents/        # Your 6 medical agents (KEEP!)
│   │   ├── biomarker_analyzer.py
│   │   ├── disease_explainer.py
│   │   ├── biomarker_linker.py
│   │   ├── clinical_guidelines.py
│   │   ├── confidence_assessor.py
│   │   ├── response_synthesizer.py
│   │   ├── agentic_rag.py   # NEW: LangGraph agentic wrapper
│   │   ├── nodes/           # NEW: Guardrail, grading, rewriting
│   │   ├── state.py         # Enhanced state
│   │   ├── context.py       # Runtime dependency injection
│   │   └── prompts.py       # Medical-domain prompts
│   ├── opensearch/    # NEW: Search engine client
│   ├── embeddings/    # NEW: Production embeddings
│   ├── cache/         # NEW: Redis caching
│   ├── langfuse/      # NEW: Observability
│   ├── ollama/        # NEW: Local LLM client
│   ├── indexing/      # NEW: Chunking + indexing
│   ├── pdf_parser/    # Enhanced: Use Docling
│   ├── telegram/      # NEW: Bot integration
│   └── biomarker/     # Extracted: validation + normalization
├── evaluation/        # KEEP: 5D evaluation
└── evolution/         # KEEP: SOP evolution
Tasks:
- Create the new directory structure
- Move the API from `api/app/` into `src/` (single application)
- Create `exceptions.py` with a medical-domain exception hierarchy
- Create `dependencies.py` with typed FastAPI dependency injection
- Create `main.py` with a proper lifespan context manager
1.5 Development Tooling
Tasks:
- Create `pyproject.toml` replacing `requirements.txt` (use UV)
- Create a `Makefile` with start/stop/test/lint/format/health commands
- Add `ruff` for linting and formatting
- Add `mypy` for type checking
- Add `.pre-commit-config.yaml`
- Create `.env.example` and `.env.test`
Phase 2: Medical Data Ingestion Pipeline (Week 2 Equivalent)
Goal: Automated ingestion of medical PDFs, clinical guidelines, and reference documents with Airflow orchestration.
2.1 Medical PDF Parser Upgrade
Replace basic PyPDF with Docling for better medical document handling:
Tasks:
- Create `src/services/pdf_parser/` with Docling integration (copy course pattern)
- Add medical-specific section detection (Abstract, Methods, Results, Discussion, Clinical Guidelines)
- Add table extraction for lab reference ranges
- Add validation: file size limits, page limits, PDF header check
- Add metadata extraction: title, authors, publication date, journal
2.2 Medical Document Sources
Unlike arXiv (single API), medical literature comes from multiple sources:
Tasks:
- Create a `src/services/medical_sources/` package
- Implement a PubMed API client (free, rate-limited) for research papers
- Implement local PDF upload endpoint for clinical guidelines
- Implement reference document ingestion (WHO, CDC, ADA guidelines)
- Create document deduplication logic (by title hash + content fingerprint)
- Add a `MedicalDocument` model tracking: source, parse status, indexing status
2.3 Airflow Pipeline for Medical Literature
Tasks:
- Create an `airflow/` directory with Dockerfile and entrypoint
- Create the `airflow/dags/medical_ingestion.py` DAG: `setup_environment` → `fetch_new_documents` → `parse_pdfs` → `chunk_and_index` → `generate_report`
- Schedule: Daily at 6 AM for PubMed updates, on-demand for uploaded PDFs
- Add retry logic with exponential backoff
- Mount `src/` into the Airflow container for shared code
2.4 PostgreSQL Storage for Documents
Tasks:
- Create the `MedicalDocument` model: id, title, source, source_type, authors, abstract, raw_text, sections, parse_status, indexed_at
- Create a `PaperRepository` with CRUD + upsert + status tracking
- Track the processing pipeline: `uploaded → parsed → chunked → indexed`
- Store parsed sections as JSON for re-indexing without re-parsing
Phase 3: Production Search Foundation (Week 3 Equivalent)
Goal: Replace FAISS with OpenSearch for production BM25 keyword search with medical-specific optimizations.
3.1 OpenSearch Client
Tasks:
- Create a `src/services/opensearch/` package (adapt course pattern)
- Implement `OpenSearchClient` with:
  - Health check, index management, BM25 search, bulk indexing
  - Medical-specific: boost clinical term matches, support ICD-10 code filtering
- Create a `QueryBuilder` with medical field boosting: `fields: ["chunk_text^3", "title^2", "section_title^1.5", "abstract^1"]`
- Create `index_config_hybrid.py` with the medical document mapping:
  - Fields: chunk_text, title, authors, abstract, document_type (guideline/research/reference), condition_tags, publication_year
3.2 Medical Document Index Mapping
MEDICAL_CHUNKS_MAPPING = {
"settings": {
"index.knn": True,
"analysis": {
"analyzer": {
"medical_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "medical_synonyms", "stop", "snowball"]
}
}
}
},
"mappings": {
"properties": {
"chunk_text": {"type": "text", "analyzer": "medical_analyzer"},
"document_type": {"type": "keyword"}, # guideline, research, reference
"condition_tags": {"type": "keyword"}, # diabetes, anemia, etc.
"biomarkers_mentioned": {"type": "keyword"}, # Glucose, HbA1c, etc.
"embedding": {"type": "knn_vector", "dimension": 1024},
# ... more fields
}
}
}
Tasks:
- Design medical-optimized OpenSearch mapping
- Add a medical synonym analyzer (e.g., "diabetes mellitus" ↔ "DM", "HbA1c" ↔ "glycated hemoglobin")
- Create search endpoint `POST /api/v1/search` with filtering by document_type, condition_tags
- Implement BM25 search with medical field boosting
- Create index verification in startup lifespan
Phase 4: Hybrid Search & Intelligent Chunking (Week 4 Equivalent)
Goal: Section-aware chunking for medical documents + hybrid search (BM25 + semantic) with RRF fusion.
4.1 Medical-Aware Text Chunking
Tasks:
- Create `src/services/indexing/text_chunker.py` adapting the course's `TextChunker`:
  - Section-aware chunking (detect: Introduction, Methods, Results, Discussion, Guidelines, References)
  - Target: 600 words per chunk, 100 words of overlap
  - Medical metadata: section_title, biomarkers_mentioned, condition_tags
- Create a `MedicalTextChunker` subclass with:
  - Biomarker mention detection (scan for any of 24+ biomarker names)
  - Condition tag extraction (diabetes, anemia, heart disease, etc.)
  - Table-aware chunking (keep tables together)
  - Reference section filtering (skip bibliography chunks)
- Create a `HybridIndexingService` for the chunk → embed → index pipeline
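The word-window logic with overlap and biomarker tagging can be sketched in pure Python. The biomarker set below is a small illustrative subset of the 24, and a real chunker would also carry section metadata:

```python
# Illustrative subset of the 24 validated biomarkers
KNOWN_BIOMARKERS = {"glucose", "hba1c", "hemoglobin", "ldl", "hdl", "creatinine"}


def chunk_words(text: str, chunk_size: int = 600, overlap: int = 100) -> list[dict]:
    """Split text into ~chunk_size-word windows with `overlap` words of shared
    context between neighbors, tagging each chunk with biomarkers it mentions."""
    words = text.split()
    step = chunk_size - overlap  # advance 500 words per chunk by default
    chunks: list[dict] = []
    for start in range(0, max(len(words), 1), step):
        window = words[start:start + chunk_size]
        if not window:
            break
        lowered = {w.strip(".,;:()").lower() for w in window}
        chunks.append({
            "chunk_text": " ".join(window),
            "word_count": len(window),
            "biomarkers_mentioned": sorted(KNOWN_BIOMARKERS & lowered),
        })
        if start + chunk_size >= len(words):
            break  # last window already reached the end of the document
    return chunks
```

Storing `biomarkers_mentioned` per chunk is what later lets the OpenSearch mapping filter on the `biomarkers_mentioned` keyword field instead of re-scanning text at query time.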
4.2 Production Embeddings
Tasks:
- Create `src/services/embeddings/` with a Jina AI client (1024d, passage/query differentiation)
- Add a fallback chain: Jina → Google → HuggingFace
- Implement batch embedding for efficient indexing
- Track embedding model in chunk metadata for versioning
4.3 Hybrid Search with RRF
Tasks:
- Implement `search_unified()` supporting BM25-only, vector-only, and hybrid modes
- Set up an OpenSearch RRF (Reciprocal Rank Fusion) pipeline
- Create unified search endpoint `POST /api/v1/hybrid-search/`
- Add min_score filtering and result deduplication
- Benchmark: BM25 vs. vector vs. hybrid on medical queries
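OpenSearch can fuse result lists server-side via a search pipeline, but the RRF math itself is simple enough to sketch client-side for intuition; `k=60` is the conventional constant and the function name is hypothetical:

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_in_list).

    ranked_lists holds document IDs, best-first; typically one list comes from
    BM25 and one from KNN vector search. Documents ranked well in both lists
    accumulate score from each and rise to the top of the fused ranking.
    """
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Because RRF only consumes ranks, it sidesteps the score-normalization problem of mixing BM25 scores (unbounded) with cosine similarities (bounded), which is exactly why it suits hybrid medical search here.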
Phase 5: Complete RAG Pipeline with Streaming (Week 5 Equivalent)
Goal: Replace synchronous analysis with streaming RAG, add Gradio UI, optimize prompts.
5.1 Ollama Client Upgrade
Tasks:
- Create a `src/services/ollama/` package (adapt course pattern)
- Implement `OllamaClient` with:
  - Health check, model listing, generate, streaming generate
  - Usage metadata extraction (tokens, latency)
  - LangChain integration: `get_langchain_model()` for structured output
- Create medical-specific RAG prompt templates:
  - `rag_medical_system.txt` – optimized for medical explanation generation
  - Structured output format for clinical responses
- Create an `OllamaFactory` with `@lru_cache`
5.2 Streaming RAG Endpoints
Tasks:
- Create `POST /api/v1/ask` – standard RAG with medical context retrieval
- Create `POST /api/v1/stream` – SSE streaming for real-time responses
- Create `POST /api/v1/analyze/stream` – streaming biomarker analysis
- Integrate with the existing multi-agent pipeline: `Query → Hybrid Search → Medical Chunks → Agent Pipeline → Streaming Response`
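The SSE framing for the streaming endpoints can be sketched independently of FastAPI. The real endpoint would wrap this generator in a `StreamingResponse` with `media_type="text/event-stream"`; the `[DONE]` sentinel is an assumed client convention, not a FastAPI feature:

```python
import json
from typing import Iterable, Iterator


def sse_events(token_stream: Iterable[str]) -> Iterator[str]:
    """Frame LLM tokens as Server-Sent Events for the streaming endpoints.

    Each event is `data: <json>\n\n` per the SSE wire format; a final [DONE]
    sentinel tells the Gradio/Telegram clients the generation has finished.
    """
    for token in token_stream:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"
```

Keeping the framing in a plain generator means the same function can serve `/api/v1/stream` and `/api/v1/analyze/stream`, with only the upstream token source differing.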
5.3 Gradio Medical Interface
Tasks:
- Create `src/gradio_app.py` for interactive medical RAG:
  - Biomarker input form (structured entry)
  - Natural language input (free text)
  - Streaming response display
  - Search mode selector (BM25, hybrid, vector)
  - Model selector
  - Analysis history display
- Create `gradio_launcher.py` for easy startup
- Expose on port 7861
5.4 Prompt Optimization
Tasks:
- Reduce prompt size by 60-80% (course achieved 80% reduction)
- Create focused medical prompts (separate: biomarker analysis, disease explanation, guidelines)
- Test prompt variants using 5D evaluation framework
- Store best prompts as SOP parameters (tie into evolution engine)
Phase 6: Monitoring, Caching & Observability (Week 6 Equivalent)
Goal: Add Langfuse tracing for the entire pipeline, Redis caching, and production monitoring.
6.1 Langfuse Integration
Tasks:
- Create a `src/services/langfuse/` package (adapt course pattern):
  - `client.py` – LangfuseTracer wrapper with the v3 SDK
  - `factory.py` – cached tracer factory
  - `tracer.py` – medical-specific RAGTracer with named steps
- Add spans for every pipeline step: `biomarker_validation` → `query_embedding` → `search_retrieval` → `agent_execution` → `response_synthesis`
- Track per-request metrics:
  - Total latency, LLM tokens used, search results count, cache hit/miss, agent execution time
- Add Langfuse Docker services to docker-compose.yml
- Create trace visualization for the medical analysis pipeline
6.2 Redis Caching
Tasks:
- Create a `src/services/cache/` package (adapt course pattern):
  - Exact-match cache: SHA256(query + model + top_k + biomarkers) → cached response
  - TTL: 6 hours for general queries, 1 hour for biomarker analysis (values may change)
- Add caching to:
  - `/api/v1/ask` – cache RAG responses
  - `/api/v1/analyze` – cache full analysis results
  - Embeddings – cache frequently queried embeddings
- Add graceful fallback: cache miss → normal pipeline
- Track cache hit rates in Langfuse
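The exact-match key derivation can be sketched with stdlib `hashlib`; the `mediguard:rag:` prefix and the exact parameter set are illustrative choices, not fixed by the course:

```python
import hashlib
import json


def cache_key(query: str, model: str, top_k: int, biomarkers: dict[str, float]) -> str:
    """Deterministic exact-match cache key: SHA256 over the canonicalized request.

    sort_keys=True makes the key independent of dict insertion order, so the
    same biomarker panel entered in a different order hits the same Redis entry.
    """
    payload = json.dumps(
        {"query": query, "model": model, "top_k": top_k, "biomarkers": biomarkers},
        sort_keys=True,
    )
    return "mediguard:rag:" + hashlib.sha256(payload.encode()).hexdigest()
```

Including `model` and `top_k` in the hash prevents stale cross-model hits; one caveat for Phase 8 is that the raw biomarker values only ever enter the hash, never a log line, which keeps the key HIPAA-friendlier than logging the payload.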
6.3 Production Health Dashboard
Tasks:
- Enhance `/api/v1/health` to check all services: PostgreSQL, OpenSearch, Redis, Ollama, Langfuse, Airflow
- Add a `/api/v1/metrics` endpoint for operational metrics
- Create a Langfuse dashboard for:
  - Average response time, cache hit rate, error rate, token costs
  - Per-agent execution times, search relevance scores
Phase 7: Agentic RAG & Messaging Bot (Week 7 Equivalent)
Goal: Wrap your multi-agent pipeline in a LangGraph agentic workflow with guardrails, document grading, and query rewriting. Add Telegram bot for mobile access.
7.1 Agentic RAG Wrapper
This is the most impactful upgrade: it adds intelligence around your existing agents:
User Query
    ↓
[GUARDRAIL] ──── Is this a medical/biomarker question? ──no──→ [OUT OF SCOPE]
    ↓ yes
[RETRIEVE] ───── Hybrid search for medical documents ─────────→ [TOOL: search]
    ↓
[GRADE DOCUMENTS] ── Are results relevant? ──no──→ [REWRITE QUERY] ──→ loop
    ↓ yes
[CLINICAL ANALYSIS] ── Your 6 medical agents ──────→ structured analysis
    ↓
[GENERATE RESPONSE] ── Synthesize with citations ──→ final answer
Tasks:
- Create `src/services/agents/agentic_rag.py` – `AgenticRAGService` class
- Create `src/services/agents/nodes/`:
  - `guardrail_node.py` – medical domain validation (score 0-100)
    - In-scope: biomarker questions, disease queries, clinical guidelines
    - Out-of-scope: non-medical, general knowledge, harmful content
  - `retrieve_node.py` – creates a tool call with `max_retrieval_attempts`
  - `grade_documents_node.py` – LLM evaluates medical relevance
  - `rewrite_query_node.py` – LLM rewrites for better medical retrieval
  - `generate_answer_node.py` – uses your existing agent pipeline OR a direct LLM
  - `out_of_scope_node.py` – polite medical-domain rejection
- Create `src/services/agents/state.py` – enhanced state with guardrail_result, routing_decision, grading_results
- Create `src/services/agents/context.py` – runtime context for dependency injection
- Create `src/services/agents/prompts.py` – medical-specific prompts:
  - Guardrail: "Is this about health/biomarkers/medical conditions?"
  - Grading: "Does this medical document answer the clinical question?"
  - Rewriting: "Improve this medical query for better document retrieval"
  - Generation: "Synthesize medical findings with citations and safety caveats"
- Create `src/services/agents/tools.py` – medical retriever tool wrapping OpenSearch
- Create the `POST /api/v1/ask-agentic` endpoint
- Add Langfuse tracing to every node
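The guardrail-plus-routing contract can be sketched as follows. The scorer here is a keyword-overlap placeholder standing in for the structured-output LLM call, so only the score/threshold/routing shape is meaningful, not the scoring itself; all names are hypothetical:

```python
from dataclasses import dataclass

# Placeholder vocabulary; the real node would ask an LLM for a structured score
MEDICAL_TERMS = {"biomarker", "glucose", "hba1c", "cholesterol", "anemia", "diabetes"}


@dataclass
class GuardrailResult:
    score: int      # 0-100 medical-relevance score, as in the plan
    in_scope: bool


def guardrail_node(query: str, threshold: int = 50) -> GuardrailResult:
    """Score a query's medical relevance and decide whether it is in scope."""
    words = set(query.lower().split())
    hits = len(words & MEDICAL_TERMS)
    score = min(100, hits * 50)
    return GuardrailResult(score=score, in_scope=score >= threshold)


def route_after_guardrail(result: GuardrailResult) -> str:
    """Conditional edge: the name of the node the graph should visit next."""
    return "retrieve" if result.in_scope else "out_of_scope"
```

In LangGraph terms, `route_after_guardrail` is the function you would hand to `add_conditional_edges` after the guardrail node, mapping its return value to the `retrieve_node` / `out_of_scope_node` targets.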
7.2 Medical Guardrails (Critical for MedTech)
Beyond the course's simple domain check, add medical-specific safety:
Tasks:
- Input guardrails:
  - Detect harmful queries (self-harm, drug abuse guidance)
  - Detect attempts to get a diagnosis without proper data
  - Validate that biomarker values are physiologically plausible
- Output guardrails:
  - Always include a "consult your healthcare provider" disclaimer
  - Never provide a definitive diagnosis (always "suggests" / "may indicate")
  - Flag critical biomarker values with immediate-action advice
  - Ensure safety_alerts are present for out-of-range values
- Citation guardrails:
  - Ensure all medical claims have document citations
  - Flag unsupported claims
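A minimal output-guardrail post-processor might look like this; the phrase list and disclaimer wording are illustrative (production would use a richer classifier and case-aware rewriting):

```python
DISCLAIMER = "This is not a diagnosis - please consult your healthcare provider."

# Illustrative definitive-language patterns to soften (lowercase matching only)
DEFINITIVE_PHRASES = ("you have", "this confirms", "definitely")


def apply_output_guardrails(response: str) -> str:
    """Post-process every generated answer: soften definitive diagnostic
    language and guarantee the safety disclaimer appears exactly once."""
    lowered = response.lower()
    for phrase in DEFINITIVE_PHRASES:
        if phrase in lowered:
            response = response.replace(phrase, "this may indicate")
            lowered = response.lower()
    if DISCLAIMER.lower() not in lowered:
        response = f"{response}\n\n{DISCLAIMER}"
    return response
```

Running this as the last step of `generate_answer_node` means even a cached or rewritten response cannot leave the system without the disclaimer.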
7.3 Telegram Bot Integration
Tasks:
- Create a `src/services/telegram/` package (adapt course pattern)
- Implement bot commands:
  - `/start` – welcome with medical assistant introduction
  - `/help` – show capabilities and input format
  - `/analyze <biomarker values>` – quick biomarker analysis
  - `/search <medical query>` – search medical documents
  - `/report` – get the last analysis as a formatted report
  - Free text – full RAG Q&A about medical topics
- Add typing indicators and progress messages
- Integrate caching for repeated queries
- Add rate limiting (medical queries shouldn't be spammed)
- Create a `TelegramFactory` gated by `TELEGRAM__ENABLED=true`
7.4 Feedback Loop
Tasks:
- Create a `POST /api/v1/feedback` endpoint (adapt from course)
- Integrate with Langfuse scoring
- Use feedback data to identify weak prompts → feed into the SOP evolution engine
Phase 8: MedTech-Specific Additions (Beyond Course)
Goal: Things the course doesn't cover but your medical domain demands.
8.1 HIPAA-Awareness Patterns
Tasks:
- Never log patient biomarker values in plain text
- Add request ID tracking without PII
- Create data retention policy (auto-delete analysis data after configurable period)
- Add audit logging for all analysis requests
- Document HIPAA compliance approach (even if not yet certified)
8.2 Medical Safety Testing
Tasks:
- Create a medical-specific test suite:
  - Critical value detection tests (every critical biomarker)
  - Guardrail rejection tests (non-medical queries)
  - Citation completeness tests
  - Safety disclaimer presence tests
  - Biomarker normalization tests (already have some)
- Integrate 5D evaluation into CI pipeline
- Create test fixtures with realistic medical scenarios
8.3 Evolution Engine Integration
Tasks:
- Wire SOP evolution engine to production metrics (Langfuse data)
- Create Airflow DAG for scheduled evolution cycles
- Store evolved SOPs in PostgreSQL with version tracking
- A/B test SOP variants using Langfuse trace comparison
8.4 Multi-condition Support
Tasks:
- Extend condition coverage beyond current 5 diseases
- Add condition-specific retrieval strategies
- Create condition-specific chunking filters
- Support multi-condition analysis (comorbidities)
Implementation Priority Matrix
| Priority | Phase | Effort | Impact | Dependencies |
|---|---|---|---|---|
| 🔴 P0 | 1.1 Docker Compose | 2 days | Critical | None |
| 🔴 P0 | 1.2 Pydantic Settings | 1 day | Critical | None |
| 🔴 P0 | 1.4 Project Restructure | 2 days | Critical | None |
| 🔴 P0 | 1.5 Dev Tooling | 0.5 day | Critical | 1.4 |
| 🔴 P0 | 1.3 PostgreSQL + Models | 2 days | Critical | 1.1, 1.4 |
| 🟡 P1 | 3.1 OpenSearch Client | 2 days | High | 1.1, 1.4 |
| 🟡 P1 | 3.2 Medical Index Mapping | 1 day | High | 3.1 |
| 🟡 P1 | 4.1 Medical Text Chunker | 2 days | High | 3.1 |
| 🟡 P1 | 4.2 Production Embeddings | 1 day | High | 4.1 |
| 🟡 P1 | 4.3 Hybrid Search + RRF | 1 day | High | 3.1, 4.2 |
| 🟡 P1 | 5.1 Ollama Client | 1 day | High | 1.4 |
| 🟡 P1 | 5.2 Streaming Endpoints | 1 day | High | 5.1, 4.3 |
| 🟡 P1 | 2.1 PDF Parser (Docling) | 1 day | High | 1.4 |
| 🟡 P1 | 7.1 Agentic RAG Wrapper | 3 days | High | 5.2, 4.3 |
| 🟡 P1 | 7.2 Medical Guardrails | 2 days | High | 7.1 |
| 🟢 P2 | 2.3 Airflow Pipeline | 2 days | Medium | 1.1, 2.1, 4.1 |
| 🟢 P2 | 5.3 Gradio Interface | 1 day | Medium | 5.2 |
| 🟢 P2 | 6.1 Langfuse Tracing | 2 days | Medium | 1.1, 5.2 |
| 🟢 P2 | 6.2 Redis Caching | 1 day | Medium | 1.1, 5.2 |
| 🟢 P2 | 6.3 Health Dashboard | 0.5 day | Medium | 6.1 |
| 🟢 P2 | 7.3 Telegram Bot | 2 days | Medium | 7.1, 6.2 |
| 🟢 P2 | 7.4 Feedback Loop | 0.5 day | Medium | 6.1 |
| 🔵 P3 | 2.2 Medical Sources | 2 days | Low | 2.1 |
| 🔵 P3 | 8.1 HIPAA Patterns | 1 day | Low | 1.3 |
| 🔵 P3 | 8.2 Safety Testing | 2 days | Low | 7.2 |
| 🔵 P3 | 8.3 Evolution Integration | 2 days | Low | 6.1, 2.3 |
| 🔵 P3 | 8.4 Multi-condition | 3 days | Low | 4.1 |
Estimated Total: ~40 days of focused work
Migration Strategy
Step 1: Foundation (Week 1-2 of work)
- Restructure project layout → Phase 1.4
- Create Pydantic Settings → Phase 1.2
- Set up Docker Compose → Phase 1.1
- Add PostgreSQL with models → Phase 1.3
- Add dev tooling → Phase 1.5
Step 2: Search Engine (Week 2-3)
- Create OpenSearch client + medical mapping → Phase 3.1, 3.2
- Build medical text chunker → Phase 4.1
- Add production embeddings (Jina) → Phase 4.2
- Implement hybrid search + RRF → Phase 4.3
- Upgrade PDF parser to Docling → Phase 2.1
Step 3: RAG Pipeline (Week 3-4)
- Create Ollama client → Phase 5.1
- Add streaming endpoints → Phase 5.2
- Build agentic RAG wrapper → Phase 7.1
- Add medical guardrails → Phase 7.2
- Create Gradio interface → Phase 5.3
Step 4: Production Hardening (Week 4-5)
- Add Langfuse observability → Phase 6.1
- Add Redis caching → Phase 6.2
- Set up Airflow pipeline → Phase 2.3
- Build Telegram bot → Phase 7.3
- Add feedback loop → Phase 7.4
Step 5: Polish (Week 5-6)
- Health dashboard → Phase 6.3
- Medical safety testing → Phase 8.2
- HIPAA patterns → Phase 8.1
- Evolution engine integration → Phase 8.3
Key Migration Rules
- Never break what works: Keep all existing agents functional throughout
- Test at every step: Run existing tests after each phase
- Incremental Docker: Start with API + PostgreSQL, add services one at a time
- Feature flags: Gate new features (Telegram, Langfuse, Redis) behind settings
- Backward compatibility: Keep CLI chatbot working alongside new API
Architecture Target State
┌──────────────────────────────────────────────────────────────────────┐
│                    Docker Compose Orchestration                      │
│                                                                      │
│  ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌────────┐ ┌─────────┐     │
│  │ FastAPI  │ │PostgreSQL │ │OpenSearch │ │ Ollama │ │ Airflow │     │
│  │ + Gradio │ │ (reports, │ │  (hybrid  │ │ (local │ │ (daily  │     │
│  │  (8000,  │ │   docs,   │ │  medical  │ │  LLM)  │ │ ingest) │     │
│  │   7861)  │ │  history) │ │  search)  │ │        │ │         │     │
│  └──────────┘ └───────────┘ └───────────┘ └────────┘ └─────────┘     │
│  ┌──────────┐ ┌───────────┐                                          │
│  │  Redis   │ │ Langfuse  │    all services on mediguard-network     │
│  │ (cache)  │ │ (observe) │                                          │
│  └──────────┘ └───────────┘                                          │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                     Agentic RAG Pipeline                       │  │
│  │                                                                │  │
│  │ Query → [Guardrail] → [Retrieve] → [Grade] → [6 Medical Agents]│  │
│  │             ↓                          ↓             ↓         │  │
│  │      [Out of Scope]               [Rewrite]     [Generate]     │  │
│  │                                               → Final Response │  │
│  │                                                                │  │
│  │ Agents: Biomarker Analyzer · Disease Explainer · Linker ·      │  │
│  │         Clinical Guidelines · Confidence · Synthesizer         │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────────┐  │
│  │ Telegram Bot │ │  Gradio UI   │ │   5D Eval + SOP Evolution    │  │
│  │   (mobile)   │ │  (desktop)   │ │   (self-improvement loop)    │  │
│  └──────────────┘ └──────────────┘ └──────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
Files to Create (Summary)
| New File | Source of Inspiration |
|---|---|
| `docker-compose.yml` | Course `compose.yml` (adapted) |
| `Dockerfile` | Course `Dockerfile` (multi-stage UV) |
| `Makefile` | Course `Makefile` |
| `pyproject.toml` | Course `pyproject.toml` |
| `.pre-commit-config.yaml` | Course `.pre-commit-config.yaml` |
| `.env.example` | Course `.env.example` |
| `src/main.py` | Course `src/main.py` (lifespan pattern) |
| `src/config.py` | Course `src/config.py` + existing SOP config |
| `src/dependencies.py` | Course `src/dependencies.py` |
| `src/exceptions.py` | Course `src/exceptions.py` (medical exceptions) |
| `src/database.py` | Course `src/database.py` |
| `src/db/*` | Course `src/db/*` |
| `src/models/analysis.py` | New (medical domain) |
| `src/models/document.py` | Course `src/models/paper.py` (adapted) |
| `src/repositories/*` | Course `src/repositories/*` (adapted) |
| `src/routers/ask.py` | Course `src/routers/ask.py` |
| `src/routers/search.py` | Course `src/routers/hybrid_search.py` |
| `src/routers/health.py` | Course `src/routers/ping.py` (enhanced) |
| `src/schemas/*` | Course `src/schemas/*` (medical schemas) |
| `src/services/opensearch/*` | Course `src/services/opensearch/*` |
| `src/services/embeddings/*` | Course `src/services/embeddings/*` |
| `src/services/ollama/*` | Course `src/services/ollama/*` |
| `src/services/cache/*` | Course `src/services/cache/*` |
| `src/services/langfuse/*` | Course `src/services/langfuse/*` |
| `src/services/indexing/*` | Course `src/services/indexing/*` (medical chunks) |
| `src/services/pdf_parser/*` | Course `src/services/pdf_parser/*` |
| `src/services/telegram/*` | Course `src/services/telegram/*` |
| `src/services/agents/agentic_rag.py` | Course (adapted for medical agents) |
| `src/services/agents/nodes/*` | Course (medical guardrails) |
| `src/services/agents/context.py` | Course |
| `src/services/agents/prompts.py` | Course (medical prompts) |
| `src/gradio_app.py` | Course `src/gradio_app.py` (medical UI) |
| `airflow/dags/medical_ingestion.py` | Course `airflow/dags/arxiv_paper_ingestion.py` |
Files to Keep & Enhance
| Existing File | Action |
|---|---|
| `src/agents/biomarker_analyzer.py` | Keep, move to `src/services/agents/medical/` |
| `src/agents/disease_explainer.py` | Keep, move, add OpenSearch retriever |
| `src/agents/biomarker_linker.py` | Keep, move, add OpenSearch retriever |
| `src/agents/clinical_guidelines.py` | Keep, move, add OpenSearch retriever |
| `src/agents/confidence_assessor.py` | Keep, move |
| `src/agents/response_synthesizer.py` | Keep, move |
| `src/biomarker_validator.py` | Keep, move to `src/services/biomarker/` |
| `src/biomarker_normalization.py` | Keep, move to `src/services/biomarker/` |
| `src/evaluation/` | Keep, enhance with Langfuse integration |
| `src/evolution/` | Keep, wire to production metrics |
| `config/biomarker_references.json` | Keep as seed data, migrate to DB |
| `scripts/chat.py` | Keep, update imports |
| `tests/*` | Keep, add production test fixtures |
This plan transforms MediGuard AI from a working prototype into a production-grade medical RAG system, applying every infrastructure lesson from the arXiv Paper Curator course while preserving and enhancing your unique medical domain logic.