title: Agentic RagBot
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: true
license: mit
app_port: 7860
tags:
- medical
- biomarker
- rag
- healthcare
- langgraph
- agents
short_description: Multi-Agent RAG System for Medical Biomarker Analysis
MediGuard AI: Multi-Agent RAG System for Medical Biomarker Analysis
A biomarker analysis system combining 6 specialized AI agents with medical knowledge retrieval (RAG) to provide evidence-based insights on blood test results.
⚠️ Disclaimer: This is an AI-assisted analysis tool, NOT a medical device. Always consult healthcare professionals for medical decisions.
Key Features
- 6 Specialist Agents - Biomarker validation, disease scoring, RAG-powered explanation, confidence assessment
- Medical Knowledge Base - Clinical guidelines stored in vector database (FAISS or OpenSearch)
- Multiple Interfaces - Interactive CLI chat, REST API, Gradio web UI
- Evidence-Based - All recommendations backed by retrieved medical literature with citations
- Free Cloud LLMs - Uses Groq (LLaMA 3.3-70B) or Google Gemini - no API costs
- Biomarker Normalization - 80+ aliases mapped to 24 canonical biomarker names
- Production Architecture - Full error handling, safety alerts, confidence scoring
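The alias normalization listed above can be pictured as a lookup from cleaned-up input names to canonical names. The following is a minimal sketch, not the project's actual `biomarker_normalization.py`: the `ALIASES` table and both helper functions are hypothetical, and the table covers only a handful of illustrative aliases rather than the 80+ the project maps.

```python
from __future__ import annotations

# Hypothetical alias table: a small illustrative subset, not the project's real mapping.
ALIASES = {
    "glucose": "Glucose",
    "blood sugar": "Glucose",
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "hemoglobin": "Hemoglobin",
    "hgb": "Hemoglobin",
    "mcv": "Mean Corpuscular Volume",
    "mean corpuscular volume": "Mean Corpuscular Volume",
}


def normalize_biomarker_name(raw: str) -> str | None:
    """Return the canonical name for an alias, or None if the alias is unknown."""
    key = " ".join(raw.strip().lower().split())  # ignore case and extra whitespace
    return ALIASES.get(key)


def normalize_biomarkers(values: dict[str, float]) -> dict[str, float]:
    """Rename recognized keys to canonical names and drop unrecognized ones."""
    normalized = {}
    for name, value in values.items():
        canonical = normalize_biomarker_name(name)
        if canonical is not None:
            normalized[canonical] = value
    return normalized
```

Dropping unknown names silently is a choice made for brevity here; a real implementation might instead report them back to the user.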
Architecture Overview
┌────────────────────────────────────────────────────────────────┐
│                     MediGuard AI Pipeline                      │
├────────────────────────────────────────────────────────────────┤
│  Input → Guardrail → Router → ┬─ Biomarker Analysis Path       │
│                               │  (6 specialist agents)         │
│                               └─ General Medical Q&A Path      │
│                                  (RAG: retrieve → grade)       │
│                → Response Synthesizer → Output                 │
└────────────────────────────────────────────────────────────────┘
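The branching in the diagram above can be sketched as plain control flow. This is only an illustration of the shape of the pipeline: the guardrail and router conditions below are assumptions (the README does not say how either decides), and the two path bodies are placeholders for the specialist agents and the RAG retrieve-and-grade step.

```python
from __future__ import annotations


def guardrail(text: str) -> bool:
    """Hypothetical guardrail: accept any non-empty input."""
    return bool(text.strip())


def route(biomarkers: dict[str, float]) -> str:
    """Hypothetical router: take the analysis path when values were supplied."""
    return "biomarker_analysis" if biomarkers else "general_qa"


def run_pipeline(text: str, biomarkers: dict[str, float] | None = None) -> dict:
    """Guardrail, then router, then one of the two paths, then a final answer."""
    if not guardrail(text):
        return {"path": "rejected", "answer": "Input rejected by guardrail."}
    values = biomarkers or {}
    path = route(values)
    if path == "biomarker_analysis":
        draft = f"Analyzed {len(values)} biomarker(s)."  # stand-in for the 6 agents
    else:
        draft = "Retrieved and graded documents."  # stand-in for the RAG path
    return {"path": path, "answer": draft}  # stand-in for the response synthesizer
```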
Disease Scoring
The system uses rule-based heuristics (not ML models) to score disease likelihood:
- Diabetes: Glucose > 126 mg/dL, HbA1c ≥ 6.5%
- Anemia: Hemoglobin < 12 g/dL, MCV < 80 fL
- Heart Disease: Cholesterol > 240 mg/dL, Troponin > 0.04 ng/mL
- Thrombocytopenia: Platelets < 150,000/µL
- Thalassemia: MCV + Hemoglobin pattern
Note: Future versions may include trained ML classifiers for improved accuracy.
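To make the heuristic concrete, here is a minimal sketch of threshold-based scoring. The thresholds are the ones listed above; everything else is an assumption made for illustration, in particular the `RULES` layout, the `score_diseases` helper, and scoring each disease as the fraction of its rules that fire. Thalassemia is left out because the README does not spell out its pattern.

```python
from __future__ import annotations

import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt}

# Thresholds from the list above; the data layout itself is hypothetical.
RULES = {
    "Diabetes": [("Glucose", ">", 126), ("HbA1c", ">=", 6.5)],
    "Anemia": [("Hemoglobin", "<", 12), ("Mean Corpuscular Volume", "<", 80)],
    "Heart Disease": [("Cholesterol", ">", 240), ("Troponin", ">", 0.04)],
    "Thrombocytopenia": [("Platelets", "<", 150_000)],
}


def score_diseases(biomarkers: dict[str, float]) -> dict[str, float]:
    """Score each disease as the fraction of its rules met (assumed scheme)."""
    scores = {}
    for disease, rules in RULES.items():
        met = sum(
            1
            for name, op, threshold in rules
            if name in biomarkers and OPS[op](biomarkers[name], threshold)
        )
        scores[disease] = met / len(rules)
    return scores
```

With Glucose 140 and HbA1c 10.0, both diabetes rules fire, so this sketch scores Diabetes at 1.0 and every other disease at 0.0. Missing biomarkers simply count as rules that did not fire.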
Quick Start
Installation (5 minutes):
# Clone & setup
git clone https://github.com/yourusername/ragbot.git
cd ragbot
python -m venv .venv
.venv\Scripts\activate # Windows (macOS/Linux: source .venv/bin/activate)
pip install -r requirements.txt
# Get free API key
# 1. Sign up: https://console.groq.com/keys
# 2. Copy API key to .env
# Run setup
python scripts/setup_embeddings.py
# Start chatting
python scripts/chat.py
See QUICKSTART.md for detailed setup instructions.
Documentation
| Document | Purpose |
|---|---|
| QUICKSTART.md | 5-minute setup guide |
| CONTRIBUTING.md | How to contribute |
| docs/ARCHITECTURE.md | System design & components |
| docs/API.md | REST API reference |
| docs/DEVELOPMENT.md | Development & extension guide |
| scripts/README.md | Utility scripts reference |
| examples/README.md | Web/mobile integration examples |
Usage
Interactive CLI
python scripts/chat.py
You: My glucose is 140 and HbA1c is 10
Primary Finding: Diabetes (100% confidence)
Critical Alerts: Hyperglycemia, elevated HbA1c
Recommendations: Seek medical attention, lifestyle changes
Actions: Physical activity, reduce carbs, weight loss
REST API
# Start the unified API server (--reload restarts on code changes; omit it in production)
uvicorn src.main:app --reload
# Analyze biomarkers (structured input)
curl -X POST http://localhost:8000/analyze/structured \
  -H "Content-Type: application/json" \
  -d '{
    "biomarkers": {"Glucose": 140, "HbA1c": 10.0}
  }'
# Ask medical questions (RAG-powered)
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What does high HbA1c mean?"
  }'
# Search knowledge base directly
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "diabetes management guidelines",
    "top_k": 5
  }'
See docs/API.md for full API reference.
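The same requests can be sent from Python. Below is a minimal sketch that uses only the standard library; the endpoint paths and payloads mirror the curl examples above, while `build_request`, `post_json`, and the 60-second timeout are hypothetical choices made for this example. The response schema is not shown in this README, so the sketch returns the decoded JSON as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints shown above."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("/analyze/structured", {"biomarkers": {"Glucose": 140, "HbA1c": 10.0}})` sends the first request above. The generous timeout is there because a full analysis can take well over ten seconds.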
Project Structure
RagBot/
├── src/                            # Core application
│   ├── __init__.py
│   ├── workflow.py                 # Multi-agent orchestration (LangGraph)
│   ├── state.py                    # Pydantic state models
│   ├── biomarker_validator.py      # Validation logic
│   ├── biomarker_normalization.py  # Name normalization (80+ aliases)
│   ├── llm_config.py               # LLM/embedding provider config
│   ├── pdf_processor.py            # Vector store management
│   ├── config.py                   # Global configuration
│   └── agents/                     # 6 specialist agents
│       ├── __init__.py
│       ├── biomarker_analyzer.py
│       ├── disease_explainer.py
│       ├── biomarker_linker.py
│       ├── clinical_guidelines.py
│       ├── confidence_assessor.py
│       └── response_synthesizer.py
│
├── api/                            # REST API (FastAPI)
│   ├── app/main.py                 # FastAPI server
│   ├── app/routes/                 # API endpoints
│   ├── app/models/schemas.py       # Pydantic request/response schemas
│   └── app/services/               # Business logic
│
├── scripts/                        # Utilities
│   ├── chat.py                     # Interactive CLI chatbot
│   └── setup_embeddings.py         # Vector store builder
│
├── config/                         # Configuration
│   └── biomarker_references.json   # 24 biomarker reference ranges
│
├── data/                           # Data storage
│   ├── medical_pdfs/               # Source documents
│   └── vector_stores/              # FAISS database
│
├── tests/                          # Test suite (30 tests)
├── examples/                       # Integration examples
├── docs/                           # Documentation
│
├── QUICKSTART.md                   # Setup guide
├── CONTRIBUTING.md                 # Contribution guidelines
├── requirements.txt                # Python dependencies
└── LICENSE
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Orchestration | LangGraph | Multi-agent workflow control |
| LLM | Groq (LLaMA 3.3-70B) | Fast, free inference |
| LLM (Alt) | Google Gemini 2.0 Flash | Free alternative |
| Embeddings | HuggingFace / Jina / Google | Vector representations |
| Vector DB | FAISS (local) / OpenSearch (production) | Similarity search |
| API | FastAPI | REST endpoints |
| Web UI | Gradio | Interactive analysis interface |
| Validation | Pydantic V2 | Type safety & schemas |
| Cache | Redis (optional) | Response caching |
| Observability | Langfuse (optional) | LLM tracing & monitoring |
How It Works
User Input ("My glucose is 140...")
        │
        ▼
┌────────────────────────────────────────┐
│ Biomarker Extraction & Normalization   │ ← LLM parses text, maps 80+ aliases
└────────────────────────────────────────┘
        │
        ▼
┌────────────────────────────────────────┐
│ Disease Scoring (Rule-Based)           │ ← Heuristic scoring, NOT ML
└────────────────────────────────────────┘
        │
        ▼
┌────────────────────────────────────────┐
│ RAG Knowledge Retrieval                │ ← FAISS/OpenSearch vector search
└────────────────────────────────────────┘
        │
        ▼
┌────────────────────────────────────────┐
│ 6-Agent LangGraph Pipeline             │
│ ├─ Biomarker Analyzer (validation)     │
│ ├─ Disease Explainer (pathophysiology) │
│ ├─ Biomarker Linker (key drivers)      │
│ ├─ Clinical Guidelines (treatment)     │
│ ├─ Confidence Assessor (reliability)   │
│ └─ Response Synthesizer (final)        │
└────────────────────────────────────────┘
        │
        ▼
┌────────────────────────────────────────┐
│ Structured Response + Safety Alerts    │
└────────────────────────────────────────┘
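The six-agent stage in the flow above can be pictured as functions that each read a shared state and add their own output to it. The sketch below is a plain-Python stand-in, not the project's LangGraph workflow: the agent names and roles come from the diagram, but `make_agent`, the state keys, and running the agents strictly in the listed order are all assumptions, and each agent body is a placeholder.

```python
from __future__ import annotations

from typing import Callable


def make_agent(name: str, output_key: str) -> Callable[[dict], dict]:
    """Create a placeholder agent that records that it ran."""

    def agent(state: dict) -> dict:
        updated = dict(state)  # copy so the caller's state is not mutated
        updated["trace"] = state.get("trace", []) + [name]
        updated[output_key] = f"<{name} output>"  # real agents would call the LLM here
        return updated

    return agent


# Names and roles follow the diagram above; the state keys are hypothetical.
PIPELINE = [
    make_agent("Biomarker Analyzer", "validation"),
    make_agent("Disease Explainer", "pathophysiology"),
    make_agent("Biomarker Linker", "key_drivers"),
    make_agent("Clinical Guidelines", "treatment"),
    make_agent("Confidence Assessor", "reliability"),
    make_agent("Response Synthesizer", "final"),
]


def run(state: dict) -> dict:
    """Pass the state through every agent in order and return the result."""
    for agent in PIPELINE:
        state = agent(state)
    return state
```

Returning a fresh dictionary from each agent keeps every step's input intact, which makes it easier to inspect what a single agent contributed.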
Supported Biomarkers (24)
- Glucose Control: Glucose, HbA1c, Insulin
- Lipids: Cholesterol, LDL Cholesterol, HDL Cholesterol, Triglycerides
- Body Metrics: BMI
- Blood Cells: Hemoglobin, Platelets, White Blood Cells, Red Blood Cells, Hematocrit
- RBC Indices: Mean Corpuscular Volume, Mean Corpuscular Hemoglobin, MCHC
- Cardiovascular: Heart Rate, Systolic Blood Pressure, Diastolic Blood Pressure, Troponin
- Inflammation: C-reactive Protein
- Liver: ALT, AST
- Kidney: Creatinine
See config/biomarker_references.json for full reference ranges.
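Checking a value against its reference range might look like the sketch below. The schema (`low`, `high`, `unit`) and the numbers in `REFERENCES` are placeholders chosen for illustration only; they are assumptions, not the contents of `config/biomarker_references.json`, and `classify` is a hypothetical helper.

```python
from __future__ import annotations

# Placeholder ranges with an assumed schema; the project's real values
# live in config/biomarker_references.json.
REFERENCES = {
    "Glucose": {"low": 70, "high": 100, "unit": "mg/dL"},
    "Hemoglobin": {"low": 12, "high": 17, "unit": "g/dL"},
}


def classify(name: str, value: float, references: dict = REFERENCES) -> str:
    """Label a value as low, normal, or high, or unknown if no range exists."""
    ref = references.get(name)
    if ref is None:
        return "unknown"
    if value < ref["low"]:
        return "low"
    if value > ref["high"]:
        return "high"
    return "normal"
```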
Disease Coverage
- Diabetes
- Anemia
- Heart Disease
- Thrombocytopenia
- Thalassemia
- (Extensible - add custom domains)
Privacy & Security
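- Processing runs locally after setup, except LLM calls, which go to the configured cloud provider (Groq or Gemini) unless the Ollama provider is used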
- No personal health data stored
- Embeddings computed locally or cached
- Vector store derived from public medical literature
- Can operate completely offline with Ollama provider
Performance
- Response Time: 15-25 seconds (6 agents + RAG retrieval)
- Knowledge Base: 750 pages, 2,609 document chunks
- Cost: Free (Groq/Gemini API + local/cloud embeddings)
- Hardware: CPU-only (no GPU needed)
Testing
# Run unit tests (30 tests) from an activated virtual environment
# (bash line continuations; in PowerShell, replace each trailing \ with a backtick)
python -m pytest tests/ -q \
  --ignore=tests/test_basic.py \
  --ignore=tests/test_diabetes_patient.py \
  --ignore=tests/test_evolution_loop.py \
  --ignore=tests/test_evolution_quick.py \
  --ignore=tests/test_evaluation_system.py
# Run specific test file
python -m pytest tests/test_codebase_fixes.py -v
# Run all tests (includes integration tests requiring LLM API keys)
python -m pytest tests/ -v
Contributing
Contributions welcome! See CONTRIBUTING.md for:
- Code style guidelines
- Pull request process
- Testing requirements
- Development setup
Development
Want to extend RagBot?
- Add custom biomarkers: docs/DEVELOPMENT.md
- Add medical domains: docs/DEVELOPMENT.md
- Create custom agents: docs/DEVELOPMENT.md
- Switch LLM providers: docs/DEVELOPMENT.md
License
MIT License - See LICENSE
Resources
Ready to get started? -> QUICKSTART.md
Want to understand the architecture? -> docs/ARCHITECTURE.md
Looking to integrate with your app? -> examples/README.md