---
title: Agentic RagBot
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: true
license: mit
app_port: 7860
tags:
  - medical
  - biomarker
  - rag
  - healthcare
  - langgraph
  - agents
short_description: Multi-Agent RAG System for Medical Biomarker Analysis
---
# MediGuard AI: Multi-Agent RAG System for Medical Biomarker Analysis

A biomarker analysis system combining 6 specialized AI agents with medical knowledge retrieval (RAG) to provide evidence-based insights on blood test results.

> **⚠️ Disclaimer:** This is an AI-assisted analysis tool, NOT a medical device. Always consult healthcare professionals for medical decisions.
## Key Features

- **6 Specialist Agents** - Biomarker validation, RAG-powered disease explanation, biomarker linking, clinical guidelines, confidence assessment, and response synthesis
- **Medical Knowledge Base** - Clinical guidelines stored in a vector database (FAISS or OpenSearch)
- **Multiple Interfaces** - Interactive CLI chat, REST API, Gradio web UI
- **Evidence-Based** - All recommendations are backed by retrieved medical literature with citations
- **Free Cloud LLMs** - Uses Groq (LLaMA 3.3-70B) or Google Gemini on their free tiers, so there are no API costs
- **Biomarker Normalization** - 80+ aliases mapped to 24 canonical biomarker names
- **Production Architecture** - Full error handling, safety alerts, confidence scoring
## Architecture Overview

```
┌────────────────────────────────────────────────────────────────┐
│                     MediGuard AI Pipeline                      │
├────────────────────────────────────────────────────────────────┤
│  Input → Guardrail → Router ─┬→ Biomarker Analysis Path        │
│                              │   (6 specialist agents)         │
│                              └→ General Medical Q&A Path       │
│                                  (RAG: retrieve → grade)       │
│                              → Response Synthesizer → Output   │
└────────────────────────────────────────────────────────────────┘
```
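
The Router's choice between the two paths can be illustrated with a minimal plain-Python sketch. The `route` function and its keyword list are hypothetical and not taken from the codebase, which orchestrates with LangGraph and may classify queries with an LLM instead. Here, a query takes the biomarker path only when it names a known biomarker and contains a standalone number.

```python
import re

# Hypothetical keyword list; the real router may use an LLM classifier instead.
BIOMARKER_KEYWORDS = ("glucose", "hba1c", "hemoglobin", "cholesterol",
                      "platelets", "troponin", "mcv")

# A standalone number such as "140" or "6.5"; the \b anchors keep the "1"
# inside "HbA1c" from counting as a value.
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def route(user_input: str) -> str:
    """Return 'biomarker' for lab-value input, otherwise 'qa'."""
    text = user_input.lower()
    mentions_marker = any(keyword in text for keyword in BIOMARKER_KEYWORDS)
    has_value = NUMBER.search(text) is not None
    return "biomarker" if mentions_marker and has_value else "qa"


print(route("My glucose is 140 and HbA1c is 10"))  # biomarker
print(route("What does high HbA1c mean?"))         # qa
```

A keyword rule like this is cheap and deterministic, but it misses aliases and phrasings it was not written for, which is one reason a production router might delegate the decision to an LLM.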
### Disease Scoring

The system uses **rule-based heuristics** (not ML models) to score disease likelihood:

- Diabetes: Glucose > 126 mg/dL, HbA1c ≥ 6.5%
- Anemia: Hemoglobin < 12 g/dL, MCV < 80 fL
- Heart Disease: Cholesterol > 240 mg/dL, Troponin > 0.04 ng/mL
- Thrombocytopenia: Platelets < 150,000/µL
- Thalassemia: MCV + Hemoglobin pattern

> **Note:** Future versions may include trained ML classifiers for improved accuracy.
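
As an illustration of how threshold rules like these can become a likelihood score, here is a hypothetical sketch. The name `score_diseases` and the scoring formula (the fraction of a disease's rules that fire, counting only biomarkers that were supplied) are assumptions, not the project's actual implementation. Thalassemia is left out because its pattern is not specified above, and values are assumed to be in conventional US units (mg/dL, %, g/dL, fL, ng/mL, per µL).

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[float], bool]]

# Thresholds from the list above, assuming conventional US units.
RULES: Dict[str, List[Rule]] = {
    "Diabetes": [("Glucose", lambda v: v > 126), ("HbA1c", lambda v: v >= 6.5)],
    "Anemia": [("Hemoglobin", lambda v: v < 12),
               ("Mean Corpuscular Volume", lambda v: v < 80)],
    "Heart Disease": [("Cholesterol", lambda v: v > 240),
                      ("Troponin", lambda v: v > 0.04)],
    "Thrombocytopenia": [("Platelets", lambda v: v < 150_000)],
}


def score_diseases(biomarkers: Dict[str, float]) -> Dict[str, float]:
    """Score each disease as the fraction of its applicable rules that fire."""
    scores: Dict[str, float] = {}
    for disease, rules in RULES.items():
        # Only rules whose biomarker was actually supplied are applicable.
        applicable = [(name, test) for name, test in rules if name in biomarkers]
        if not applicable:
            continue
        fired = sum(1 for name, test in applicable if test(biomarkers[name]))
        scores[disease] = fired / len(applicable)
    return scores


print(score_diseases({"Glucose": 140, "HbA1c": 10.0}))  # {'Diabetes': 1.0}
```

With the CLI example's inputs (glucose 140, HbA1c 10), both Diabetes rules fire, so this sketch scores Diabetes at 1.0 and skips diseases for which no biomarker was provided.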
## Quick Start

**Installation (5 minutes):**

```bash
# Clone & set up
git clone https://github.com/yourusername/ragbot.git
cd ragbot
python -m venv .venv
source .venv/bin/activate      # macOS/Linux
# .venv\Scripts\activate       # Windows (cmd/PowerShell)
pip install -r requirements.txt

# Get a free API key
# 1. Sign up: https://console.groq.com/keys
# 2. Copy the API key into .env

# Build the vector store
python scripts/setup_embeddings.py

# Start chatting
python scripts/chat.py
```

See **[QUICKSTART.md](QUICKSTART.md)** for detailed setup instructions.
## Documentation

| Document | Purpose |
|----------|---------|
| [**QUICKSTART.md**](QUICKSTART.md) | 5-minute setup guide |
| [**CONTRIBUTING.md**](CONTRIBUTING.md) | How to contribute |
| [**docs/ARCHITECTURE.md**](docs/ARCHITECTURE.md) | System design & components |
| [**docs/API.md**](docs/API.md) | REST API reference |
| [**docs/DEVELOPMENT.md**](docs/DEVELOPMENT.md) | Development & extension guide |
| [**scripts/README.md**](scripts/README.md) | Utility scripts reference |
| [**examples/README.md**](examples/) | Web/mobile integration examples |
## Usage

### Interactive CLI

```bash
python scripts/chat.py

You: My glucose is 140 and HbA1c is 10
Primary Finding: Diabetes (100% confidence)
Critical Alerts: Hyperglycemia, elevated HbA1c
Recommendations: Seek medical attention, lifestyle changes
Actions: Physical activity, reduce carbs, weight loss
```
### REST API

```bash
# Start the unified production server
uvicorn src.main:app --reload

# In a second terminal: analyze biomarkers (structured input)
curl -X POST http://localhost:8000/analyze/structured \
  -H "Content-Type: application/json" \
  -d '{
    "biomarkers": {"Glucose": 140, "HbA1c": 10.0}
  }'

# Ask medical questions (RAG-powered)
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What does high HbA1c mean?"
  }'

# Search the knowledge base directly
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "diabetes management guidelines",
    "top_k": 5
  }'
```

See **[docs/API.md](docs/API.md)** for the full API reference.
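
The same endpoints can be called from Python. This sketch uses only the standard library (`urllib.request`) and assumes the server is running at `http://localhost:8000` and that each endpoint returns a JSON body; the helper names `build_request` and `post` are hypothetical. The 60-second default timeout leaves headroom for the 15-25 second analysis time quoted under Performance.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # uvicorn's default host and port


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an endpoint such as '/ask'."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post(path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, with the server running, `post("/ask", {"question": "What does high HbA1c mean?"})` sends the same request as the second curl command. Keeping request construction separate from sending makes the payload easy to inspect without a live server.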
## Project Structure

```
RagBot/
├── src/                           # Core application
│   ├── __init__.py
│   ├── workflow.py                # Multi-agent orchestration (LangGraph)
│   ├── state.py                   # Pydantic state models
│   ├── biomarker_validator.py     # Validation logic
│   ├── biomarker_normalization.py # Name normalization (80+ aliases)
│   ├── llm_config.py              # LLM/embedding provider config
│   ├── pdf_processor.py           # Vector store management
│   ├── config.py                  # Global configuration
│   └── agents/                    # 6 specialist agents
│       ├── __init__.py
│       ├── biomarker_analyzer.py
│       ├── disease_explainer.py
│       ├── biomarker_linker.py
│       ├── clinical_guidelines.py
│       ├── confidence_assessor.py
│       └── response_synthesizer.py
│
├── api/                           # REST API (FastAPI)
│   ├── app/main.py                # FastAPI server
│   ├── app/routes/                # API endpoints
│   ├── app/models/schemas.py      # Pydantic request/response schemas
│   └── app/services/              # Business logic
│
├── scripts/                       # Utilities
│   ├── chat.py                    # Interactive CLI chatbot
│   └── setup_embeddings.py        # Vector store builder
│
├── config/                        # Configuration
│   └── biomarker_references.json  # 24 biomarker reference ranges
│
├── data/                          # Data storage
│   ├── medical_pdfs/              # Source documents
│   └── vector_stores/             # FAISS database
│
├── tests/                         # Test suite (30 tests)
├── examples/                      # Integration examples
├── docs/                          # Documentation
│
├── QUICKSTART.md                  # Setup guide
├── CONTRIBUTING.md                # Contribution guidelines
├── requirements.txt               # Python dependencies
└── LICENSE
```
## Technology Stack

| Component | Technology | Purpose |
|-----------|-----------|---------|
| Orchestration | **LangGraph** | Multi-agent workflow control |
| LLM | **Groq (LLaMA 3.3-70B)** | Fast, free inference |
| LLM (Alt) | **Google Gemini 2.0 Flash** | Free alternative |
| Embeddings | **HuggingFace / Jina / Google** | Vector representations |
| Vector DB | **FAISS** (local) / **OpenSearch** (production) | Similarity search |
| API | **FastAPI** | REST endpoints |
| Web UI | **Gradio** | Interactive analysis interface |
| Validation | **Pydantic V2** | Type safety & schemas |
| Cache | **Redis** (optional) | Response caching |
| Observability | **Langfuse** (optional) | LLM tracing & monitoring |
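
Response caching only pays off if identical requests map to the same key. Below is a minimal sketch of one way to do that, assuming JSON request bodies: serialize with sorted keys, hash with SHA-256, and prefix with the endpoint name. The `ragbot:` prefix and both helper names are hypothetical, and a plain `dict` stands in for Redis; with the redis-py client, the lookup and store would be its `get` and `set` methods instead.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(endpoint: str, payload: Dict[str, Any]) -> str:
    """Derive a stable key so identical requests hit the same cache entry."""
    # sort_keys and fixed separators make equal dicts serialize identically,
    # regardless of the order in which the client sent the fields.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"ragbot:{endpoint}:{digest}"


def get_or_compute(cache: Dict[str, Any], key: str, compute: Callable[[], Any]) -> Any:
    """Return the cached value, computing and storing it only on a miss."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]
```

Hashing the canonical JSON keeps keys short and fixed-length even for large payloads, at the cost of treating `140` and `140.0` as different inputs.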
## How It Works

```
User Input ("My glucose is 140...")
                    │
                    ▼
┌────────────────────────────────────────┐
│ Biomarker Extraction & Normalization   │  ← LLM parses text, maps 80+ aliases
└────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Disease Scoring (Rule-Based)           │  ← Heuristic scoring, NOT ML
└────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ RAG Knowledge Retrieval                │  ← FAISS/OpenSearch vector search
└────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ 6-Agent LangGraph Pipeline             │
│ ├─ Biomarker Analyzer (validation)     │
│ ├─ Disease Explainer (pathophysiology) │
│ ├─ Biomarker Linker (key drivers)      │
│ ├─ Clinical Guidelines (treatment)     │
│ ├─ Confidence Assessor (reliability)   │
│ └─ Response Synthesizer (final)        │
└────────────────────────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Structured Response + Safety Alerts    │
└────────────────────────────────────────┘
```
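
The pipeline above uses an LLM to pull biomarker values out of free text. For illustration only, here is a regex-based stand-in for that extraction step; it is a swapped-in technique, not what RagBot does. It returns `(name, value)` pairs exactly as written and leaves alias mapping to the normalization step. The `extract_biomarkers` name and the pattern are hypothetical, only single-word names are recognized, and thousands separators such as `150,000` are not handled.

```python
import re
from typing import List, Tuple

# A single-word name, an optional connector ("is", "was", "of", "=", ":"),
# then a number. The trailing \b stops "HbA" + "1" inside "HbA1c" from
# being read as a name followed by a value.
PATTERN = re.compile(
    r"\b([A-Za-z][A-Za-z0-9]*)\s*(?:is|was|of|=|:)?\s*(\d+(?:\.\d+)?)\b"
)


def extract_biomarkers(text: str) -> List[Tuple[str, float]]:
    """Return (name as written, value) pairs found in free text."""
    return [(name, float(value)) for name, value in PATTERN.findall(text)]


print(extract_biomarkers("My glucose is 140 and HbA1c is 10"))
# [('glucose', 140.0), ('HbA1c', 10.0)]
```

A pattern like this breaks on phrasings such as "my HbA1c came back at 10", which is the kind of variation an LLM-based parser is better placed to absorb.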
## Supported Biomarkers (24)

- **Glucose Control**: Glucose, HbA1c, Insulin
- **Lipids**: Cholesterol, LDL Cholesterol, HDL Cholesterol, Triglycerides
- **Body Metrics**: BMI
- **Blood Cells**: Hemoglobin, Platelets, White Blood Cells, Red Blood Cells, Hematocrit
- **RBC Indices**: Mean Corpuscular Volume, Mean Corpuscular Hemoglobin, MCHC
- **Cardiovascular**: Heart Rate, Systolic Blood Pressure, Diastolic Blood Pressure, Troponin
- **Inflammation**: C-reactive Protein
- **Liver**: ALT, AST
- **Kidney**: Creatinine

See [config/biomarker_references.json](config/biomarker_references.json) for full reference ranges.
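
Normalization maps the many ways a biomarker can be written onto the canonical names above. A minimal sketch, assuming a lookup table keyed on lowercased, whitespace-collapsed names: the `ALIASES` entries are a small hypothetical sample, not the project's actual 80+ alias mapping, and `normalize_name` / `normalize` are illustrative names. Unknown names are simply dropped here.

```python
from typing import Dict, Optional

# Hypothetical sample of alias -> canonical name entries.
ALIASES: Dict[str, str] = {
    "glucose": "Glucose",
    "blood sugar": "Glucose",
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "hemoglobin a1c": "HbA1c",
    "hemoglobin": "Hemoglobin",
    "haemoglobin": "Hemoglobin",
    "hgb": "Hemoglobin",
    "platelets": "Platelets",
    "plt": "Platelets",
    "wbc": "White Blood Cells",
    "mcv": "Mean Corpuscular Volume",
    "ldl": "LDL Cholesterol",
    "crp": "C-reactive Protein",
}


def normalize_name(raw: str) -> Optional[str]:
    """Map a raw name to its canonical form, or None if it is unknown."""
    # Lowercase, treat underscores as spaces, and collapse repeated whitespace.
    key = " ".join(raw.lower().replace("_", " ").split())
    return ALIASES.get(key)


def normalize(biomarkers: Dict[str, float]) -> Dict[str, float]:
    """Rename known biomarkers to canonical names; drop unknown ones."""
    result: Dict[str, float] = {}
    for raw, value in biomarkers.items():
        canonical = normalize_name(raw)
        if canonical is not None:
            result[canonical] = value
    return result


print(normalize({"HBA1C": 10.0, "blood_sugar": 140}))
# {'HbA1c': 10.0, 'Glucose': 140}
```

Canonicalizing the key before lookup means the table needs one entry per spelling rather than one per casing or spacing variant.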
## Disease Coverage

- Diabetes
- Anemia
- Heart Disease
- Thrombocytopenia
- Thalassemia
- (Extensible - add custom domains)
## Privacy & Security

- Biomarker scoring and FAISS vector search run **locally** after setup; when Groq or Gemini is the LLM provider, prompts are sent to that cloud API
- No personal health data is stored
- Embeddings are computed locally or cached
- The vector store is derived from public medical literature
- Can operate completely offline with the Ollama provider
## Performance

- **Response Time**: 15-25 seconds (6 agents + RAG retrieval)
- **Knowledge Base**: 750 pages, 2,609 document chunks
- **Cost**: Free (Groq/Gemini API + local/cloud embeddings)
- **Hardware**: CPU-only (no GPU needed)
## Testing

```bash
# Run unit tests (30 tests) from an activated virtual environment
python -m pytest tests/ -q \
  --ignore=tests/test_basic.py \
  --ignore=tests/test_diabetes_patient.py \
  --ignore=tests/test_evolution_loop.py \
  --ignore=tests/test_evolution_quick.py \
  --ignore=tests/test_evaluation_system.py

# Run a specific test file
python -m pytest tests/test_codebase_fixes.py -v

# Run all tests (includes integration tests requiring LLM API keys)
python -m pytest tests/ -v
```
## Contributing

Contributions welcome! See **[CONTRIBUTING.md](CONTRIBUTING.md)** for:

- Code style guidelines
- Pull request process
- Testing requirements
- Development setup
## Development

Want to extend RagBot?

- **Add custom biomarkers**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-biomarker)
- **Add medical domains**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-medical-domain)
- **Create custom agents**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#creating-a-custom-analysis-agent)
- **Switch LLM providers**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#switching-llm-providers)
## License

MIT License - See [LICENSE](LICENSE)
## Resources

- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Groq API Docs](https://console.groq.com)
- [FAISS GitHub](https://github.com/facebookresearch/faiss)
- [FastAPI Guide](https://fastapi.tiangolo.com/)

---

**Ready to get started?** -> [QUICKSTART.md](QUICKSTART.md)

**Want to understand the architecture?** -> [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)

**Looking to integrate with your app?** -> [examples/README.md](examples/)