Agentic-RagBot / README.md
Nikhil Pravin Pise
Deploy to HuggingFace Spaces - Medical RAG with vector store
9699bea
|
raw
history blame
9.3 kB
---
title: Agentic RagBot
emoji: πŸ₯
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: true
license: mit
app_port: 7860
tags:
- medical
- biomarker
- rag
- healthcare
- langgraph
- agents
short_description: Multi-Agent RAG System for Medical Biomarker Analysis
---
# RagBot: Multi-Agent RAG System for Medical Biomarker Analysis
A production-ready biomarker analysis system combining 6 specialized AI agents with medical knowledge retrieval to provide evidence-based insights on blood test results in **15-25 seconds**.
## Key Features
- **6 Specialist Agents** - Biomarker validation, disease prediction, RAG-powered analysis, confidence assessment
- **Medical Knowledge Base** - 750+ pages of clinical guidelines (FAISS vector store)
- **Multiple Interfaces** - Interactive CLI chat, REST API, ready for web/mobile integration
- **Evidence-Based** - All recommendations backed by retrieved medical literature
- **Free Cloud LLMs** - Uses Groq (LLaMA 3.3-70B) or Google Gemini - no cost
- **Biomarker Normalization** - 80+ aliases mapped to 24 canonical biomarker names
- **Production-Ready** - Full error handling, safety alerts, confidence scoring, 30 unit tests
## Quick Start
**Installation (5 minutes):**
```bash
# Clone & setup
git clone https://github.com/yourusername/ragbot.git
cd ragbot
python -m venv .venv
.venv\Scripts\activate # Windows
pip install -r requirements.txt
# Get free API key
# 1. Sign up: https://console.groq.com/keys
# 2. Copy API key to .env
# Run setup
python scripts/setup_embeddings.py
# Start chatting
python scripts/chat.py
```
See **[QUICKSTART.md](QUICKSTART.md)** for detailed setup instructions.
## Documentation
| Document | Purpose |
|----------|---------|
| [**QUICKSTART.md**](QUICKSTART.md) | 5-minute setup guide |
| [**CONTRIBUTING.md**](CONTRIBUTING.md) | How to contribute |
| [**docs/ARCHITECTURE.md**](docs/ARCHITECTURE.md) | System design & components |
| [**docs/API.md**](docs/API.md) | REST API reference |
| [**docs/DEVELOPMENT.md**](docs/DEVELOPMENT.md) | Development & extension guide |
| [**scripts/README.md**](scripts/README.md) | Utility scripts reference |
| [**examples/README.md**](examples/) | Web/mobile integration examples |
## Usage
### Interactive CLI
```bash
python scripts/chat.py
You: My glucose is 140 and HbA1c is 10
Primary Finding: Diabetes (100% confidence)
Critical Alerts: Hyperglycemia, elevated HbA1c
Recommendations: Seek medical attention, lifestyle changes
Actions: Physical activity, reduce carbs, weight loss
```
### REST API
```bash
# Start server
cd api
python -m uvicorn app.main:app
# Analyze biomarkers (structured input)
curl -X POST http://localhost:8000/api/v1/analyze/structured \
-H "Content-Type: application/json" \
-d '{
"biomarkers": {"Glucose": 140, "HbA1c": 10.0}
}'
# Analyze biomarkers (natural language)
curl -X POST http://localhost:8000/api/v1/analyze/natural \
-H "Content-Type: application/json" \
-d '{
"message": "My glucose is 140 and HbA1c is 10"
}'
```
See **[docs/API.md](docs/API.md)** for full API reference.
## Project Structure
```
RagBot/
β”œβ”€β”€ src/ # Core application
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ workflow.py # Multi-agent orchestration (LangGraph)
β”‚ β”œβ”€β”€ state.py # Pydantic state models
β”‚ β”œβ”€β”€ biomarker_validator.py # Validation logic
β”‚ β”œβ”€β”€ biomarker_normalization.py # Name normalization (80+ aliases)
β”‚ β”œβ”€β”€ llm_config.py # LLM/embedding provider config
β”‚ β”œβ”€β”€ pdf_processor.py # Vector store management
β”‚ β”œβ”€β”€ config.py # Global configuration
β”‚ └── agents/ # 6 specialist agents
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ biomarker_analyzer.py
β”‚ β”œβ”€β”€ disease_explainer.py
β”‚ β”œβ”€β”€ biomarker_linker.py
β”‚ β”œβ”€β”€ clinical_guidelines.py
β”‚ β”œβ”€β”€ confidence_assessor.py
β”‚ └── response_synthesizer.py
β”‚
β”œβ”€β”€ api/ # REST API (FastAPI)
β”‚ β”œβ”€β”€ app/main.py # FastAPI server
β”‚ β”œβ”€β”€ app/routes/ # API endpoints
β”‚ β”œβ”€β”€ app/models/schemas.py # Pydantic request/response schemas
β”‚ └── app/services/ # Business logic
β”‚
β”œβ”€β”€ scripts/ # Utilities
β”‚ β”œβ”€β”€ chat.py # Interactive CLI chatbot
β”‚ └── setup_embeddings.py # Vector store builder
β”‚
β”œβ”€β”€ config/ # Configuration
β”‚ └── biomarker_references.json # 24 biomarker reference ranges
β”‚
β”œβ”€β”€ data/ # Data storage
β”‚ β”œβ”€β”€ medical_pdfs/ # Source documents
β”‚ └── vector_stores/ # FAISS database
β”‚
β”œβ”€β”€ tests/ # Test suite (30 tests)
β”œβ”€β”€ examples/ # Integration examples
β”œβ”€β”€ docs/ # Documentation
β”‚
β”œβ”€β”€ QUICKSTART.md # Setup guide
β”œβ”€β”€ CONTRIBUTING.md # Contribution guidelines
β”œβ”€β”€ requirements.txt # Python dependencies
└── LICENSE
```
## Technology Stack
| Component | Technology | Purpose |
|-----------|-----------|---------|
| Orchestration | **LangGraph** | Multi-agent workflow control |
| LLM | **Groq (LLaMA 3.3-70B)** | Fast, free inference |
| LLM (Alt) | **Google Gemini 2.0 Flash** | Free alternative |
| Embeddings | **Google Gemini / HuggingFace** | Vector representations |
| Vector DB | **FAISS** | Efficient similarity search |
| API | **FastAPI** | REST endpoints |
| Validation | **Pydantic V2** | Type safety & schemas |
## How It Works
```
User Input ("My glucose is 140...")
|
[Biomarker Extraction] -> Parse & normalize (80+ aliases)
|
[Disease Prediction] -> Rule-based + LLM hypothesis
|
[RAG Retrieval] -> Get medical docs from FAISS vector store
|
[6 Agent Pipeline via LangGraph]
|-- Biomarker Analyzer (validation + safety alerts)
|-- Disease Explainer (RAG pathophysiology)
|-- Biomarker-Disease Linker (RAG key drivers)
|-- Clinical Guidelines (RAG recommendations)
|-- Confidence Assessor (reliability scoring)
+-- Response Synthesizer (final structured report)
|
[Output] -> Comprehensive report with safety alerts
```
## Supported Biomarkers (24)
- **Glucose Control**: Glucose, HbA1c, Insulin
- **Lipids**: Cholesterol, LDL Cholesterol, HDL Cholesterol, Triglycerides
- **Body Metrics**: BMI
- **Blood Cells**: Hemoglobin, Platelets, White Blood Cells, Red Blood Cells, Hematocrit
- **RBC Indices**: Mean Corpuscular Volume, Mean Corpuscular Hemoglobin, MCHC
- **Cardiovascular**: Heart Rate, Systolic Blood Pressure, Diastolic Blood Pressure, Troponin
- **Inflammation**: C-reactive Protein
- **Liver**: ALT, AST
- **Kidney**: Creatinine
See [config/biomarker_references.json](config/biomarker_references.json) for full reference ranges.
## Disease Coverage
- Diabetes
- Anemia
- Heart Disease
- Thrombocytopenia
- Thalassemia
- (Extensible - add custom domains)
## Privacy & Security
- All processing runs **locally** after setup
- No personal health data stored
- Embeddings computed locally or cached
- Vector store derived from public medical literature
- Can operate completely offline with Ollama provider
## Performance
- **Response Time**: 15-25 seconds (6 agents + RAG retrieval)
- **Knowledge Base**: 750 pages, 2,609 document chunks
- **Cost**: Free (Groq/Gemini API + local/cloud embeddings)
- **Hardware**: CPU-only (no GPU needed)
## Testing
```bash
# Run unit tests (30 tests)
.venv\Scripts\python.exe -m pytest tests/ -q \
--ignore=tests/test_basic.py \
--ignore=tests/test_diabetes_patient.py \
--ignore=tests/test_evolution_loop.py \
--ignore=tests/test_evolution_quick.py \
--ignore=tests/test_evaluation_system.py
# Run specific test file
.venv\Scripts\python.exe -m pytest tests/test_codebase_fixes.py -v
# Run all tests (includes integration tests requiring LLM API keys)
.venv\Scripts\python.exe -m pytest tests/ -v
```
## Contributing
Contributions welcome! See **[CONTRIBUTING.md](CONTRIBUTING.md)** for:
- Code style guidelines
- Pull request process
- Testing requirements
- Development setup
## Development
Want to extend RagBot?
- **Add custom biomarkers**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-biomarker)
- **Add medical domains**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-medical-domain)
- **Create custom agents**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#creating-a-custom-analysis-agent)
- **Switch LLM providers**: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#switching-llm-providers)
## License
MIT License - See [LICENSE](LICENSE)
## Resources
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Groq API Docs](https://console.groq.com)
- [FAISS GitHub](https://github.com/facebookresearch/faiss)
- [FastAPI Guide](https://fastapi.tiangolo.com/)
---
**Ready to get started?** -> [QUICKSTART.md](QUICKSTART.md)
**Want to understand the architecture?** -> [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)
**Looking to integrate with your app?** -> [examples/README.md](examples/)