Nikhil Pravin Pise

Production Upgrade v2.0: SSE streaming, HIPAA compliance, Gradio Q&A UI

3ca1d38 17 days ago


	---
	title: Agentic RagBot
	emoji: 🏥
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: true
	license: mit
	app_port: 7860
	tags:
	- medical
	- biomarker
	- rag
	- healthcare
	- langgraph
	- agents
	short_description: Multi-Agent RAG System for Medical Biomarker Analysis
	---

	# MediGuard AI: Multi-Agent RAG System for Medical Biomarker Analysis

	A biomarker analysis system combining 6 specialized AI agents with medical knowledge retrieval (RAG) to provide evidence-based insights on blood test results.

	> ⚠️ Disclaimer: This is an AI-assisted analysis tool, NOT a medical device. Always consult healthcare professionals for medical decisions.

	## Key Features

	- 6 Specialist Agents - Biomarker validation, disease scoring, RAG-powered explanation, confidence assessment
	- Medical Knowledge Base - Clinical guidelines stored in vector database (FAISS or OpenSearch)
	- Multiple Interfaces - Interactive CLI chat, REST API, Gradio web UI
	- Evidence-Based - All recommendations backed by retrieved medical literature with citations
	- Free Cloud LLMs - Uses Groq (LLaMA 3.3-70B) or Google Gemini - no API costs
	- Biomarker Normalization - 80+ aliases mapped to 24 canonical biomarker names
	- Production Architecture - Full error handling, safety alerts, confidence scoring
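The biomarker normalization feature can be pictured as a lookup from free-form aliases to canonical names. The following is a minimal sketch, not the project's code: the `ALIASES` table and the `normalize_name` and `normalize_panel` helpers are hypothetical, and only a few illustrative aliases are shown.

```python
from typing import Dict, Optional

# Hypothetical alias table for illustration; the project's real mapping
# (80+ aliases to 24 canonical names) is not reproduced here.
ALIASES: Dict[str, str] = {
    "glucose": "Glucose",
    "blood sugar": "Glucose",
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "hgb": "Hemoglobin",
    "hemoglobin": "Hemoglobin",
    "mcv": "Mean Corpuscular Volume",
}


def normalize_name(raw: str) -> Optional[str]:
    """Return the canonical biomarker name, or None for an unknown alias."""
    # Lowercase, treat underscores as spaces, and collapse repeated whitespace.
    key = " ".join(raw.lower().replace("_", " ").split())
    return ALIASES.get(key)


def normalize_panel(panel: Dict[str, float]) -> Dict[str, float]:
    """Rename recognized keys to canonical names and drop unrecognized ones."""
    normalized: Dict[str, float] = {}
    for raw_name, value in panel.items():
        name = normalize_name(raw_name)
        if name is not None:
            normalized[name] = value
    return normalized
```

For example, `normalize_panel({"Blood_Sugar": 140, "a1c": 10.0})` yields keys `Glucose` and `HbA1c` under these assumed aliases.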

	## Architecture Overview

	```
	┌────────────────────────────────────────────────────────────────┐
	│                     MediGuard AI Pipeline                      │
	├────────────────────────────────────────────────────────────────┤
	│  Input → Guardrail → Router → ┬→ Biomarker Analysis Path       │
	│                               │    (6 specialist agents)       │
	│                               └→ General Medical Q&A Path      │
	│                                    (RAG: retrieve → grade)     │
	│                               → Response Synthesizer → Output  │
	└────────────────────────────────────────────────────────────────┘
	```
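The guardrail and router stages in the diagram can be sketched roughly as follows. This README does not say how either stage decides, so both checks here are assumptions made for illustration: the guardrail only rejects empty or oversized input, and the router looks for a "name is/=/: number" pattern to pick the biomarker path. The names `guardrail`, `route`, and `MAX_INPUT_CHARS` are hypothetical.

```python
import re

MAX_INPUT_CHARS = 2000  # assumed limit, for illustration only

# Assumed heuristic: a word followed by "is", "=" or ":" and a number
# suggests the user supplied lab values.
_VALUE_PATTERN = re.compile(
    r"\b[A-Za-z][A-Za-z0-9]*(?:\s+is\s+|\s*[=:]\s*)\d+(?:\.\d+)?"
)


def guardrail(text: str) -> bool:
    """Accept non-empty input up to MAX_INPUT_CHARS characters."""
    stripped = text.strip()
    return 0 < len(stripped) <= MAX_INPUT_CHARS


def route(text: str) -> str:
    """Return the name of the path the input would take."""
    if not guardrail(text):
        return "rejected"
    if _VALUE_PATTERN.search(text):
        return "biomarker_analysis"
    return "general_qa"
```

Under these assumptions, a message containing lab values goes to the biomarker path and a plain question goes to the general Q&A path.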

	### Disease Scoring

	The system uses rule-based heuristics (not ML models) to score disease likelihood:
	- Diabetes: Glucose > 126, HbA1c ≥ 6.5
	- Anemia: Hemoglobin < 12, MCV < 80
	- Heart Disease: Cholesterol > 240, Troponin > 0.04
	- Thrombocytopenia: Platelets < 150,000
	- Thalassemia: MCV + Hemoglobin pattern

	> Note: Future versions may include trained ML classifiers for improved accuracy.
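The threshold rules listed above can be written as a small table-driven scorer. This is a sketch under stated assumptions, not the project's scoring code: the thresholds come from the list, but combining them as "fraction of rules triggered" is an assumption, and the Thalassemia pattern is omitted because the README does not specify it. `RULES` and `score_diseases` are hypothetical names.

```python
import operator
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[float, float], bool], float]

# Thresholds taken from the list above; keys use the canonical biomarker names.
RULES: Dict[str, List[Rule]] = {
    "Diabetes": [("Glucose", operator.gt, 126), ("HbA1c", operator.ge, 6.5)],
    "Anemia": [
        ("Hemoglobin", operator.lt, 12),
        ("Mean Corpuscular Volume", operator.lt, 80),
    ],
    "Heart Disease": [("Cholesterol", operator.gt, 240), ("Troponin", operator.gt, 0.04)],
    "Thrombocytopenia": [("Platelets", operator.lt, 150_000)],
}


def score_diseases(biomarkers: Dict[str, float]) -> Dict[str, float]:
    """Score each disease as the fraction of its rules that fire (assumed scheme)."""
    scores: Dict[str, float] = {}
    for disease, rules in RULES.items():
        hits = sum(
            1
            for name, compare, limit in rules
            if name in biomarkers and compare(biomarkers[name], limit)
        )
        scores[disease] = hits / len(rules)
    return scores
```

Missing biomarkers simply do not trigger their rule, so a partial panel lowers the score instead of raising an error.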

	## Quick Start

	Installation (5 minutes):

	```bash
	# Clone & setup
	git clone https://github.com/yourusername/ragbot.git
	cd ragbot
	python -m venv .venv
	.venv\Scripts\activate           # Windows
	# source .venv/bin/activate      # macOS/Linux
	pip install -r requirements.txt

	# Get free API key
	# 1. Sign up: https://console.groq.com/keys
	# 2. Copy API key to .env

	# Run setup
	python scripts/setup_embeddings.py

	# Start chatting
	python scripts/chat.py
	```

	See [QUICKSTART.md](QUICKSTART.md) for detailed setup instructions.

	## Documentation

	| Document | Purpose |
	|----------|---------|
	| [QUICKSTART.md](QUICKSTART.md) | 5-minute setup guide |
	| [CONTRIBUTING.md](CONTRIBUTING.md) | How to contribute |
	| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | System design & components |
	| [docs/API.md](docs/API.md) | REST API reference |
	| [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) | Development & extension guide |
	| [scripts/README.md](scripts/README.md) | Utility scripts reference |
	| [examples/README.md](examples/) | Web/mobile integration examples |

	## Usage

	### Interactive CLI

	```bash
	python scripts/chat.py

	You: My glucose is 140 and HbA1c is 10

	Primary Finding: Diabetes (100% confidence)
	Critical Alerts: Hyperglycemia, elevated HbA1c
	Recommendations: Seek medical attention, lifestyle changes
	Actions: Physical activity, reduce carbs, weight loss
	```

	### REST API

	```bash
	# Start the unified server (--reload enables auto-reload and is meant for development)
	uvicorn src.main:app --reload

	# Analyze biomarkers (structured input)
	curl -X POST http://localhost:8000/analyze/structured \
	-H "Content-Type: application/json" \
	-d '{
	"biomarkers": {"Glucose": 140, "HbA1c": 10.0}
	}'

	# Ask medical questions (RAG-powered)
	curl -X POST http://localhost:8000/ask \
	-H "Content-Type: application/json" \
	-d '{
	"question": "What does high HbA1c mean?"
	}'

	# Search knowledge base directly
	curl -X POST http://localhost:8000/search \
	-H "Content-Type: application/json" \
	-d '{
	"query": "diabetes management guidelines",
	"top_k": 5
	}'
	```

	See [docs/API.md](docs/API.md) for full API reference.
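The same calls can be made from Python with only the standard library. This is a minimal sketch: the endpoint paths and payloads mirror the curl examples, while the helper names `build_request` and `post_json` are hypothetical. It assumes the server answers with a JSON body; the response schema is not described in this README, so the result is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request for one of the endpoints shown above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: Dict[str, Any], timeout: float = 60.0) -> Any:
    """Send the request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("/analyze/structured", {"biomarkers": {"Glucose": 140, "HbA1c": 10.0}})` sends the first curl request. The 60-second default timeout is a guess that leaves headroom over the 15-25 second response time listed under Performance.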

	## Project Structure

	```
	RagBot/
	├── src/  # Core application
	│   ├── __init__.py
	│   ├── workflow.py  # Multi-agent orchestration (LangGraph)
	│   ├── state.py  # Pydantic state models
	│   ├── biomarker_validator.py  # Validation logic
	│   ├── biomarker_normalization.py  # Name normalization (80+ aliases)
	│   ├── llm_config.py  # LLM/embedding provider config
	│   ├── pdf_processor.py  # Vector store management
	│   ├── config.py  # Global configuration
	│   └── agents/  # 6 specialist agents
	│       ├── __init__.py
	│       ├── biomarker_analyzer.py
	│       ├── disease_explainer.py
	│       ├── biomarker_linker.py
	│       ├── clinical_guidelines.py
	│       ├── confidence_assessor.py
	│       └── response_synthesizer.py
	│
	├── api/  # REST API (FastAPI)
	│   ├── app/main.py  # FastAPI server
	│   ├── app/routes/  # API endpoints
	│   ├── app/models/schemas.py  # Pydantic request/response schemas
	│   └── app/services/  # Business logic
	│
	├── scripts/  # Utilities
	│   ├── chat.py  # Interactive CLI chatbot
	│   └── setup_embeddings.py  # Vector store builder
	│
	├── config/  # Configuration
	│   └── biomarker_references.json  # 24 biomarker reference ranges
	│
	├── data/  # Data storage
	│   ├── medical_pdfs/  # Source documents
	│   └── vector_stores/  # FAISS database
	│
	├── tests/  # Test suite (30 tests)
	├── examples/  # Integration examples
	├── docs/  # Documentation
	│
	├── QUICKSTART.md  # Setup guide
	├── CONTRIBUTING.md  # Contribution guidelines
	├── requirements.txt  # Python dependencies
	└── LICENSE
	```

	## Technology Stack

	| Component | Technology | Purpose |
	|-----------|-----------|---------|
	| Orchestration | LangGraph | Multi-agent workflow control |
	| LLM | Groq (LLaMA 3.3-70B) | Fast, free inference |
	| LLM (Alt) | Google Gemini 2.0 Flash | Free alternative |
	| Embeddings | HuggingFace / Jina / Google | Vector representations |
	| Vector DB | FAISS (local) / OpenSearch (production) | Similarity search |
	| API | FastAPI | REST endpoints |
	| Web UI | Gradio | Interactive analysis interface |
	| Validation | Pydantic V2 | Type safety & schemas |
	| Cache | Redis (optional) | Response caching |
	| Observability | Langfuse (optional) | LLM tracing & monitoring |
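To make the "similarity search" role of the vector database concrete, here is a pure-Python stand-in that ranks chunks by cosine similarity. It is not FAISS or OpenSearch and does not use their APIs; it only illustrates the operation those systems perform at scale. The `cosine` and `top_k` helpers and the tiny two-dimensional vectors in the usage note are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine(query, vector), text) for text, vector in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Calling `top_k([1, 0], [("a", [1, 0]), ("b", [0, 1]), ("c", [1, 1])], k=2)` ranks `"a"` first and `"c"` second. A real deployment would replace this linear scan with an index from the vector store.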

	## How It Works

	```
	User Input ("My glucose is 140...")
	│
	▼
	┌──────────────────────────────────────┐
	│ Biomarker Extraction & Normalization │ ← LLM parses text, maps 80+ aliases
	└──────────────────────────────────────┘
	│
	▼
	┌──────────────────────────────────────┐
	│ Disease Scoring (Rule-Based) │ ← Heuristic scoring, NOT ML
	└──────────────────────────────────────┘
	│
	▼
	┌──────────────────────────────────────┐
	│ RAG Knowledge Retrieval │ ← FAISS/OpenSearch vector search
	└──────────────────────────────────────┘
	│
	▼
	┌──────────────────────────────────────┐
	│ 6-Agent LangGraph Pipeline │
	│ ├─ Biomarker Analyzer (validation) │
	│ ├─ Disease Explainer (pathophysiology)│
	│ ├─ Biomarker Linker (key drivers) │
	│ ├─ Clinical Guidelines (treatment) │
	│ ├─ Confidence Assessor (reliability) │
	│ └─ Response Synthesizer (final) │
	└──────────────────────────────────────┘
	│
	▼
	┌──────────────────────────────────────┐
	│ Structured Response + Safety Alerts │
	└──────────────────────────────────────┘
	```
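The six-agent stage can be pictured as functions that each read and extend a shared state object. The sketch below uses plain dataclasses and stub agents; it is not the LangGraph workflow in `src/workflow.py`, and it assumes the agents run sequentially in the order shown in the diagram, which this README does not confirm. `PipelineState`, `make_stub`, and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineState:
    """Shared state handed from agent to agent (illustrative fields only)."""
    biomarkers: Dict[str, float]
    findings: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Agent = Callable[[PipelineState], PipelineState]


def make_stub(name: str, role: str) -> Agent:
    """Create a placeholder agent that records its role instead of calling an LLM."""
    def agent(state: PipelineState) -> PipelineState:
        state.findings[name] = role
        state.trace.append(name)
        return state
    return agent


# Agent names follow the files under src/agents/; roles follow the diagram.
AGENTS: List[Agent] = [
    make_stub("biomarker_analyzer", "validation"),
    make_stub("disease_explainer", "pathophysiology"),
    make_stub("biomarker_linker", "key drivers"),
    make_stub("clinical_guidelines", "treatment"),
    make_stub("confidence_assessor", "reliability"),
    make_stub("response_synthesizer", "final"),
]


def run_pipeline(state: PipelineState, agents: List[Agent] = AGENTS) -> PipelineState:
    """Run each agent in order, threading the state through (assumed sequential)."""
    for agent in agents:
        state = agent(state)
    return state
```

Keeping every agent behind the same `PipelineState -> PipelineState` signature is what lets an orchestrator reorder, skip, or parallelize stages without changing the agents themselves.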

	## Supported Biomarkers (24)

	- Glucose Control: Glucose, HbA1c, Insulin
	- Lipids: Cholesterol, LDL Cholesterol, HDL Cholesterol, Triglycerides
	- Body Metrics: BMI
	- Blood Cells: Hemoglobin, Platelets, White Blood Cells, Red Blood Cells, Hematocrit
	- RBC Indices: Mean Corpuscular Volume, Mean Corpuscular Hemoglobin, MCHC
	- Cardiovascular: Heart Rate, Systolic Blood Pressure, Diastolic Blood Pressure, Troponin
	- Inflammation: C-reactive Protein
	- Liver: ALT, AST
	- Kidney: Creatinine

	See [config/biomarker_references.json](config/biomarker_references.json) for full reference ranges.
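A reference-range check against such a file might look like the sketch below. The JSON layout (`min`, `max`, `unit` per biomarker) is an assumption, since the actual schema of `config/biomarker_references.json` is not shown here, and the sample values are illustrative placeholders rather than the project's ranges. `SAMPLE_JSON`, `REFERENCES`, and `classify` are hypothetical names.

```python
import json
from typing import Any, Dict

# Assumed schema with placeholder values; not copied from the project's file.
SAMPLE_JSON = """
{
  "Glucose": {"min": 70, "max": 100, "unit": "mg/dL"},
  "Platelets": {"min": 150000, "max": 400000, "unit": "cells/uL"}
}
"""

REFERENCES: Dict[str, Dict[str, Any]] = json.loads(SAMPLE_JSON)


def classify(name: str, value: float, references: Dict[str, Dict[str, Any]]) -> str:
    """Label a value as low, normal, or high, or unknown if no range is defined."""
    ref = references.get(name)
    if ref is None:
        return "unknown"
    if value < ref["min"]:
        return "low"
    if value > ref["max"]:
        return "high"
    return "normal"
```

To try this against the real file, load it with `json.load` in place of `SAMPLE_JSON` and adjust the key names to whatever schema it actually uses.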

	## Disease Coverage

	- Diabetes
	- Anemia
	- Heart Disease
	- Thrombocytopenia
	- Thalassemia
	- (Extensible - add custom domains)

	## Privacy & Security

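	- Retrieval and rule-based scoring run locally after setup; LLM calls go to the configured cloud provider (Groq or Gemini) unless the Ollama provider is used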
	- No personal health data stored
	- Embeddings computed locally or cached
	- Vector store derived from public medical literature
	- Can operate completely offline with Ollama provider

	## Performance

	- Response Time: 15-25 seconds (6 agents + RAG retrieval)
	- Knowledge Base: 750 pages, 2,609 document chunks
	- Cost: Free (Groq/Gemini API + local/cloud embeddings)
	- Hardware: CPU-only (no GPU needed)

	## Testing

	```bash
	# Run unit tests (30 tests); assumes the virtual environment is activated
	python -m pytest tests/ -q \
	  --ignore=tests/test_basic.py \
	  --ignore=tests/test_diabetes_patient.py \
	  --ignore=tests/test_evolution_loop.py \
	  --ignore=tests/test_evolution_quick.py \
	  --ignore=tests/test_evaluation_system.py

	# Run specific test file
	python -m pytest tests/test_codebase_fixes.py -v

	# Run all tests (includes integration tests requiring LLM API keys)
	python -m pytest tests/ -v
	```

	## Contributing

	Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for:
	- Code style guidelines
	- Pull request process
	- Testing requirements
	- Development setup

	## Development

	Want to extend RagBot?

	- Add custom biomarkers: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-biomarker)
	- Add medical domains: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-medical-domain)
	- Create custom agents: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#creating-a-custom-analysis-agent)
	- Switch LLM providers: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#switching-llm-providers)

	## License

	MIT License - See [LICENSE](LICENSE)

	## Resources

	- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
	- [Groq API Docs](https://console.groq.com)
	- [FAISS GitHub](https://github.com/facebookresearch/faiss)
	- [FastAPI Guide](https://fastapi.tiangolo.com/)

	---

	Ready to get started? -> [QUICKSTART.md](QUICKSTART.md)

	Want to understand the architecture? -> [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)

	Looking to integrate with your app? -> [examples/README.md](examples/)