Spaces:

T0X1N
/

Agentic-RagBot

Sleeping

App Files Files Community

Agentic-RagBot / README.md

Nikhil Pravin Pise

Deploy to HuggingFace Spaces - Medical RAG with vector store

9699bea about 1 month ago

preview code

raw

history blame

9.3 kB

	---
	title: Agentic RagBot
	emoji: 🏥
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: true
	license: mit
	app_port: 7860
	tags:
	- medical
	- biomarker
	- rag
	- healthcare
	- langgraph
	- agents
	short_description: Multi-Agent RAG System for Medical Biomarker Analysis
	---

	# RagBot: Multi-Agent RAG System for Medical Biomarker Analysis

	A production-ready biomarker analysis system combining 6 specialized AI agents with medical knowledge retrieval to provide evidence-based insights on blood test results in 15-25 seconds.

	## Key Features

	- 6 Specialist Agents - Biomarker validation, disease prediction, RAG-powered analysis, confidence assessment
	- Medical Knowledge Base - 750+ pages of clinical guidelines (FAISS vector store)
	- Multiple Interfaces - Interactive CLI chat, REST API, ready for web/mobile integration
	- Evidence-Based - All recommendations backed by retrieved medical literature
	- Free Cloud LLMs - Uses Groq (LLaMA 3.3-70B) or Google Gemini - no cost
	- Biomarker Normalization - 80+ aliases mapped to 24 canonical biomarker names
	- Production-Ready - Full error handling, safety alerts, confidence scoring, 30 unit tests

	## Quick Start

	Installation (5 minutes):

	```bash
	# Clone & setup
	git clone https://github.com/yourusername/ragbot.git
	cd ragbot
	python -m venv .venv
	.venv\Scripts\activate # Windows
	pip install -r requirements.txt

	# Get free API key
	# 1. Sign up: https://console.groq.com/keys
	# 2. Copy API key to .env

	# Run setup
	python scripts/setup_embeddings.py

	# Start chatting
	python scripts/chat.py
	```

	See [QUICKSTART.md](QUICKSTART.md) for detailed setup instructions.

	## Documentation

	\| Document \| Purpose \|
	\|----------\|---------\|
	\| [QUICKSTART.md](QUICKSTART.md) \| 5-minute setup guide \|
	\| [CONTRIBUTING.md](CONTRIBUTING.md) \| How to contribute \|
	\| [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) \| System design & components \|
	\| [docs/API.md](docs/API.md) \| REST API reference \|
	\| [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md) \| Development & extension guide \|
	\| [scripts/README.md](scripts/README.md) \| Utility scripts reference \|
	\| [examples/README.md](examples/) \| Web/mobile integration examples \|

	## Usage

	### Interactive CLI

	```bash
	python scripts/chat.py

	You: My glucose is 140 and HbA1c is 10

	Primary Finding: Diabetes (100% confidence)
	Critical Alerts: Hyperglycemia, elevated HbA1c
	Recommendations: Seek medical attention, lifestyle changes
	Actions: Physical activity, reduce carbs, weight loss
	```

	### REST API

	```bash
	# Start server
	cd api
	python -m uvicorn app.main:app

	# Analyze biomarkers (structured input)
	curl -X POST http://localhost:8000/api/v1/analyze/structured \
	-H "Content-Type: application/json" \
	-d '{
	"biomarkers": {"Glucose": 140, "HbA1c": 10.0}
	}'

	# Analyze biomarkers (natural language)
	curl -X POST http://localhost:8000/api/v1/analyze/natural \
	-H "Content-Type: application/json" \
	-d '{
	"message": "My glucose is 140 and HbA1c is 10"
	}'
	```

	See [docs/API.md](docs/API.md) for full API reference.

	## Project Structure

	```
	RagBot/
	├── src/ # Core application
	│ ├── __init__.py
	│ ├── workflow.py # Multi-agent orchestration (LangGraph)
	│ ├── state.py # Pydantic state models
	│ ├── biomarker_validator.py # Validation logic
	│ ├── biomarker_normalization.py # Name normalization (80+ aliases)
	│ ├── llm_config.py # LLM/embedding provider config
	│ ├── pdf_processor.py # Vector store management
	│ ├── config.py # Global configuration
	│ └── agents/ # 6 specialist agents
	│ ├── __init__.py
	│ ├── biomarker_analyzer.py
	│ ├── disease_explainer.py
	│ ├── biomarker_linker.py
	│ ├── clinical_guidelines.py
	│ ├── confidence_assessor.py
	│ └── response_synthesizer.py
	│
	├── api/ # REST API (FastAPI)
	│ ├── app/main.py # FastAPI server
	│ ├── app/routes/ # API endpoints
	│ ├── app/models/schemas.py # Pydantic request/response schemas
	│ └── app/services/ # Business logic
	│
	├── scripts/ # Utilities
	│ ├── chat.py # Interactive CLI chatbot
	│ └── setup_embeddings.py # Vector store builder
	│
	├── config/ # Configuration
	│ └── biomarker_references.json # 24 biomarker reference ranges
	│
	├── data/ # Data storage
	│ ├── medical_pdfs/ # Source documents
	│ └── vector_stores/ # FAISS database
	│
	├── tests/ # Test suite (30 tests)
	├── examples/ # Integration examples
	├── docs/ # Documentation
	│
	├── QUICKSTART.md # Setup guide
	├── CONTRIBUTING.md # Contribution guidelines
	├── requirements.txt # Python dependencies
	└── LICENSE
	```

	## Technology Stack

	\| Component \| Technology \| Purpose \|
	\|-----------\|-----------\|---------\|
	\| Orchestration \| LangGraph \| Multi-agent workflow control \|
	\| LLM \| Groq (LLaMA 3.3-70B) \| Fast, free inference \|
	\| LLM (Alt) \| Google Gemini 2.0 Flash \| Free alternative \|
	\| Embeddings \| Google Gemini / HuggingFace \| Vector representations \|
	\| Vector DB \| FAISS \| Efficient similarity search \|
	\| API \| FastAPI \| REST endpoints \|
	\| Validation \| Pydantic V2 \| Type safety & schemas \|

	## How It Works

	```
	User Input ("My glucose is 140...")
	\|
	[Biomarker Extraction] -> Parse & normalize (80+ aliases)
	\|
	[Disease Prediction] -> Rule-based + LLM hypothesis
	\|
	[RAG Retrieval] -> Get medical docs from FAISS vector store
	\|
	[6 Agent Pipeline via LangGraph]
	\|-- Biomarker Analyzer (validation + safety alerts)
	\|-- Disease Explainer (RAG pathophysiology)
	\|-- Biomarker-Disease Linker (RAG key drivers)
	\|-- Clinical Guidelines (RAG recommendations)
	\|-- Confidence Assessor (reliability scoring)
	+-- Response Synthesizer (final structured report)
	\|
	[Output] -> Comprehensive report with safety alerts
	```

	## Supported Biomarkers (24)

	- Glucose Control: Glucose, HbA1c, Insulin
	- Lipids: Cholesterol, LDL Cholesterol, HDL Cholesterol, Triglycerides
	- Body Metrics: BMI
	- Blood Cells: Hemoglobin, Platelets, White Blood Cells, Red Blood Cells, Hematocrit
	- RBC Indices: Mean Corpuscular Volume, Mean Corpuscular Hemoglobin, MCHC
	- Cardiovascular: Heart Rate, Systolic Blood Pressure, Diastolic Blood Pressure, Troponin
	- Inflammation: C-reactive Protein
	- Liver: ALT, AST
	- Kidney: Creatinine

	See [config/biomarker_references.json](config/biomarker_references.json) for full reference ranges.

	## Disease Coverage

	- Diabetes
	- Anemia
	- Heart Disease
	- Thrombocytopenia
	- Thalassemia
	- (Extensible - add custom domains)

	## Privacy & Security

	- All processing runs locally after setup
	- No personal health data stored
	- Embeddings computed locally or cached
	- Vector store derived from public medical literature
	- Can operate completely offline with Ollama provider

	## Performance

	- Response Time: 15-25 seconds (6 agents + RAG retrieval)
	- Knowledge Base: 750 pages, 2,609 document chunks
	- Cost: Free (Groq/Gemini API + local/cloud embeddings)
	- Hardware: CPU-only (no GPU needed)

	## Testing

	```bash
	# Run unit tests (30 tests)
	.venv\Scripts\python.exe -m pytest tests/ -q \
	--ignore=tests/test_basic.py \
	--ignore=tests/test_diabetes_patient.py \
	--ignore=tests/test_evolution_loop.py \
	--ignore=tests/test_evolution_quick.py \
	--ignore=tests/test_evaluation_system.py

	# Run specific test file
	.venv\Scripts\python.exe -m pytest tests/test_codebase_fixes.py -v

	# Run all tests (includes integration tests requiring LLM API keys)
	.venv\Scripts\python.exe -m pytest tests/ -v
	```

	## Contributing

	Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for:
	- Code style guidelines
	- Pull request process
	- Testing requirements
	- Development setup

	## Development

	Want to extend RagBot?

	- Add custom biomarkers: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-biomarker)
	- Add medical domains: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#adding-a-new-medical-domain)
	- Create custom agents: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#creating-a-custom-analysis-agent)
	- Switch LLM providers: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md#switching-llm-providers)

	## License

	MIT License - See [LICENSE](LICENSE)

	## Resources

	- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
	- [Groq API Docs](https://console.groq.com)
	- [FAISS GitHub](https://github.com/facebookresearch/faiss)
	- [FastAPI Guide](https://fastapi.tiangolo.com/)

	---

	Ready to get started? -> [QUICKSTART.md](QUICKSTART.md)

	Want to understand the architecture? -> [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)

	Looking to integrate with your app? -> [examples/README.md](examples/)