Nikhil Pravin Pise

RagBot: Multi-Agent RAG System for Medical Biomarker Analysis

A production-ready biomarker analysis system combining 6 specialized AI agents with medical knowledge retrieval to provide evidence-based insights on blood test results in 15-25 seconds.

✨ Key Features

  • 6 Specialist Agents - Biomarker validation, disease prediction, RAG-powered analysis, confidence assessment
  • Medical Knowledge Base - 750+ pages of clinical guidelines (FAISS vector store, local embeddings)
  • Multiple Interfaces - Interactive CLI chat, REST API, ready for web/mobile integration
  • Evidence-Based - All recommendations backed by retrieved medical literature
  • Free & Local-First - Uses the free Groq API for LLM inference plus local embeddings (no embedding API costs)
  • Production-Ready - Full error handling, safety alerts, confidence scoring

🚀 Quick Start

Installation (5 minutes):

# Clone & setup
git clone https://github.com/yourusername/ragbot.git
cd ragbot
python -m venv .venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # macOS/Linux
pip install -r requirements.txt

# Get free API key
# 1. Sign up: https://console.groq.com/keys
# 2. Copy .env.template to .env and paste your API key into it

# Run setup
python scripts/setup_embeddings.py

# Start chatting
python scripts/chat.py

See QUICKSTART.md for detailed setup instructions.
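
Before running the setup script, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of the repository: the variable name `GROQ_API_KEY` is an assumption (check `.env.template` for the name the project expects), and `read_env_file` and `find_api_key` are hypothetical helpers.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(path=".env", name="GROQ_API_KEY"):
    """Return the key from the environment or the .env file, else None."""
    # NOTE: GROQ_API_KEY is an assumed name; check .env.template.
    return os.environ.get(name) or read_env_file(path).get(name)
```

If `find_api_key()` returns `None`, the key is neither exported in the shell nor present in `.env`.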

📚 Documentation

| Document | Purpose |
|----------|---------|
| QUICKSTART.md | 5-minute setup guide |
| CONTRIBUTING.md | How to contribute |
| docs/ARCHITECTURE.md | System design & components |
| docs/API.md | REST API reference |
| docs/DEVELOPMENT.md | Development & extension guide |
| scripts/README.md | Utility scripts reference |
| examples/README.md | Web/mobile integration examples |

💻 Usage

Interactive CLI

python scripts/chat.py

You: My glucose is 140 and HbA1c is 10

🔴 Primary Finding: Diabetes (85% confidence)
⚠️ Critical Alerts: Hyperglycemia, elevated HbA1c
✅ Recommendations: Seek medical attention, lifestyle changes
🌱 Actions: Physical activity, reduce carbs, weight loss
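
To illustrate how a sentence like the one above could be turned into structured input, here is a minimal sketch of regex-based extraction. It is not the project's parser: the `ALIASES` table, the pattern, and `extract_biomarkers` are all hypothetical, and real input would need units, synonyms, and more phrasings.

```python
import re

# Hypothetical alias table: lower-cased spellings -> canonical names.
ALIASES = {
    "glucose": "Glucose",
    "hba1c": "HbA1c",
    "ldl": "LDL",
    "hdl": "HDL",
}

# A whole word, an optional "is" / "of" / ":" / "=", then a number.
PATTERN = re.compile(
    r"\b([A-Za-z][A-Za-z0-9]*)\b\s*(?:is|of|:|=)?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_biomarkers(text):
    """Return {canonical_name: value} for every known name found in text."""
    result = {}
    for name, value in PATTERN.findall(text):
        canonical = ALIASES.get(name.lower())
        # Words that are not known biomarker names are ignored.
        if canonical is not None:
            result[canonical] = float(value)
    return result
```

Under these assumptions, `extract_biomarkers("My glucose is 140 and HbA1c is 10")` yields a dictionary with `Glucose` and `HbA1c` entries, which matches the shape of the `biomarkers` object used by the REST API.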

REST API

# Start server
python -m uvicorn api.app.main:app

# POST /api/v1/analyze
curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "biomarkers": {"Glucose": 140, "HbA1c": 10.0}
  }'

See docs/API.md for full API reference.
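
The same request can be issued from Python. This is a minimal sketch using only the standard library; the URL and payload mirror the curl example, while `build_request` and `analyze` are hypothetical helper names and the response is returned as parsed JSON without assuming any particular schema (see docs/API.md for that).

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port as needed.
API_URL = "http://localhost:8000/api/v1/analyze"


def build_request(biomarkers, url=API_URL):
    """Build the POST request without sending it."""
    body = json.dumps({"biomarkers": biomarkers}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(biomarkers, url=API_URL, timeout=60):
    """Send the request and return the decoded JSON response."""
    # The timeout leaves headroom over the 15-25 second response time.
    request = build_request(biomarkers, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `analyze({"Glucose": 140, "HbA1c": 10.0})` sends the same body as the curl command.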

🏗️ Project Structure

RagBot/
├── src/                           # Core application
│   ├── workflow.py               # Multi-agent orchestration (LangGraph)
│   ├── biomarker_validator.py    # Validation logic
│   ├── pdf_processor.py          # Vector store management
│   └── agents/                   # 6 specialist agents
│
├── api/                          # REST API (optional)
│   ├── app/main.py              # FastAPI server
│   └── app/routes/              # API endpoints
│
├── scripts/                      # Utilities
│   ├── chat.py                  # Interactive CLI
│   └── setup_embeddings.py      # Vector store builder
│
├── config/                       # Configuration
│   └── biomarker_references.json # Reference ranges
│
├── data/                         # Data storage
│   ├── medical_pdfs/            # Source documents
│   └── vector_stores/           # FAISS database
│
├── tests/                        # Test suite
├── examples/                     # Integration examples
├── docs/                         # Documentation
│   ├── ARCHITECTURE.md          # System design
│   ├── API.md                   # API reference
│   ├── DEVELOPMENT.md           # Development guide
│   ├── archive/                 # Old docs
│   └── plans/                   # Planning docs
│
├── QUICKSTART.md               # Setup guide
├── CONTRIBUTING.md             # Contribution guidelines
├── requirements.txt            # Python dependencies
├── .env.template              # Configuration template
└── LICENSE

🔧 Technology Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| Orchestration | LangGraph | Multi-agent workflow control |
| LLM | Groq (LLaMA 3.3-70B) | Fast, free inference |
| Embeddings | HuggingFace (sentence-transformers) | Local, offline embeddings |
| Vector DB | FAISS | Efficient similarity search |
| API | FastAPI | REST endpoints |
| Data | Pydantic V2 | Type validation |

🔍 How It Works

User Input ("My glucose is 140...")
    ↓
[Biomarker Extraction] → Parse & normalize
    ↓
[Prediction Agent] → Disease hypothesis
    ↓
[RAG Retrieval] → Get medical docs from vector store
    ↓
[6 Parallel Agents] → Analyze from different angles
    ├─ Biomarker Analyzer (validation)
    ├─ Disease Explainer (RAG)
    ├─ Biomarker-Disease Linker (RAG)
    ├─ Clinical Guidelines (RAG)
    ├─ Confidence Assessor (scoring)
    └─ Response Synthesizer (summary)
    ↓
[Output] → Comprehensive report with safety alerts
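
The flow above, sequential steps followed by a fan-out to several agents, can be sketched in plain Python. This is an illustration of the control flow only: it uses a thread pool as a stand-in for LangGraph, and every function here (`run_pipeline` and the three stub agents) is hypothetical rather than taken from `src/workflow.py`.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for specialist agents: each reads the shared
# state and returns one (section_name, content) pair for the report.
def biomarker_analyzer(state):
    return "analysis", f"{len(state['biomarkers'])} biomarkers checked"


def disease_explainer(state):
    return "explanation", f"background on {state['hypothesis']}"


def confidence_assessor(state):
    return "confidence", f"{len(state['documents'])} documents retrieved"


def run_pipeline(biomarkers, predict, retrieve, agents):
    """Run prediction and retrieval in order, then fan out to agents."""
    state = {"biomarkers": biomarkers}
    state["hypothesis"] = predict(biomarkers)            # prediction step
    state["documents"] = retrieve(state["hypothesis"])   # retrieval step
    workers = max(1, len(agents))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Agents only read the state, so they can run concurrently.
        sections = dict(pool.map(lambda agent: agent(state), agents))
    return {**state, "report": sections}
```

Passing `predict` and `retrieve` as callables keeps the sketch independent of any LLM or vector store, so the orchestration can be exercised with simple lambdas.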

📊 Supported Biomarkers

24+ biomarkers including:

  • Glucose Control: Glucose, HbA1c, Fasting Glucose
  • Lipids: Total Cholesterol, LDL, HDL, Triglycerides
  • Cardiac: Troponin, BNP, CK-MB
  • Blood Cells: WBC, RBC, Hemoglobin, Hematocrit, Platelets
  • Liver: ALT, AST, Albumin, Bilirubin
  • Kidney: Creatinine, BUN, eGFR
  • And more...

See config/biomarker_references.json for complete list.
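
As an illustration of what checking a value against a reference range involves, here is a minimal sketch. The ranges and the dictionary layout are placeholders invented for the example; they are not copied from `config/biomarker_references.json`, whose actual schema and values may differ, and `classify` and `flag_abnormal` are hypothetical names.

```python
# Illustrative placeholder ranges only; NOT the project's reference data.
EXAMPLE_RANGES = {
    "Glucose": {"low": 70.0, "high": 100.0, "unit": "mg/dL"},
    "HbA1c": {"low": 4.0, "high": 5.6, "unit": "%"},
}


def classify(name, value, ranges=EXAMPLE_RANGES):
    """Return 'low', 'normal', 'high', or 'unknown' for one biomarker."""
    ref = ranges.get(name)
    if ref is None:
        return "unknown"
    if value < ref["low"]:
        return "low"
    if value > ref["high"]:
        return "high"
    return "normal"


def flag_abnormal(biomarkers, ranges=EXAMPLE_RANGES):
    """Map each out-of-range biomarker to its status."""
    statuses = {
        name: classify(name, value, ranges)
        for name, value in biomarkers.items()
    }
    return {
        name: status
        for name, status in statuses.items()
        if status in ("low", "high")
    }
```

Returning `unknown` for unlisted names, rather than raising, lets a caller report unsupported biomarkers separately from abnormal ones.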

🎯 Disease Coverage

  • Diabetes
  • Anemia
  • Heart Disease
  • Thrombocytopenia
  • Thalassemia
  • (Extensible - add custom domains)

🔒 Privacy & Security

  • Retrieval, embeddings, and biomarker validation run locally after setup
  • No personal health data is sent to external APIs, except for LLM inference via Groq
  • Embeddings are computed locally or loaded from cache
  • Architecture is ready for HIPAA-compliant deployment
  • Vector store derived from public medical literature
  • Retrieval works offline after initial setup; LLM inference requires network access to the Groq API

📈 Performance

  • Response Time: 15-25 seconds (6 agents + RAG retrieval)
  • Knowledge Base: 750+ pages → 2,609 document chunks
  • Embedding Dimensions: 384
  • Cost: Free (Groq API + local embeddings)
  • Hardware: CPU-only (no GPU needed)
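
At this scale (a few thousand chunks, 384 dimensions each), similarity search is cheap enough to run on a CPU. The sketch below shows the idea with a brute-force scan in pure Python. It is a stand-in for FAISS, not its API, and cosine similarity is an assumption: the project's index may use a different metric such as L2 distance.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Return the k best (score, text) pairs from (text, vector) chunks."""
    scored = [(cosine(query, vector), text) for text, vector in chunks]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A real deployment would query the prebuilt FAISS index instead of scanning every vector, but the inputs and outputs have the same shape: one query embedding in, the closest chunks out.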

🚀 Deployment Options

  1. CLI - Interactive chatbot (development/testing)
  2. REST API - FastAPI server (production)
  3. Docker - Containerized deployment
  4. Embedded - Direct Python library import
  5. Web - JavaScript/React integration
  6. Mobile - React Native / Flutter

See examples/README.md for integration patterns.

🧪 Testing

# Run all tests
pytest tests/ -v

# Test specific module
pytest tests/test_diabetes_patient.py -v

# Coverage report
pytest --cov=src tests/

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for:

  • Code style guidelines
  • Pull request process
  • Testing requirements
  • Development setup

📖 Development

Want to extend RagBot? See docs/DEVELOPMENT.md for the development and extension guide.

📋 License

MIT License - See LICENSE

🙋 Support

  • Issues: GitHub Issues for bugs and feature requests
  • Discussion: GitHub Discussions for questions
  • Docs: Full documentation in /docs folder

🔗 Resources


Ready to get started? → QUICKSTART.md

Want to understand the architecture? → docs/ARCHITECTURE.md

Looking to integrate with your app? → examples/README.md