Spaces:

T0X1N
/

Agentic-RagBot

Sleeping

App Files Files Community

Agentic-RagBot / docs /archive /QUICK_START.md

Nikhil Pravin Pise

refactor: major repository cleanup and bug fixes

6dc9d46 about 1 month ago

preview code

raw

history blame contribute delete

9.9 kB

	# MediGuard AI RAG-Helper - Quick Start Guide

	## System Status
	✓ Core System Complete - All 6 specialist agents implemented
	⚠ State Integration Needed - Minor refactoring required for end-to-end workflow

	---

	## What Works Right Now

	### ✓ Tested & Functional
	1. PDF Knowledge Base: 2,861 chunks from 750 pages of medical PDFs
	2. 4 Specialized Retrievers: disease_explainer, biomarker_linker, clinical_guidelines, general
	3. Biomarker Validator: 24 biomarkers with gender-specific reference ranges
	4. All 6 Specialist Agents: Complete implementation (1,500+ lines)
	5. Fast Embeddings: HuggingFace sentence-transformers (10-20x faster than Ollama)

	---

	## Quick Test

	### Run Core Component Test
	```powershell
	cd c:\Users\admin\OneDrive\Documents\GitHub\RagBot
	python tests\test_basic.py
	```

	Expected Output:
	```
	✓ ALL IMPORTS SUCCESSFUL
	✓ Retrieved 4 retrievers
	✓ PatientInput created
	✓ Validator working
	✓ BASIC SYSTEM TEST PASSED!
	```

	---

	## Component Breakdown

	### 1. Biomarker Validation
	```python
	from src.biomarker_validator import BiomarkerValidator

	validator = BiomarkerValidator()
	flags, alerts = validator.validate_all(
	biomarkers={"Glucose": 185, "HbA1c": 8.2},
	gender="male"
	)
	print(f"Flags: {len(flags)}, Alerts: {len(alerts)}")
	```

	### 2. RAG Retrieval
	```python
	from src.pdf_processor import get_all_retrievers

	retrievers = get_all_retrievers()
	docs = retrievers['disease_explainer'].get_relevant_documents("Type 2 Diabetes pathophysiology")
	print(f"Retrieved {len(docs)} documents")
	```

	### 3. Patient Input
	```python
	from src.state import PatientInput

	patient = PatientInput(
	biomarkers={"Glucose": 185, "HbA1c": 8.2, "Hemoglobin": 15.2},
	model_prediction={
	"disease": "Type 2 Diabetes",
	"confidence": 0.87,
	"probabilities": {"Type 2 Diabetes": 0.87, "Heart Disease": 0.08}
	},
	patient_context={"age": 52, "gender": "male", "bmi": 31.2}
	)
	```

	### 4. Individual Agent Testing
	```python
	from src.agents.biomarker_analyzer import biomarker_analyzer_agent
	from src.config import BASELINE_SOP

	# Note: Requires state integration for full testing
	# Currently agents expect patient_input object
	```

	---

	## File Locations

	### Core Components
	\| File \| Purpose \| Status \|
	\|------\|---------\|--------\|
	\| `src/biomarker_validator.py` \| 24 biomarker validation \| ✓ Complete \|
	\| `src/pdf_processor.py` \| FAISS vector stores \| ✓ Complete \|
	\| `src/llm_config.py` \| Ollama model config \| ✓ Complete \|
	\| `src/state.py` \| Data structures \| ✓ Complete \|
	\| `src/config.py` \| ExplanationSOP \| ✓ Complete \|

	### Specialist Agents (src/agents/)
	\| Agent \| Purpose \| Lines \| Status \|
	\|-------\|---------\|-------\|--------\|
	\| `biomarker_analyzer.py` \| Validate values, safety alerts \| 241 \| ✓ Complete \|
	\| `disease_explainer.py` \| RAG disease pathophysiology \| 226 \| ✓ Complete \|
	\| `biomarker_linker.py` \| Link values to prediction \| 234 \| ✓ Complete \|
	\| `clinical_guidelines.py` \| RAG recommendations \| 258 \| ✓ Complete \|
	\| `confidence_assessor.py` \| Evaluate reliability \| 291 \| ✓ Complete \|
	\| `response_synthesizer.py` \| Compile final output \| 300 \| ✓ Complete \|

	### Workflow
	\| File \| Purpose \| Status \|
	\|------\|---------\|--------\|
	\| `src/workflow.py` \| LangGraph orchestration \| ⚠ Needs state integration \|

	### Data
	\| Directory \| Contents \| Status \|
	\|-----------\|----------\|--------\|
	\| `data/medical_pdfs/` \| 8 medical guideline PDFs \| ✓ Complete \|
	\| `data/vector_stores/` \| FAISS indices (2,861 chunks) \| ✓ Complete \|

	---

	## Architecture

	```
	┌─────────────────────────────────────────┐
	│ Patient Input │
	│ (biomarkers + ML prediction) │
	└──────────────┬──────────────────────────┘
	│
	↓
	┌─────────────────────────────────────────┐
	│ Agent 1: Biomarker Analyzer │
	│ • Validates 24 biomarkers │
	│ • Generates safety alerts │
	│ • Identifies disease-relevant values │
	└──────────────┬──────────────────────────┘
	│
	┌────────┼────────┐
	↓ ↓ ↓
	┌──────────┬──────────┬──────────┐
	│ Agent 2 │ Agent 3 │ Agent 4 │
	│ Disease │Biomarker │ Clinical │
	│Explainer │ Linker │Guidelines│
	│ (RAG) │ (RAG) │ (RAG) │
	└──────────┴──────────┴──────────┘
	│ │ │
	└────────┼────────┘
	↓
	┌─────────────────────────────────────────┐
	│ Agent 5: Confidence Assessor │
	│ • Evaluates evidence strength │
	│ • Identifies limitations │
	│ • Calculates reliability score │
	└──────────────┬──────────────────────────┘
	│
	↓
	┌─────────────────────────────────────────┐
	│ Agent 6: Response Synthesizer │
	│ • Compiles all findings │
	│ • Generates patient-friendly narrative │
	│ • Structures final JSON output │
	└──────────────┬──────────────────────────┘
	│
	↓
	┌─────────────────────────────────────────┐
	│ Structured JSON Response │
	│ • Patient summary │
	│ • Prediction explanation │
	│ • Clinical recommendations │
	│ • Confidence assessment │
	│ • Safety alerts │
	└─────────────────────────────────────────┘
	```

	---

	## Next Steps for Full Integration

	### 1. State Refactoring (1-2 hours)
	Update all 6 agents to use GuildState structure:

	Current (in agents):
	```python
	patient_input = state['patient_input']
	biomarkers = patient_input.biomarkers
	disease = patient_input.model_prediction['disease']
	```

	Target (needs update):
	```python
	biomarkers = state['patient_biomarkers']
	disease = state['model_prediction']['disease']
	patient_context = state.get('patient_context', {})
	```

	Files to update:
	- `src/agents/biomarker_analyzer.py` (~5 lines)
	- `src/agents/disease_explainer.py` (~3 lines)
	- `src/agents/biomarker_linker.py` (~4 lines)
	- `src/agents/clinical_guidelines.py` (~3 lines)
	- `src/agents/confidence_assessor.py` (~4 lines)
	- `src/agents/response_synthesizer.py` (~8 lines)

	### 2. Workflow Testing (30 min)
	```powershell
	python tests\test_diabetes_patient.py
	```

	### 3. Multi-Disease Testing (30 min)
	Create test cases for:
	- Anemia patient
	- Heart disease patient
	- Thrombocytopenia patient
	- Thalassemia patient

	---

	## Models Required

	### Ollama LLMs (Local)
	```powershell
	ollama pull llama3.1:8b
	ollama pull qwen2:7b
	ollama pull nomic-embed-text
	```

	### HuggingFace Embeddings (Automatic Download)
	- `sentence-transformers/all-MiniLM-L6-v2`
	- Downloads automatically on first run
	- ~90 MB model size

	---

	## Performance

	### Current Benchmarks
	- Vector Store Creation: ~3 minutes (2,861 chunks)
	- Retrieval: <1 second (k=5 chunks)
	- Biomarker Validation: ~1-2 seconds
	- Individual Agent: ~3-10 seconds
	- Estimated Full Workflow: ~20-30 seconds

	### Optimization Achieved
	- Before: Ollama embeddings (30+ minutes)
	- After: HuggingFace embeddings (~3 minutes)
	- Speedup: 10-20x improvement

	---

	## Troubleshooting

	### Issue: "Cannot import get_all_retrievers"
	Solution: Vector store not created yet
	```powershell
	python src\pdf_processor.py
	```

	### Issue: "Ollama model not found"
	Solution: Pull missing models
	```powershell
	ollama pull llama3.1:8b
	ollama pull qwen2:7b
	```

	### Issue: "No PDF files found"
	Solution: Add medical PDFs to `data/medical_pdfs/`

	---

	## Key Features Implemented

	✓ 24 biomarker validation with gender-specific ranges
	✓ Safety alert system for critical values
	✓ RAG-based disease explanation (2,861 chunks)
	✓ Evidence-based recommendations with citations
	✓ Confidence assessment with reliability scoring
	✓ Patient-friendly narrative generation
	✓ Fast local embeddings (10-20x speedup)
	✓ Multi-agent parallel execution architecture
	✓ Evolvable SOPs for hyperparameter tuning
	✓ Type-safe state management with Pydantic

	---

	## Resources

	### Documentation
	- Implementation Summary: `IMPLEMENTATION_SUMMARY.md`
	- Project Context: `project_context.md`
	- README: `README.md`

	### Code References
	- Clinical Trials Architect: `code.ipynb`
	- Test Cases: `tests/test_basic.py`, `tests/test_diabetes_patient.py`

	### External Links
	- LangChain: https://python.langchain.com/
	- LangGraph: https://python.langchain.com/docs/langgraph
	- Ollama: https://ollama.ai/
	- FAISS: https://github.com/facebookresearch/faiss

	---

	Current Status: 95% Complete ✓
	Next Step: State integration refactoring
	Estimated Time to Completion: 2-3 hours