# RagBot Development Guide ## For Developers & Maintainers This guide covers extending, customizing, and contributing to RagBot. ## Project Structure ``` RagBot/ ├── src/ # Core application code │ ├── __init__.py # Package marker │ ├── workflow.py # Multi-agent workflow orchestration │ ├── state.py # Pydantic data models & state │ ├── biomarker_validator.py # Biomarker validation logic │ ├── biomarker_normalization.py # Alias-to-canonical name mapping (80+ aliases) │ ├── llm_config.py # LLM & embedding configuration │ ├── pdf_processor.py # PDF loading & vector store │ ├── config.py # Global configuration │ │ │ ├── agents/ # Specialist agents │ │ ├── __init__.py # Package marker │ │ ├── biomarker_analyzer.py # Validates biomarkers │ │ ├── disease_explainer.py # Explains disease (RAG) │ │ ├── biomarker_linker.py # Links biomarkers to disease (RAG) │ │ ├── clinical_guidelines.py # Provides guidelines (RAG) │ │ ├── confidence_assessor.py # Assesses prediction confidence │ │ └── response_synthesizer.py # Synthesizes findings │ │ │ ├── evaluation/ # Evaluation framework │ │ ├── __init__.py │ │ └── evaluators.py # Quality evaluators │ │ │ └── evolution/ # Experimental components │ ├── __init__.py │ ├── director.py # Evolution orchestration │ └── pareto.py # Pareto optimization │ ├── api/ # REST API application │ ├── app/ │ │ ├── main.py # FastAPI application │ │ ├── routes/ # API endpoints │ │ │ ├── analyze.py # Main analysis endpoint │ │ │ ├── biomarkers.py # Biomarker endpoints │ │ │ └── health.py # Health check │ │ ├── models/ # Pydantic schemas │ │ └── services/ # Business logic │ ├── requirements.txt │ ├── Dockerfile │ └── docker-compose.yml │ ├── scripts/ # Utility & demo scripts │ ├── chat.py # Interactive CLI │ ├── setup_embeddings.py # Vector store builder │ ├── run_api.ps1 # API startup script │ └── ... │ ├── config/ # Configuration files │ └── biomarker_references.json # Biomarker reference ranges │ ├── data/ # Data storage │ ├── medical_pdfs/ # Source medical documents │ └── vector_stores/ # FAISS vector databases │ ├── tests/ # Test suite │ └── test_*.py │ ├── docs/ # Documentation │ ├── ARCHITECTURE.md # System design │ ├── API.md # API reference │ ├── DEVELOPMENT.md # This file │ └── ... │ ├── examples/ # Example integrations │ ├── test_website.html # Web integration example │ └── website_integration.js # JavaScript client │ ├── requirements.txt # Python dependencies ├── README.md # Main documentation ├── QUICKSTART.md # Setup guide ├── CONTRIBUTING.md # Contribution guidelines └── LICENSE ``` ## Development Setup ### 1. Clone & Install ```bash git clone https://github.com/yourusername/ragbot.git cd ragbot python -m venv .venv .venv\Scripts\activate # Windows pip install -r requirements.txt ``` ### 2. Configure ```bash cp .env.template .env # Edit .env with your API keys (Groq, Google, etc.) ``` ### 3. Rebuild Vector Store ```bash python scripts/setup_embeddings.py ``` ### 4. Run Tests ```bash pytest tests/ ``` ## Key Development Tasks ### Adding a New Biomarker **Step 1:** Update reference ranges in `config/biomarker_references.json`: ```json { "biomarkers": { "New Biomarker": { "min": 0, "max": 100, "unit": "mg/dL", "normal_range": "0-100", "critical_low": -1, "critical_high": 150, "related_conditions": ["Disease1", "Disease2"] } } } ``` **Step 2:** Add aliases in `src/biomarker_normalization.py`: ```python NORMALIZATION_MAP = { # ... existing entries ... "your alias": "New Biomarker", "other name": "New Biomarker", } ``` All consumers (CLI, API, workflow) use this shared map automatically. **Step 3:** Add validation test in `tests/test_basic.py`: ```python def test_new_biomarker(): validator = BiomarkerValidator() result = validator.validate("New Biomarker", 50) assert result.is_valid ``` **Step 4:** Medical knowledge automatically updates through RAG ### Adding a New Medical Domain **Step 1:** Collect relevant PDFs: ``` data/medical_pdfs/ your_domain.pdf your_guideline.pdf ``` **Step 2:** Rebuild vector store: ```bash python scripts/setup_embeddings.py ``` The system automatically: - Loads all PDFs from `data/medical_pdfs/` - Creates 2,609+ chunks with similarity search - Makes knowledge available to all RAG agents **Step 3:** Test with new biomarkers from that domain: ```bash python scripts/chat.py # Input: biomarkers related to your domain ``` ### Creating a Custom Analysis Agent **Example: Add a "Medication Interactions" Agent** **Step 1:** Create `src/agents/medication_checker.py`: ```python from src.llm_config import LLMConfig from src.state import PatientInput class MedicationChecker: def __init__(self): config = LLMConfig() self.llm = config.analyzer # Uses centralized LLM config def check_interactions(self, state: PatientInput) -> dict: """Check medication interactions based on biomarkers.""" # Get relevant medical knowledge # Use LLM to identify drug-drug interactions # Return structured response return { "interactions": [], "warnings": [], "recommendations": [] } ``` **Step 2:** Register in workflow (`src/workflow.py`): ```python from src.agents.medication_checker import MedicationChecker medication_agent = MedicationChecker() def check_medications(state): return medication_agent.check_interactions(state) # Add to graph graph.add_node("MedicationChecker", check_medications) graph.add_edge("ClinicalGuidelines", "MedicationChecker") graph.add_edge("MedicationChecker", "ResponseSynthesizer") ``` **Step 3:** Update synthesizer to include medication info: ```python # In response_synthesizer.py medication_info = state.get("medication_interactions", {}) ``` ### Switching LLM Providers RagBot supports three LLM providers out of the box. Set via `LLM_PROVIDER` in `.env`: | Provider | Model | Cost | Speed | |----------|-------|------|-------| | `groq` (default) | llama-3.3-70b-versatile | Free | Fast | | `gemini` | gemini-2.0-flash | Free | Medium | | `ollama` | configurable | Free (local) | Varies | ```bash # .env LLM_PROVIDER="groq" GROQ_API_KEY="gsk_..." # Or LLM_PROVIDER="gemini" GOOGLE_API_KEY="..." ``` No code changes needed — `src/llm_config.py` handles provider selection automatically. ### Modifying Embedding Provider **Current default:** Google Gemini (`models/embedding-001`, free) **Fallback:** HuggingFace sentence-transformers (local, no API key needed) **Optional:** Ollama (local) Set via `EMBEDDING_PROVIDER` in `.env`: ```bash EMBEDDING_PROVIDER="google" # Default - Google Gemini EMBEDDING_PROVIDER="huggingface" # Fallback - local EMBEDDING_PROVIDER="ollama" # Local Ollama ``` After changing, rebuild the vector store: ```bash python scripts/setup_embeddings.py ``` ⚠️ **Note:** Changing embeddings requires rebuilding the vector store (dimensions must match). ## Testing ### Run All Tests ```bash .venv\Scripts\python.exe -m pytest tests/ -q --ignore=tests/test_basic.py --ignore=tests/test_diabetes_patient.py --ignore=tests/test_evolution_loop.py --ignore=tests/test_evolution_quick.py --ignore=tests/test_evaluation_system.py ``` ### Run Specific Test ```bash .venv\Scripts\python.exe -m pytest tests/test_normalization.py -v ``` ### Test Coverage ```bash .venv\Scripts\python.exe -m pytest --cov=src tests/ ``` ### Add New Tests Create `tests/test_myfeature.py`: ```python import pytest from src.biomarker_validator import BiomarkerValidator class TestMyFeature: def setup_method(self): self.validator = BiomarkerValidator() def test_validation(self): result = self.validator.validate("Glucose", 140) assert result.is_valid == False assert result.status == "out-of-range" ``` ## Debugging ### Enable Debug Logging Set in `.env`: ``` LOG_LEVEL=DEBUG ``` ### Interactive Debugging ```bash python -c " from src.workflow import create_guild # Create the guild guild = create_guild() # Run workflow result = guild.run({ 'biomarkers': {'Glucose': 185, 'HbA1c': 8.2}, 'model_prediction': {'disease': 'Diabetes', 'confidence': 0.87} }) # Inspect result print(result) " ``` ### Profile Performance ```bash python -m cProfile -s cumtime scripts/chat.py ``` ## Code Quality ### Format Code ```bash black src/ api/ scripts/ ``` ### Check Types ```bash mypy src/ --ignore-missing-imports ``` ### Lint ```bash pylint src/ api/ scripts/ ``` ### Pre-commit Hook Create `.git/hooks/pre-commit`: ```bash #!/bin/bash black src/ api/ scripts/ pytest tests/ ``` ## Documentation - Update `docs/` when adding features - Keep README.md in sync with changes - Document all new functions with docstrings: ```python def analyze_biomarker(name: str, value: float) -> dict: """ Analyze a single biomarker value. Args: name: Biomarker name (e.g., "Glucose") value: Measured value Returns: dict: Analysis result with status, alerts, recommendations Raises: ValueError: If biomarker name is invalid """ ``` ## Performance Optimization ### Profile Agent Execution ```python import time start = time.time() result = agent.run(state) elapsed = time.time() - start print(f"Agent took {elapsed:.2f}s") ``` ### Parallel Agent Execution Agents already run in parallel via LangGraph: - Agent 1: Biomarker Analyzer - Agents 2-4: RAG agents (parallel) - Agent 5: Confidence Assessor - Agent 6: Synthesizer Modify in `src/workflow.py` if needed. ### Cache Embeddings FAISS vector store is already loaded once at startup. ### Reduce Processing Time - Fewer RAG docs: Modify `k=5` in agent prompts - Simpler LLM: Use smaller model or quantized version - Batch requests: Process multiple patients at once ## Troubleshooting ### Issue: Vector store not found ```bash .venv\Scripts\python.exe scripts/setup_embeddings.py ``` ### Issue: LLM provider not responding - Check your `.env` has valid API keys (`GROQ_API_KEY` or `GOOGLE_API_KEY`) - Verify internet connection - Check provider status pages (Groq Console, Google AI Studio) ### Issue: Slow inference - Check Groq API status - Verify internet connection - Try smaller model or batch requests ## Contributing See [CONTRIBUTING.md](../CONTRIBUTING.md) for: - Code style guidelines - Pull request process - Issue reporting - Testing requirements ## Support - Issues: GitHub Issues - Discussions: GitHub Discussions - Documentation: See `/docs` ## Resources - [LangGraph Docs](https://langchain-ai.github.io/langgraph/) - [Groq API Docs](https://console.groq.com) - [FAISS Documentation](https://github.com/facebookresearch/faiss/wiki) - [FastAPI Guide](https://fastapi.tiangolo.com/) - [Pydantic V2](https://docs.pydantic.dev/latest/)