# RagBot Development Guide
## For Developers & Maintainers
This guide covers extending, customizing, and contributing to RagBot.
## Project Structure
```
RagBot/
├── src/                            # Core application code
│   ├── __init__.py                 # Package marker
│   ├── workflow.py                 # Multi-agent workflow orchestration
│   ├── state.py                    # Pydantic data models & state
│   ├── biomarker_validator.py      # Biomarker validation logic
│   ├── biomarker_normalization.py  # Alias-to-canonical name mapping (80+ aliases)
│   ├── llm_config.py               # LLM & embedding configuration
│   ├── pdf_processor.py            # PDF loading & vector store
│   ├── config.py                   # Global configuration
│   │
│   ├── agents/                     # Specialist agents
│   │   ├── __init__.py             # Package marker
│   │   ├── biomarker_analyzer.py   # Validates biomarkers
│   │   ├── disease_explainer.py    # Explains disease (RAG)
│   │   ├── biomarker_linker.py     # Links biomarkers to disease (RAG)
│   │   ├── clinical_guidelines.py  # Provides guidelines (RAG)
│   │   ├── confidence_assessor.py  # Assesses prediction confidence
│   │   └── response_synthesizer.py # Synthesizes findings
│   │
│   ├── evaluation/                 # Evaluation framework
│   │   ├── __init__.py
│   │   └── evaluators.py           # Quality evaluators
│   │
│   └── evolution/                  # Experimental components
│       ├── __init__.py
│       ├── director.py             # Evolution orchestration
│       └── pareto.py               # Pareto optimization
│
├── api/                            # REST API application
│   ├── app/
│   │   ├── main.py                 # FastAPI application
│   │   ├── routes/                 # API endpoints
│   │   │   ├── analyze.py          # Main analysis endpoint
│   │   │   ├── biomarkers.py       # Biomarker endpoints
│   │   │   └── health.py           # Health check
│   │   ├── models/                 # Pydantic schemas
│   │   └── services/               # Business logic
│   ├── requirements.txt
│   ├── Dockerfile
│   └── docker-compose.yml
│
├── scripts/                        # Utility & demo scripts
│   ├── chat.py                     # Interactive CLI
│   ├── setup_embeddings.py         # Vector store builder
│   ├── run_api.ps1                 # API startup script
│   └── ...
│
├── config/                         # Configuration files
│   └── biomarker_references.json   # Biomarker reference ranges
│
├── data/                           # Data storage
│   ├── medical_pdfs/               # Source medical documents
│   └── vector_stores/              # FAISS vector databases
│
├── tests/                          # Test suite
│   └── test_*.py
│
├── docs/                           # Documentation
│   ├── ARCHITECTURE.md             # System design
│   ├── API.md                      # API reference
│   ├── DEVELOPMENT.md              # This file
│   └── ...
│
├── examples/                       # Example integrations
│   ├── test_website.html           # Web integration example
│   └── website_integration.js      # JavaScript client
│
├── requirements.txt                # Python dependencies
├── README.md                       # Main documentation
├── QUICKSTART.md                   # Setup guide
├── CONTRIBUTING.md                 # Contribution guidelines
└── LICENSE
```
## Development Setup
### 1. Clone & Install
```bash
git clone https://github.com/yourusername/ragbot.git
cd ragbot
python -m venv .venv
.venv\Scripts\activate       # Windows
source .venv/bin/activate    # macOS/Linux
pip install -r requirements.txt
```
### 2. Configure
```bash
cp .env.template .env
# Edit .env with your API keys (Groq, Google, etc.)
```
### 3. Rebuild Vector Store
```bash
python scripts/setup_embeddings.py
```
### 4. Run Tests
```bash
pytest tests/
```
## Key Development Tasks
### Adding a New Biomarker
**Step 1:** Update reference ranges in `config/biomarker_references.json`:
```json
{
  "biomarkers": {
    "New Biomarker": {
      "min": 0,
      "max": 100,
      "unit": "mg/dL",
      "normal_range": "0-100",
      "critical_low": -1,
      "critical_high": 150,
      "related_conditions": ["Disease1", "Disease2"]
    }
  }
}
```
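A quick way to sanity-check a new entry before committing is to load it and assert the ranges are consistent. A minimal sketch (the entry is inlined here for illustration; in practice read `config/biomarker_references.json`):

```python
import json

# Illustrative entry matching the example above
entry = json.loads("""
{
  "min": 0, "max": 100, "unit": "mg/dL",
  "normal_range": "0-100",
  "critical_low": -1, "critical_high": 150,
  "related_conditions": ["Disease1", "Disease2"]
}
""")

# Ranges should be ordered: critical_low <= min < max <= critical_high
assert entry["critical_low"] <= entry["min"] < entry["max"] <= entry["critical_high"]
print("entry OK")
```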
**Step 2:** Add aliases in `src/biomarker_normalization.py`:
```python
NORMALIZATION_MAP = {
    # ... existing entries ...
    "your alias": "New Biomarker",
    "other name": "New Biomarker",
}
```
All consumers (CLI, API, workflow) use this shared map automatically.
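For reference, a lookup over this map typically looks like the following (a sketch; the alias entries and the helper name here are illustrative, not the module's exact contents):

```python
# Illustrative subset of the alias-to-canonical map
NORMALIZATION_MAP = {
    "fasting blood sugar": "Glucose",
    "your alias": "New Biomarker",
}

def normalize(name: str) -> str:
    """Map an alias to its canonical biomarker name (case-insensitive)."""
    key = name.strip().lower()
    return NORMALIZATION_MAP.get(key, name.strip())

print(normalize("  Fasting Blood Sugar "))  # -> Glucose
print(normalize("Your Alias"))              # -> New Biomarker
print(normalize("Glucose"))                 # -> Glucose (already canonical)
```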
**Step 3:** Add a validation test in `tests/test_basic.py`:
```python
def test_new_biomarker():
    validator = BiomarkerValidator()
    result = validator.validate("New Biomarker", 50)
    assert result.is_valid
```
**Step 4:** Medical knowledge updates automatically through RAG; no further code changes are needed.
### Adding a New Medical Domain
**Step 1:** Collect relevant PDFs:
```
data/medical_pdfs/
    your_domain.pdf
    your_guideline.pdf
```
**Step 2:** Rebuild the vector store:
```bash
python scripts/setup_embeddings.py
```
The system automatically:
- Loads all PDFs from `data/medical_pdfs/`
- Splits them into chunks (2,609+ in the current corpus) and indexes them for similarity search
- Makes the knowledge available to all RAG agents
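Conceptually, the splitting step works like this (a pure-Python sketch of overlapping character chunks with illustrative sizes; the real script uses the configured text splitter and embedding provider):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks, as a vector-store
    builder typically does before embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "x" * 500
chunks = chunk_text(doc)
print(len(chunks))  # -> 3 (chunks of up to 200 chars, 50-char overlap)
```

Overlap preserves context across chunk boundaries, which helps retrieval return complete passages.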
**Step 3:** Test with new biomarkers from that domain:
```bash
python scripts/chat.py
# Input: biomarkers related to your domain
```
### Creating a Custom Analysis Agent
**Example: Add a "Medication Interactions" Agent**

**Step 1:** Create `src/agents/medication_checker.py`:
```python
from src.llm_config import LLMConfig
from src.state import PatientInput

class MedicationChecker:
    def __init__(self):
        config = LLMConfig()
        self.llm = config.analyzer  # Uses centralized LLM config

    def check_interactions(self, state: PatientInput) -> dict:
        """Check medication interactions based on biomarkers."""
        # Get relevant medical knowledge
        # Use LLM to identify drug-drug interactions
        # Return structured response
        return {
            "interactions": [],
            "warnings": [],
            "recommendations": []
        }
```
**Step 2:** Register in the workflow (`src/workflow.py`):
```python
from src.agents.medication_checker import MedicationChecker

medication_agent = MedicationChecker()

def check_medications(state):
    return medication_agent.check_interactions(state)

# Add to graph
graph.add_node("MedicationChecker", check_medications)
graph.add_edge("ClinicalGuidelines", "MedicationChecker")
graph.add_edge("MedicationChecker", "ResponseSynthesizer")
```
**Step 3:** Update the synthesizer to include medication info:
```python
# In response_synthesizer.py
medication_info = state.get("medication_interactions", {})
```
### Switching LLM Providers
RagBot supports three LLM providers out of the box. Set via `LLM_PROVIDER` in `.env`:

| Provider | Model | Cost | Speed |
|----------|-------|------|-------|
| `groq` (default) | llama-3.3-70b-versatile | Free | Fast |
| `gemini` | gemini-2.0-flash | Free | Medium |
| `ollama` | configurable | Free (local) | Varies |

```bash
# .env
LLM_PROVIDER="groq"
GROQ_API_KEY="gsk_..."

# Or
LLM_PROVIDER="gemini"
GOOGLE_API_KEY="..."
```
No code changes needed; `src/llm_config.py` handles provider selection automatically.
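Internally, selection amounts to branching on the environment variable, roughly like this (a sketch; the real `llm_config.py` constructs actual LLM clients rather than returning model names):

```python
import os

# Illustrative provider table; model names follow the table above
PROVIDER_MODELS = {
    "groq": "llama-3.3-70b-versatile",
    "gemini": "gemini-2.0-flash",
    "ollama": "configurable",
}

def select_model() -> str:
    """Resolve the model name from LLM_PROVIDER, defaulting to groq."""
    provider = os.getenv("LLM_PROVIDER", "groq").lower()
    if provider not in PROVIDER_MODELS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    return PROVIDER_MODELS[provider]

os.environ["LLM_PROVIDER"] = "gemini"
print(select_model())  # -> gemini-2.0-flash
```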
### Modifying Embedding Provider
**Current default:** Google Gemini (`models/embedding-001`, free)
**Fallback:** HuggingFace sentence-transformers (local, no API key needed)
**Optional:** Ollama (local)

Set via `EMBEDDING_PROVIDER` in `.env`:
```bash
EMBEDDING_PROVIDER="google"       # Default - Google Gemini
EMBEDDING_PROVIDER="huggingface"  # Fallback - local
EMBEDDING_PROVIDER="ollama"       # Local Ollama
```
After changing, rebuild the vector store:
```bash
python scripts/setup_embeddings.py
```
⚠️ **Note:** Changing embeddings requires rebuilding the vector store (dimensions must match).
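The reason is that a FAISS index stores vectors of one fixed dimension, so queries embedded with a different model cannot be searched against it. A guard like this makes the failure explicit (a hypothetical helper, not in the codebase; the 768/384 dimensions are illustrative):

```python
def check_embedding_dim(index_dim: int, query_vector: list[float]) -> None:
    """Fail fast when a query embedding does not match the stored index."""
    if len(query_vector) != index_dim:
        raise ValueError(
            f"Embedding dimension mismatch: index built with {index_dim} dims, "
            f"query has {len(query_vector)}; rebuild the vector store."
        )

check_embedding_dim(768, [0.0] * 768)   # OK: dimensions match
try:
    check_embedding_dim(768, [0.0] * 384)  # e.g. index from one provider, query from another
except ValueError as e:
    print("caught:", e)
```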
## Testing
### Run All Tests
```bash
.venv\Scripts\python.exe -m pytest tests/ -q --ignore=tests/test_basic.py --ignore=tests/test_diabetes_patient.py --ignore=tests/test_evolution_loop.py --ignore=tests/test_evolution_quick.py --ignore=tests/test_evaluation_system.py
```
### Run Specific Test
```bash
.venv\Scripts\python.exe -m pytest tests/test_normalization.py -v
```
### Test Coverage
```bash
.venv\Scripts\python.exe -m pytest --cov=src tests/
```
### Add New Tests
Create `tests/test_myfeature.py`:
```python
import pytest
from src.biomarker_validator import BiomarkerValidator

class TestMyFeature:
    def setup_method(self):
        self.validator = BiomarkerValidator()

    def test_validation(self):
        result = self.validator.validate("Glucose", 140)
        assert not result.is_valid
        assert result.status == "out-of-range"
```
## Debugging
### Enable Debug Logging
Set in `.env`:
```
LOG_LEVEL=DEBUG
```
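If you want a standalone script to honor the same variable, Python's standard `logging` module can read it directly (a sketch; the logger name is illustrative):

```python
import logging
import os

# Honor LOG_LEVEL from the environment, defaulting to INFO
level = os.getenv("LOG_LEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level, logging.INFO))

logger = logging.getLogger("ragbot")
logger.debug("vector store loaded")  # emitted only when LOG_LEVEL=DEBUG
```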
### Interactive Debugging
```bash
python -c "
from src.workflow import create_guild

# Create the guild
guild = create_guild()

# Run workflow
result = guild.run({
    'biomarkers': {'Glucose': 185, 'HbA1c': 8.2},
    'model_prediction': {'disease': 'Diabetes', 'confidence': 0.87}
})

# Inspect result
print(result)
"
```
### Profile Performance
```bash
python -m cProfile -s cumtime scripts/chat.py
```
## Code Quality
### Format Code
```bash
black src/ api/ scripts/
```
### Check Types
```bash
mypy src/ --ignore-missing-imports
```
### Lint
```bash
pylint src/ api/ scripts/
```
### Pre-commit Hook
Create `.git/hooks/pre-commit` and make it executable (`chmod +x .git/hooks/pre-commit`):
```bash
#!/bin/bash
set -e
black src/ api/ scripts/
pytest tests/
```
## Documentation
- Update `docs/` when adding features
- Keep README.md in sync with changes
- Document all new functions with docstrings:
```python
def analyze_biomarker(name: str, value: float) -> dict:
    """
    Analyze a single biomarker value.

    Args:
        name: Biomarker name (e.g., "Glucose")
        value: Measured value

    Returns:
        dict: Analysis result with status, alerts, recommendations

    Raises:
        ValueError: If biomarker name is invalid
    """
```
## Performance Optimization
### Profile Agent Execution
```python
import time

start = time.time()
result = agent.run(state)
elapsed = time.time() - start
print(f"Agent took {elapsed:.2f}s")
```
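For repeated measurements, the same idea can be wrapped in a decorator (a sketch; `slow_step` stands in for any agent callable):

```python
import functools
import time

def timed(fn):
    """Print wall-clock time for each call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper

@timed
def slow_step():
    time.sleep(0.1)
    return "done"

slow_step()  # prints e.g. "slow_step took 0.10s"
```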
### Parallel Agent Execution
Agents already run in parallel via LangGraph:
- Agent 1: Biomarker Analyzer
- Agents 2-4: RAG agents (parallel)
- Agent 5: Confidence Assessor
- Agent 6: Synthesizer

Modify in `src/workflow.py` if needed.
### Cache Embeddings
The FAISS vector store is already loaded once at startup.
### Reduce Processing Time
- Fewer RAG docs: lower the `k=5` retrieval setting in the RAG agents
- Simpler LLM: use a smaller model or a quantized version
- Batch requests: process multiple patients at once
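Since the per-patient work is dominated by I/O-bound LLM and retrieval calls, batching can be as simple as fanning patients out over a thread pool (a sketch; `analyze_patient` is a stub standing in for the real workflow call):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_patient(patient: dict) -> dict:
    """Stub standing in for running the workflow on one patient."""
    return {"id": patient["id"], "status": "analyzed"}

patients = [{"id": i, "biomarkers": {"Glucose": 100 + i}} for i in range(8)]

# Threads overlap the I/O-bound calls; pool.map preserves input order
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze_patient, patients))

print(len(results))  # -> 8
```

Mind provider rate limits when raising `max_workers`.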
## Troubleshooting
### Issue: Vector store not found
```bash
.venv\Scripts\python.exe scripts/setup_embeddings.py
```
### Issue: LLM provider not responding
- Check your `.env` has valid API keys (`GROQ_API_KEY` or `GOOGLE_API_KEY`)
- Verify internet connection
- Check provider status pages (Groq Console, Google AI Studio)
### Issue: Slow inference
- Check Groq API status
- Verify internet connection
- Try a smaller model or batch requests
## Contributing
See [CONTRIBUTING.md](../CONTRIBUTING.md) for:
- Code style guidelines
- Pull request process
- Issue reporting
- Testing requirements
## Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See `/docs`
## Resources
- [LangGraph Docs](https://langchain-ai.github.io/langgraph/)
- [Groq API Docs](https://console.groq.com)
- [FAISS Documentation](https://github.com/facebookresearch/faiss/wiki)
- [FastAPI Guide](https://fastapi.tiangolo.com/)
- [Pydantic V2](https://docs.pydantic.dev/latest/)