Spaces:
Running
Running
| # MediGuard AI / RagBot - Comprehensive Remediation Plan | |
| > **Generated:** February 24, 2026 | |
| > **Status:** β COMPLETED | |
| > **Last Updated:** Session completion | |
| > **Priority Levels:** P0 (Critical) β P3 (Nice-to-have) | |
| --- | |
| ## Implementation Status | |
| | # | Issue | Status | Notes | | |
| |---|-------|--------|-------| | |
| | 1 | Dual Architecture | β Complete | Consolidated to src/main.py | | |
| | 2 | Fake ML Prediction | β Complete | Renamed to rule-based heuristics | | |
| | 3 | Vector Store Abstraction | β Complete | Created unified retriever interface | | |
| | 4 | Evolution System | β Complete | Archived to archive/evolution/ | | |
| | 5 | Evaluation System | β Complete | Added deterministic mode | | |
| | 6 | HuggingFace Duplication | β Complete | Reduced from 1175β1086 lines | | |
| | 7 | Test Coverage | β Complete | Added tests/test_integration.py | | |
| | 8 | Database Schema | βοΈ Deferred | Not needed for HuggingFace | | |
| | 9 | Documentation | β Complete | README.md updated | | |
| | 10 | Gradio Dependencies | β Complete | Shared utils created | | |
| --- | |
| ## Table of Contents | |
| 1. [Executive Summary](#executive-summary) | |
| 2. [Issue 1: Dual Architecture Confusion](#issue-1-dual-architecture-confusion-p0) | |
| 3. [Issue 2: Fake ML Disease Prediction](#issue-2-fake-ml-disease-prediction-p1) | |
| 4. [Issue 3: Vector Store Abstraction](#issue-3-vector-store-abstraction-p1) | |
| 5. [Issue 4: Orphaned Evolution System](#issue-4-orphaned-evolution-system-p2) | |
| 6. [Issue 5: Unreliable Evaluation System](#issue-5-unreliable-evaluation-system-p2) | |
| 7. [Issue 6: HuggingFace Code Duplication](#issue-6-huggingface-code-duplication-p2) | |
| 8. [Issue 7: Inadequate Test Coverage](#issue-7-inadequate-test-coverage-p1) | |
| 9. [Issue 8: Database Schema Unused](#issue-8-database-schema-unused-p3) | |
| 10. [Issue 9: Documentation Misalignment](#issue-9-documentation-misalignment-p1) | |
| 11. [Issue 10: Gradio App Dependencies](#issue-10-gradio-app-dependencies-p2) | |
| 12. [Implementation Roadmap](#implementation-roadmap) | |
| --- | |
| ## Executive Summary | |
| The RagBot codebase has **10 structural issues** that create confusion, maintenance burden, and misleading claims. The most critical issues are: | |
| | Priority | Issue | Impact | Effort | | |
| |----------|-------|--------|--------| | |
| | P0 | Dual Architecture | High confusion, duplicated code paths | 3-5 days | | |
| | P1 | Fake ML Prediction | Misleading users, false claims | 2-3 days | | |
| | P1 | Vector Store Mess | Production vs local mismatch | 2 days | | |
| | P1 | Missing Tests | Unreliable deployments | 3-4 days | | |
| | P1 | Doc Misalignment | User confusion | 1 day | | |
| | P2 | Orphaned Evolution | Dead code, wasted complexity | 1-2 days | | |
| | P2 | Evaluation System | Unreliable quality metrics | 2 days | | |
| | P2 | HuggingFace Duplication | 1175-line standalone app | 2-3 days | | |
| | P2 | Gradio Dependencies | Can't run standalone | 0.5 days | | |
| | P3 | Unused Database | Alembic setup with no migrations | 1 day | | |
| --- | |
| ## Issue 1: Dual Architecture Confusion (P0) | |
| ### Problem | |
| Two competing LangGraph workflows exist: | |
| | Component | Path | Purpose | | |
| |-----------|------|---------| | |
| | **ClinicalInsightGuild** | `src/workflow.py` | Original 6-agent biomarker analysis | | |
| | **AgenticRAGService** | `src/services/agents/agentic_rag.py` | Newer Q&A RAG pipeline | | |
| The API routes them confusingly: | |
| - `/analyze/*` β ClinicalInsightGuild via `api/app/services/ragbot.py` | |
| - `/ask` β AgenticRAGService via `src/routers/ask.py` | |
| **Evidence:** | |
| - `src/main.py` initializes BOTH services at startup (lines 91-106) | |
| - `api/app/main.py` is a SEPARATE FastAPI app from `src/main.py` | |
| - Users don't know which one is "production" | |
| ### Solution | |
| **Option A: Merge into Single Unified Pipeline (Recommended)** | |
| ``` | |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| β Unified RAG Pipeline β | |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€ | |
| β Input β Guardrail β Router β β¬β Biomarker Analysis Path β | |
| β β (6 specialist agents) β | |
| β ββ General Q&A Path β | |
| β (retrieve β grade β gen) β | |
| β β Output Synthesizer β Response β | |
| ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | |
| ``` | |
| **Implementation Steps:** | |
| 1. **Create unified graph** in `src/pipelines/unified_rag.py`: | |
| ```python | |
| # Merge both workflows into one StateGraph | |
| # Use routing logic from guardrail_node to dispatch | |
| ``` | |
| 2. **Delete redundant files:** | |
| - Move `api/app/` logic into `src/routers/` | |
| - Delete `api/app/main.py` (use `src/main.py` only) | |
| - Keep `api/app/services/ragbot.py` as legacy adapter | |
| 3. **Single entry point:** | |
| - `src/main.py` becomes THE server | |
| - `uvicorn src.main:app` everywhere | |
| 4. **Update imports:** | |
| ```python | |
| # In src/main.py, replace: | |
| from api.app.services.ragbot import get_ragbot_service | |
| # With: | |
| from src.pipelines.unified_rag import UnifiedRAGService | |
| ``` | |
| **Files to Create:** | |
| - `src/pipelines/__init__.py` | |
| - `src/pipelines/unified_rag.py` | |
| - `src/pipelines/nodes/__init__.py` (merge all nodes) | |
| **Files to Delete/Archive:** | |
| - `api/app/main.py` β Archive to `api/app/main_legacy.py` | |
| - `api/app/routes/` β Merge into `src/routers/` | |
| --- | |
| ## Issue 2: Fake ML Disease Prediction (P1) | |
| ### Problem | |
| The README claims "ML prediction" but `predict_disease_simple()` is pure if/else: | |
| ```python | |
| # scripts/chat.py lines 151-216 | |
| if glucose > 126: | |
| scores["Diabetes"] += 0.4 | |
| if hba1c >= 6.5: | |
| scores["Diabetes"] += 0.5 | |
| ``` | |
| There's also an LLM-based predictor (`predict_disease_llm()`) that just asks an LLM to guess. | |
| ### Solution | |
| **Option A: Be Honest (Quick Fix)** | |
| Update all documentation to say "rule-based heuristics" not "ML prediction": | |
| ```markdown | |
| # In README.md: | |
| - **Disease Prediction** - Rule-based scoring on 5 conditions | |
| (Diabetes, Anemia, Heart Disease, Thrombocytopenia, Thalassemia) | |
| ``` | |
| **Option B: Implement Real ML (Longer)** | |
| 1. **Create a proper classifier:** | |
| ```python | |
| # src/models/disease_classifier.py | |
| from sklearn.ensemble import RandomForestClassifier | |
| import joblib | |
| class DiseaseClassifier: | |
| def __init__(self, model_path: str = "models/disease_rf.joblib"): | |
| self.model = joblib.load(model_path) | |
| self.feature_names = [...] # 24 biomarkers | |
| def predict(self, biomarkers: dict) -> dict: | |
| features = self._to_feature_vector(biomarkers) | |
| proba = self.model.predict_proba([features])[0] | |
| return { | |
| "disease": self.model.classes_[proba.argmax()], | |
| "confidence": float(proba.max()), | |
| "probabilities": dict(zip(self.model.classes_, proba.tolist())) | |
| } | |
| ``` | |
| 2. **Train on synthetic data:** | |
| - Create `scripts/train_disease_model.py` | |
| - Generate synthetic patient data with known conditions | |
| - Train RandomForest/XGBoost classifier | |
| - Save to `models/disease_rf.joblib` | |
| 3. **Replace predictor calls:** | |
| ```python | |
| # Instead of predict_disease_simple(biomarkers) | |
| from src.models.disease_classifier import get_classifier | |
| prediction = get_classifier().predict(biomarkers) | |
| ``` | |
| **Recommendation:** Do Option A immediately, Option B as a follow-up feature. | |
| --- | |
| ## Issue 3: Vector Store Abstraction (P1) | |
| ### Problem | |
| Two different vector stores used inconsistently: | |
| | Context | Store | Configuration | | |
| |---------|-------|---------------| | |
| | Local dev | FAISS | `data/vector_stores/medical_knowledge.faiss` | | |
| | Production | OpenSearch | `OPENSEARCH__HOST` env var | | |
| | HuggingFace | FAISS | Bundled in `huggingface/` | | |
| The code has: | |
| - `src/pdf_processor.py` β FAISS | |
| - `src/services/opensearch/client.py` β OpenSearch | |
| - `src/services/agents/nodes/retrieve_node.py` β OpenSearch only | |
| ### Solution | |
| **Create a unified retriever interface:** | |
| ```python | |
| # src/services/retrieval/interface.py | |
| from abc import ABC, abstractmethod | |
| from typing import List, Dict, Any | |
| class BaseRetriever(ABC): | |
| @abstractmethod | |
| def search(self, query: str, top_k: int = 10) -> List[Dict[str, Any]]: | |
| """Return list of {id, score, text, title, section, metadata}""" | |
| pass | |
| @abstractmethod | |
| def search_hybrid(self, query: str, embedding: List[float], top_k: int = 10) -> List[Dict[str, Any]]: | |
| pass | |
| ``` | |
| ```python | |
| # src/services/retrieval/faiss_retriever.py | |
| class FAISSRetriever(BaseRetriever): | |
| def __init__(self, vector_store_path: str, embedding_model): | |
| self.store = FAISS.load_local(vector_store_path, embedding_model, ...) | |
| def search(self, query: str, top_k: int = 10): | |
| docs = self.store.similarity_search(query, k=top_k) | |
| return [{"id": i, "score": 0, "text": d.page_content, ...} for i, d in enumerate(docs)] | |
| ``` | |
| ```python | |
| # src/services/retrieval/opensearch_retriever.py | |
| class OpenSearchRetriever(BaseRetriever): | |
| def __init__(self, client: OpenSearchClient): | |
| self.client = client | |
| def search(self, query: str, top_k: int = 10): | |
| return self.client.search_bm25(query, top_k=top_k) | |
| ``` | |
| ```python | |
| # src/services/retrieval/__init__.py | |
| def get_retriever() -> BaseRetriever: | |
| """Factory that returns appropriate retriever based on config.""" | |
| settings = get_settings() | |
| if settings.opensearch.host and _opensearch_available(): | |
| return OpenSearchRetriever(make_opensearch_client()) | |
| else: | |
| return FAISSRetriever("data/vector_stores", get_embedding_model()) | |
| ``` | |
| **Update retrieve_node.py:** | |
| ```python | |
| def retrieve_node(state: dict, *, context: Any) -> dict: | |
| retriever = context.retriever # Now uses unified interface | |
| results = retriever.search_hybrid(query, embedding, top_k=10) | |
| ... | |
| ``` | |
| --- | |
| ## Issue 4: Orphaned Evolution System (P2) | |
| ### Problem | |
| `src/evolution/` contains a complete SOP evolution system that: | |
| - Has `SOPGenePool` for versioning | |
| - Has `performance_diagnostician()` for diagnosis | |
| - Has `sop_architect()` for mutations | |
| - Has an Airflow DAG (`airflow/dags/sop_evolution.py`) | |
| **But:** | |
| - No Airflow deployment exists | |
| - `run_evolution_cycle()` requires manual invocation | |
| - No UI to trigger evolution | |
| - No tracking of which SOP version is in use | |
| ### Solution | |
| **Option A: Remove It (Quick)** | |
| Delete or archive the unused code: | |
| ``` | |
| mkdir -p archive/evolution | |
| mv src/evolution/* archive/evolution/ | |
| mv airflow/dags/sop_evolution.py archive/ | |
| ``` | |
| Update imports to remove references. | |
| **Option B: Wire It Up (If Actually Wanted)** | |
| 1. **Add CLI command:** | |
| ```python | |
| # scripts/evolve_sop.py | |
| from src.evolution.director import run_evolution_cycle | |
| from src.workflow import create_guild | |
| if __name__ == "__main__": | |
| gene_pool = SOPGenePool() | |
| # Load baseline, run evolution, save results | |
| ``` | |
| 2. **Add API endpoint:** | |
| ```python | |
| # src/routers/admin.py | |
| @router.post("/admin/evolve") | |
| async def trigger_evolution(request: Request): | |
| # Requires admin auth | |
| result = run_evolution_cycle(...) | |
| return {"new_versions": len(result)} | |
| ``` | |
| 3. **Persist to database:** | |
| - Use Alembic migrations to create `sop_versions` table | |
| - Store evolved SOPs with evaluation scores | |
| --- | |
| ## Issue 5: Unreliable Evaluation System (P2) | |
| ### Problem | |
| `src/evaluation/evaluators.py` uses LLM-as-judge for: | |
| - `evaluate_clinical_accuracy()` - LLM grades medical correctness | |
| - `evaluate_actionability()` - LLM grades recommendations | |
| **Problems:** | |
| 1. LLMs are unreliable judges of medical accuracy | |
| 2. No ground truth comparison | |
| 3. Scores can fluctuate between runs | |
| 4. Falls back to 0.5 on JSON parse errors (line 91) | |
| ### Solution | |
| **Replace with deterministic metrics where possible:** | |
| ```python | |
| # For clinical_accuracy: Use BiomarkerValidator as ground truth | |
| def evaluate_clinical_accuracy_v2(response: Dict, biomarkers: Dict) -> GradedScore: | |
| validator = BiomarkerValidator() | |
| # Check if flagged biomarkers match validator | |
| expected_flags = validator.validate_all(biomarkers)[0] | |
| actual_flags = response.get("biomarker_flags", []) | |
| expected_abnormal = {f.name for f in expected_flags if f.status != "NORMAL"} | |
| actual_abnormal = {f["name"] for f in actual_flags if f["status"] != "NORMAL"} | |
| precision = len(expected_abnormal & actual_abnormal) / max(len(actual_abnormal), 1) | |
| recall = len(expected_abnormal & actual_abnormal) / max(len(expected_abnormal), 1) | |
| f1 = 2 * precision * recall / max(precision + recall, 0.001) | |
| return GradedScore( | |
| score=f1, | |
| reasoning=f"Precision: {precision:.2f}, Recall: {recall:.2f}" | |
| ) | |
| ``` | |
| **Keep LLM-as-judge only for subjective metrics:** | |
| - Clarity (readability) - already programmatic β | |
| - Helpfulness of recommendations - needs human judgment | |
| **Add human-in-the-loop:** | |
| ```python | |
| # src/evaluation/human_eval.py | |
| def collect_human_rating(response_id: str) -> Optional[float]: | |
| """Store human ratings for later analysis.""" | |
| # Integrate with Langfuse or custom feedback endpoint | |
| ``` | |
| --- | |
| ## Issue 6: HuggingFace Code Duplication (P2) | |
| ### Problem | |
| `huggingface/app.py` is **1175 lines** that reimplements: | |
| - Biomarker parsing (duplicated from chat.py) | |
| - Disease prediction (duplicated) | |
| - Guild initialization (duplicated) | |
| - Gradio UI (different from src/gradio_app.py) | |
| - Environment handling (custom) | |
| ### Solution | |
| **Refactor to import from main package:** | |
| ```python | |
| # huggingface/app.py (simplified to ~200 lines) | |
| import sys | |
| sys.path.insert(0, "..") | |
| from src.workflow import create_guild | |
| from src.state import PatientInput | |
| from scripts.chat import extract_biomarkers, predict_disease_simple | |
| # Only Gradio-specific code here | |
| def analyze_biomarkers(input_text: str): | |
| biomarkers, context = extract_biomarkers(input_text) | |
| prediction = predict_disease_simple(biomarkers) | |
| patient_input = PatientInput( | |
| biomarkers=biomarkers, | |
| model_prediction=prediction, | |
| patient_context=context | |
| ) | |
| guild = get_guild() | |
| result = guild.run(patient_input) | |
| return format_result(result) | |
| # Gradio interface... | |
| ``` | |
| **Create shared utilities module:** | |
| ```python | |
| # src/utils/biomarker_extraction.py | |
| # Move extract_biomarkers() from chat.py here | |
| # src/utils/disease_scoring.py | |
| # Move predict_disease_simple() here | |
| ``` | |
| --- | |
| ## Issue 7: Inadequate Test Coverage (P1) | |
| ### Problem | |
| Current tests are mostly: | |
| - Import validation (`test_basic.py`) | |
| - Unit tests with mocks (`test_agentic_rag.py`) | |
| - Schema validation (`test_schemas.py`) | |
| **Missing:** | |
| - End-to-end workflow tests | |
| - API integration tests | |
| - Regression tests for medical accuracy | |
| ### Solution | |
| **Add integration tests:** | |
| ```python | |
| # tests/integration/test_full_workflow.py | |
| import pytest | |
| from src.workflow import create_guild | |
| from src.state import PatientInput | |
| @pytest.fixture(scope="module") | |
| def guild(): | |
| return create_guild() | |
| def test_diabetes_patient_analysis(guild): | |
| patient = PatientInput( | |
| biomarkers={"Glucose": 185, "HbA1c": 8.2}, | |
| model_prediction={"disease": "Diabetes", "confidence": 0.87, "probabilities": {}}, | |
| patient_context={"age": 52, "gender": "male"} | |
| ) | |
| result = guild.run(patient) | |
| # Assertions | |
| assert result.get("final_response") is not None | |
| assert len(result.get("biomarker_flags", [])) >= 2 | |
| assert any(f["name"] == "Glucose" for f in result["biomarker_flags"]) | |
| assert "Diabetes" in result["final_response"]["prediction_explanation"]["primary_disease"] | |
| def test_anemia_patient_analysis(guild): | |
| patient = PatientInput( | |
| biomarkers={"Hemoglobin": 9.5, "MCV": 75}, | |
| model_prediction={"disease": "Anemia", "confidence": 0.75, "probabilities": {}}, | |
| patient_context={} | |
| ) | |
| result = guild.run(patient) | |
| assert result.get("final_response") is not None | |
| ``` | |
| **Add API tests:** | |
| ```python | |
| # tests/integration/test_api_endpoints.py | |
| import pytest | |
| from fastapi.testclient import TestClient | |
| from src.main import app | |
| @pytest.fixture | |
| def client(): | |
| return TestClient(app) | |
| def test_health_endpoint(client): | |
| response = client.get("/health") | |
| assert response.status_code == 200 | |
| assert response.json()["status"] == "healthy" | |
| def test_analyze_structured(client): | |
| response = client.post("/analyze/structured", json={ | |
| "biomarkers": {"Glucose": 140, "HbA1c": 7.0} | |
| }) | |
| assert response.status_code == 200 | |
| assert "prediction" in response.json() | |
| ``` | |
| **Add to CI:** | |
| ```yaml | |
| # .github/workflows/test.yml | |
| - name: Run integration tests | |
| run: pytest tests/integration/ -v | |
| env: | |
| GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }} | |
| ``` | |
| --- | |
| ## Issue 8: Database Schema Unused (P3) | |
| ### Problem | |
| - `alembic/` is configured but `alembic/versions/` is empty | |
| - `src/database.py` exists but is barely used | |
| - `src/db/models.py` defines tables that aren't created | |
| ### Solution | |
| **If database features are wanted:** | |
| 1. Create initial migration: | |
| ```bash | |
| cd src | |
| alembic revision --autogenerate -m "Initial schema" | |
| alembic upgrade head | |
| ``` | |
| 2. Use models for: | |
| - Storing analysis history | |
| - Persisting evolved SOPs | |
| - User feedback collection | |
| **If not needed:** | |
| - Remove `alembic/` directory | |
| - Remove `src/database.py` | |
| - Remove `src/db/` if empty | |
| - Remove `postgres` from `docker-compose.yml` | |
| --- | |
| ## Issue 9: Documentation Misalignment (P1) | |
| ### Problem | |
| README.md claims: | |
| - "ML prediction" β It's rule-based | |
| - "6 Specialist Agents" β Also has agentic RAG (7+ nodes) | |
| - "Production-ready" β Two competing entry points | |
| ### Solution | |
| **Update README.md:** | |
| ```markdown | |
| ## How It Works | |
| ### Analysis Pipeline | |
| RagBot uses a **multi-agent LangGraph workflow** to analyze biomarkers: | |
| 1. **Input Routing** - Validates query is medical, routes to analysis or Q&A | |
| 2. **Biomarker Analyzer** - Validates values against clinical reference ranges | |
| 3. **Disease Scorer** - Rule-based heuristics predict most likely condition | |
| 4. **Disease Explainer** - RAG retrieval for pathophysiology from medical PDFs | |
| 5. **Guidelines Agent** - RAG retrieval for treatment recommendations | |
| 6. **Response Synthesizer** - Compiles findings into patient-friendly summary | |
| ### Supported Conditions | |
| - Diabetes (via Glucose, HbA1c) | |
| - Anemia (via Hemoglobin, MCV) | |
| - Heart Disease (via Cholesterol, Troponin, LDL) | |
| - Thrombocytopenia (via Platelets) | |
| - Thalassemia (via MCV + Hemoglobin pattern) | |
| > **Note:** Disease prediction uses rule-based scoring, not ML models. | |
| > Future versions may include trained classifiers. | |
| ``` | |
| --- | |
| ## Issue 10: Gradio App Dependencies (P2) | |
| ### Problem | |
| `src/gradio_app.py` is just an HTTP client: | |
| ```python | |
| def _call_ask(question: str) -> str: | |
| resp = client.post(f"{API_BASE}/ask", json={"question": question}) | |
| ``` | |
| It requires the FastAPI server running at `http://localhost:8000`. | |
| ### Solution | |
| **Option A: Document the dependency clearly:** | |
| Add startup instructions: | |
| ```markdown | |
| ## Running the Gradio UI | |
| 1. Start the API server: | |
| ```bash | |
| uvicorn src.main:app --reload | |
| ``` | |
| 2. In another terminal, start Gradio: | |
| ```bash | |
| python -m src.gradio_app | |
| ``` | |
| 3. Open http://localhost:7860 | |
| ``` | |
| **Option B: Add embedded mode:** | |
| ```python | |
| # src/gradio_app.py | |
| def _call_ask_embedded(question: str) -> str: | |
| """Direct workflow invocation without HTTP.""" | |
| from src.services.agents.agentic_rag import AgenticRAGService | |
| service = get_rag_service() | |
| result = service.ask(query=question) | |
| return result.get("final_answer", "No answer.") | |
| def launch_gradio(embedded: bool = False, share: bool = False): | |
| ask_fn = _call_ask_embedded if embedded else _call_ask | |
| # ... rest of UI | |
| ``` | |
| --- | |
| ## Implementation Roadmap | |
| ### Phase 1: Critical Fixes (Week 1) | |
| | Day | Task | Owner | | |
| |-----|------|-------| | |
| | 1 | Fix documentation claims (README.md) | - | | |
| | 1-2 | Consolidate entry points (delete api/app/main.py) | - | | |
| | 2-3 | Create unified retriever interface | - | | |
| | 3-4 | Add integration tests for workflow | - | | |
| | 5 | Update Gradio startup docs | - | | |
| ### Phase 2: Architecture Cleanup (Week 2) | |
| | Day | Task | Owner | | |
| |-----|------|-------| | |
| | 1-2 | Merge AgenticRAG + ClinicalInsightGuild | - | | |
| | 3 | Refactor HuggingFace app to use shared code | - | | |
| | 4 | Wire up or remove evolution system | - | | |
| | 5 | Review and deploy | - | | |
| ### Phase 3: Quality Improvements (Week 3) | |
| | Day | Task | Owner | | |
| |-----|------|-------| | |
| | 1 | Replace LLM-as-judge with deterministic metrics | - | | |
| | 2 | Add proper disease classifier (optional) | - | | |
| | 3-4 | Expand test coverage to 80%+ | - | | |
| | 5 | Final documentation pass | - | | |
| --- | |
| ## Quick Wins (Do Today) | |
| 1. **Rename `predict_disease_simple`** to `score_disease_heuristic` to be honest | |
| 2. **Add `## Architecture` section** to README explaining the two workflows | |
| 3. **Create `scripts/start_full.ps1`** that starts both API and Gradio | |
| 4. **Delete empty `alembic/versions/`** and document "DB not implemented" | |
| 5. **Add type hints** to top 5 most-used functions | |
| --- | |
| ## Checklist | |
| - [ ] P0: Single FastAPI entry point (`src/main.py` only) | |
| - [ ] P1: Documentation accurately describes capabilities | |
| - [ ] P1: Unified retriever interface (FAISS + OpenSearch) | |
| - [ ] P1: Integration tests exist and pass | |
| - [ ] P2: Evolution system removed or functional | |
| - [ ] P2: HuggingFace app imports from main package | |
| - [ ] P2: Evaluation metrics are deterministic | |
| - [ ] P3: Database either used or removed | |