# RagBot Deep Review

> **Last updated**: February 2026
> Items marked **[RESOLVED]** have been fixed. Items marked **[OPEN]** remain as future work.

## Scope

This review covers the end-to-end workflow and supporting services for RagBot, focusing on design correctness, reliability, safety guardrails, and maintainability. It is based on a close reading of the workflow orchestration, agent implementations, API wiring, extraction and prediction logic, and the knowledge base pipeline.

Primary files reviewed:

- `src/workflow.py`
- `src/state.py`
- `src/config.py`
- `src/agents/*`
- `src/biomarker_validator.py`
- `src/pdf_processor.py`
- `api/app/main.py`
- `api/app/routes/analyze.py`
- `api/app/services/extraction.py`
- `api/app/services/ragbot.py`
- `scripts/chat.py`

## Architectural Understanding (Condensed)

### End-to-End Flow

1. Input arrives via the CLI (`scripts/chat.py`) or the REST API (`api/app/routes/analyze.py`).
2. Natural-language inputs are parsed by the extraction service (`api/app/services/extraction.py`) to produce normalized biomarkers and patient context.
3. A rule-based prediction (`predict_disease_simple`) produces a disease hypothesis and probabilities.
4. The LangGraph workflow (`src/workflow.py`) orchestrates six agents: Biomarker Analyzer, Disease Explainer, Biomarker Linker, Clinical Guidelines, Confidence Assessor, and Response Synthesizer.
5. The synthesized output is formatted into API schemas (`api/app/services/ragbot.py`) or into CLI-friendly responses (`scripts/chat.py`).

### Key Data Structures

- `GuildState` in `src/state.py` is the shared workflow state; it depends on additive accumulation for parallel outputs.
- `PatientInput` holds structured biomarkers, prediction data, and patient context.
- The response format is built in `ResponseSynthesizerAgent` and then translated into API schemas in `RagBotService`.

### Knowledge Base

- PDFs are chunked and embedded into FAISS (`src/pdf_processor.py`).
- Three retrievers (disease explainer, biomarker linker, clinical guidelines) share the same FAISS index with varying `k` values.

## Deep Review Findings

### Critical Issues

1. **[OPEN] State propagation is incomplete across the workflow.**
   - `src/agents/biomarker_analyzer.py` returns only `agent_outputs`; the computed `biomarker_flags` and `safety_alerts` are never written back into the top-level `GuildState` keys that the workflow expects to accumulate.
   - `src/workflow.py` initializes `biomarker_flags` and `safety_alerts` in the state, but none of the agents return updates to those keys. As a result, `workflow_result.get("biomarker_flags")` and `workflow_result.get("safety_alerts")` are likely empty when the API response is formatted in `api/app/services/ragbot.py`.
   - Effect: API output will frequently miss biomarkers and alerts, and downstream consumers will incorrectly assume a clean result set.
   - Recommendation: return `biomarker_flags` and `safety_alerts` from the Biomarker Analyzer agent so they accumulate in the state, and ensure the Response Synthesizer reads those same keys.

2. **[OPEN] LangGraph merge behavior is unsafe for parallel outputs.**
   - `GuildState` uses `Annotated[List[AgentOutput], operator.add]` for additive merging, but the nodes return only `{"agent_outputs": [output]}` and nothing else. This is fine for `agent_outputs` itself, but parallel agents also read the full `agent_outputs` list from the state to infer prior results.
   - In parallel branches, a given agent might observe a partial `agent_outputs` list depending on execution order. This is visible in `BiomarkerDiseaseLinkerAgent` and `ClinicalGuidelinesAgent`, which locate the prior Biomarker Analyzer output by scanning `agent_outputs`.
   - Effect: nondeterministic behavior if LangGraph schedules a branch before the Biomarker Analyzer output is merged, or if merges occur after the branch starts. This can degrade evidence selection and recommendations.
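A minimal sketch of the fix recommended in Critical Issue 1: the Biomarker Analyzer node returns `biomarker_flags` and `safety_alerts` as top-level keys so the additive reducers accumulate them. The flag/alert dict shapes and the node internals here are hypothetical stand-ins; the real types live in `src/state.py`.

```python
import operator
from typing import Annotated, List, TypedDict

# Hypothetical stand-ins for the project's AgentOutput / flag / alert types.
AgentOutput = dict
BiomarkerFlag = dict
SafetyAlert = dict

class GuildState(TypedDict, total=False):
    # Additive reducers let parallel nodes append to these lists
    # without clobbering each other's updates.
    agent_outputs: Annotated[List[AgentOutput], operator.add]
    biomarker_flags: Annotated[List[BiomarkerFlag], operator.add]
    safety_alerts: Annotated[List[SafetyAlert], operator.add]

def biomarker_analyzer_node(state: GuildState) -> dict:
    """Return flags and alerts as top-level state keys, not only agent_outputs."""
    flags = [{"biomarker": "HbA1c", "status": "high"}]             # placeholder analysis
    alerts = [{"level": "warning", "message": "HbA1c above range"}]
    output = {"agent": "biomarker_analyzer", "flags": flags, "alerts": alerts}
    return {
        "agent_outputs": [output],
        "biomarker_flags": flags,   # now accumulates in GuildState
        "safety_alerts": alerts,    # likewise
    }
```

With this shape, downstream nodes and the formatter can read `state["biomarker_flags"]` directly instead of scanning `agent_outputs`, which also addresses the nondeterministic lookups flagged in Critical Issue 2.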
   - Recommendation: explicitly pass relevant artifacts as dedicated state fields updated by the Biomarker Analyzer, and read those fields directly instead of scanning `agent_outputs`.

3. **[RESOLVED] Schema mismatch between workflow output and API formatter.**
   - `ResponseSynthesizerAgent` returns a structured response with keys like `patient_summary`, `prediction_explanation`, `clinical_recommendations`, `confidence_assessment`, and `safety_alerts`.
   - `RagBotService._format_response()` now correctly reads from `final_response` and handles both Pydantic objects and dicts.
   - The CLI (`scripts/chat.py`) uses `_coerce_to_dict()` and `format_conversational()` to safely handle all output types.
   - **Fix applied**: `_format_response()` updated + `_coerce_to_dict()` helper added.

### High Priority Issues

1. **[OPEN] Prediction confidence is forced to 0.5 and the default disease is always Diabetes.**
   - Both the API and CLI `predict_disease_simple` functions enforce a minimum confidence of 0.5 and default to Diabetes when confidence is low.
   - Effect: biased predictions and false confidence. This is risky in a medical domain and undermines reliability assessments.
   - Recommendation: return a low-confidence prediction explicitly and mark reliability as low; avoid forcing a disease when evidence is insufficient.

2. **[RESOLVED] Different biomarker naming schemes across extraction modules.**
   - Both CLI and API now use the shared `src/biomarker_normalization.py` module, with 80+ aliases mapped to 24 canonical names.
   - **Fix applied**: unified normalization in both `scripts/chat.py` and `api/app/services/extraction.py`.

3. **[RESOLVED] Use of console glyphs and non-ASCII prefixes in logs and output.**
   - Debug prints removed from the CLI. Logging suppressed for noisy HuggingFace/transformers output.
   - API responses use clean JSON only; the CLI uses UTF-8 emojis only in user-facing output.
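One way the High Priority Issue 1 recommendation could look: report confidence as-is and surface a `reliable` flag instead of applying a 0.5 floor and a Diabetes default. The score-dict input, threshold value, and return shape are assumptions for illustration; the real function's signature may differ.

```python
def predict_disease_simple(scores: dict[str, float],
                           min_confidence: float = 0.35) -> dict:
    """Hypothetical rework: return an explicit low-confidence result
    rather than forcing a minimum confidence or a default disease."""
    if not scores:
        return {"disease": None, "confidence": 0.0, "reliable": False}
    # Best-scoring hypothesis, reported with its raw confidence.
    disease, confidence = max(scores.items(), key=lambda kv: kv[1])
    return {
        "disease": disease if confidence >= min_confidence else None,
        "confidence": confidence,                 # no 0.5 floor applied
        "reliable": confidence >= min_confidence,
    }
```

Downstream, the Confidence Assessor can then treat `reliable=False` as a signal to hedge the synthesized response rather than presenting a fabricated hypothesis.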
   - **Fix applied**: `[DEBUG]` prints removed, `BertModel LOAD REPORT` suppressed, HuggingFace deprecation warnings filtered.

### Medium Priority Issues

1. **[RESOLVED] Inconsistent model selection between agents.**
   - All agents now use the centralized `llm_config` configuration (planner, analyzer, explainer, and synthesizer properties).
   - **Fix applied**: `src/llm_config.py` provides an `LLMConfig` singleton with per-role properties.

2. **[RESOLVED] Potential JSON parsing fragility in extraction.**
   - `_parse_llm_json()` now handles markdown fences, trailing text, and partial JSON recovery.
   - **Fix applied**: robust JSON parser in `api/app/services/extraction.py` with test coverage (`test_json_parsing.py`).

3. **[RESOLVED] Knowledge base retrieval does not enforce citations.**
   - The Disease Explainer agent now checks `sop.require_pdf_citations` and returns "insufficient evidence" when no documents are retrieved.
   - **Fix applied**: citation guardrail in `src/agents/disease_explainer.py` with test (`test_citation_guardrails.py`).

### Low Priority Issues

1. **[OPEN] Error handling does not preserve original exceptions cleanly in the API layer.**
   - Exceptions are wrapped in `RuntimeError` without detail separation; `RagBotService.analyze()` does not attach contextual hints (e.g., which agent failed).
   - Recommendation: wrap exceptions with the agent name and an error classification to improve observability.

2. **[RESOLVED] Hard-coded expected biomarker count (24) in Confidence Assessor.**
   - Now uses `BiomarkerValidator().expected_biomarker_count()`, which reads from `config/biomarker_references.json`.
   - Test: `test_validator_count.py` verifies the count matches the reference config.

## Suggested Improvements (Summary)

1. ~~Align workflow output and API schema.~~ **[RESOLVED]**
2. Promote biomarker flags and safety alerts to first-class state fields in the workflow. **[OPEN]**
3. ~~Use a shared normalization utility.~~ **[RESOLVED]**
4. Remove forced minimum confidence and default disease; permit "low confidence" results. **[OPEN]**
5. ~~Introduce citation enforcement as a guardrail for RAG outputs.~~ **[RESOLVED]**
6. ~~Centralize model selection and logging format.~~ **[RESOLVED]**

## Verification Gaps

The following should be tested once fixes are made:

- Natural language extraction with partial and noisy inputs.
- Workflow run where no abnormal biomarkers are detected.
- API response schema validation for both natural and structured routes.
- Parallel agent execution determinism (state access to biomarker analysis).
- CLI behavior for biomarker names that differ from API normalization.
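For the [OPEN] error-handling item in the Low Priority list, a minimal sketch of context-preserving exception wrapping. `AgentError` and `run_agent` are hypothetical names, not existing project code; the point is keeping the agent name and the original exception attached (and chained via `raise ... from`).

```python
class AgentError(RuntimeError):
    """Hypothetical wrapper carrying the failing agent's name and the
    original exception, so API logs can say which agent failed and why."""

    def __init__(self, agent: str, original: Exception):
        super().__init__(f"[{agent}] {type(original).__name__}: {original}")
        self.agent = agent
        self.original = original

def run_agent(agent_name: str, fn, state: dict) -> dict:
    """Run one agent node, re-raising failures with context attached."""
    try:
        return fn(state)
    except Exception as exc:
        # `raise ... from exc` keeps the original traceback chained
        # instead of swallowing it in a bare RuntimeError.
        raise AgentError(agent_name, exc) from exc
```

An API handler catching `AgentError` can then log `err.agent` and classify `err.original` without string-parsing the message.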