RagBot Deep Review
Last updated: February 2026
Items marked [RESOLVED] have been fixed. Items marked [OPEN] remain as future work.
Scope
This review covers the end-to-end workflow and supporting services for RagBot, focusing on design correctness, reliability, safety guardrails, and maintainability. The review is based on a close reading of the workflow orchestration, agent implementations, API wiring, extraction and prediction logic, and the knowledge base pipeline.
Primary files reviewed:
- `src/workflow.py`
- `src/state.py`
- `src/config.py`
- `src/agents/*`
- `src/biomarker_validator.py`
- `src/pdf_processor.py`
- `api/app/main.py`
- `api/app/routes/analyze.py`
- `api/app/services/extraction.py`
- `api/app/services/ragbot.py`
- `scripts/chat.py`
Architectural Understanding (Condensed)
End-to-End Flow
- Input arrives via CLI (`scripts/chat.py`) or REST API (`api/app/routes/analyze.py`).
- Natural language inputs are parsed by the extraction service (`api/app/services/extraction.py`) to produce normalized biomarkers and patient context.
- A rule-based prediction (`predict_disease_simple`) produces a disease hypothesis and probabilities.
- The LangGraph workflow (`src/workflow.py`) orchestrates six agents: Biomarker Analyzer, Disease Explainer, Biomarker Linker, Clinical Guidelines, Confidence Assessor, Response Synthesizer.
- The synthesized output is formatted into API schemas (`api/app/services/ragbot.py`) or into CLI-friendly responses (`scripts/chat.py`).
Key Data Structures
- `GuildState` in `src/state.py` is the shared workflow state; it depends on additive accumulation for parallel outputs.
- `PatientInput` holds structured biomarkers, prediction data, and patient context.
- The response format is built in `ResponseSynthesizerAgent` and then translated into API schemas in `RagBotService`.
Knowledge Base
- PDFs are chunked and embedded into FAISS (`src/pdf_processor.py`).
- Three retrievers (disease explainer, biomarker linker, clinical guidelines) share the same FAISS index with varying `k` values.
Deep Review Findings
Critical Issues
[OPEN] State propagation is incomplete across the workflow.
- `src/agents/biomarker_analyzer.py` returns only `agent_outputs`, and does not write the computed `biomarker_flags` or `safety_alerts` into the top-level `GuildState` keys that the workflow expects to accumulate.
- `src/workflow.py` initializes `biomarker_flags` and `safety_alerts` in the state, but none of the agents return updates to those keys. As a result, `workflow_result.get("biomarker_flags")` and `workflow_result.get("safety_alerts")` are likely empty when the API response is formatted in `api/app/services/ragbot.py`.
- Effect: API output will frequently miss biomarker flags and safety alerts, and downstream consumers will incorrectly assume a clean result set.
- Recommendation: return `biomarker_flags` and `safety_alerts` from the Biomarker Analyzer agent so they accumulate in the state, and ensure the Response Synthesizer reads those same keys.
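The recommended fix can be sketched as follows. The node shape matches how LangGraph nodes return partial state updates; the flag computation and input layout here are illustrative stand-ins for the real analyzer logic, not the actual code:

```python
from typing import Any, Dict

def biomarker_analyzer_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Analyzer node that returns flags and alerts as top-level state keys."""
    biomarkers = state.get("patient_input", {}).get("biomarkers", {})
    flags = [name for name, info in biomarkers.items() if info.get("abnormal")]
    alerts = [f"Abnormal {name}" for name in flags]
    return {
        "agent_outputs": [{"agent": "biomarker_analyzer", "flags": flags}],
        # Previously missing: without these keys, the workflow-level
        # biomarker_flags / safety_alerts stay empty in the API response.
        "biomarker_flags": flags,
        "safety_alerts": alerts,
    }
```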
[OPEN] LangGraph merge behavior is unsafe for parallel outputs.
- `GuildState` uses `Annotated[List[AgentOutput], operator.add]` for additive merging, but the nodes return only `{"agent_outputs": [output]}` and nothing else. This is fine for `agent_outputs` itself, but parallel agents also read from the full `agent_outputs` list inside the state to infer prior results.
- In parallel branches, a given agent might read a partial `agent_outputs` list depending on execution order. This is visible in `BiomarkerDiseaseLinkerAgent` and `ClinicalGuidelinesAgent`, which locate the prior Biomarker Analyzer output by scanning `agent_outputs`.
- Effect: nondeterministic behavior if LangGraph schedules a branch before the Biomarker Analyzer output is merged, or if merges occur after the branch starts. This can degrade evidence selection and recommendations.
- Recommendation: explicitly pass relevant artifacts as dedicated state fields updated by the Biomarker Analyzer, and read those fields directly instead of scanning `agent_outputs`.
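The dedicated-field approach can be sketched like this. The `biomarker_analysis` field name and the downstream node body are illustrative assumptions; only `agent_outputs` and its additive annotation come from the review:

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict

class GuildState(TypedDict, total=False):
    # Merged additively across parallel branches (as in the real state).
    agent_outputs: Annotated[List[Dict[str, Any]], operator.add]
    # Hypothetical dedicated field, written once by the Biomarker Analyzer.
    biomarker_analysis: Dict[str, Any]

def clinical_guidelines_node(state: GuildState) -> Dict[str, Any]:
    # Read the dedicated field instead of scanning agent_outputs, which may
    # be only partially merged when a parallel branch starts.
    analysis = state.get("biomarker_analysis", {})
    guidance = [f"Review guideline for {name}" for name in analysis.get("flags", [])]
    return {"agent_outputs": [{"agent": "clinical_guidelines", "guidance": guidance}]}
```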
[RESOLVED] Schema mismatch between workflow output and API formatter.
- `ResponseSynthesizerAgent` returns a structured response with keys such as `patient_summary`, `prediction_explanation`, `clinical_recommendations`, `confidence_assessment`, and `safety_alerts`.
- `RagBotService._format_response()` now correctly reads from `final_response` and handles both Pydantic objects and dicts.
- The CLI (`scripts/chat.py`) uses `_coerce_to_dict()` and `format_conversational()` to safely handle all output types.
- Fix applied: `_format_response()` updated and a `_coerce_to_dict()` helper added.
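A coercion helper in the spirit of `_coerce_to_dict()` might look like the sketch below; the exact branches in `scripts/chat.py` may differ:

```python
from typing import Any, Dict

def coerce_to_dict(obj: Any) -> Dict[str, Any]:
    """Normalize workflow output (dict, Pydantic model, plain object) to a dict."""
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "model_dump"):   # Pydantic v2 models
        return obj.model_dump()
    if hasattr(obj, "dict"):         # Pydantic v1 models
        return obj.dict()
    if hasattr(obj, "__dict__"):     # plain attribute objects
        return dict(vars(obj))
    raise TypeError(f"Cannot coerce {type(obj).__name__} to dict")
```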
High Priority Issues
[OPEN] Prediction confidence is forced to 0.5 and default disease is always Diabetes.
- Both the API and CLI `predict_disease_simple` functions enforce a minimum confidence of 0.5 and default to Diabetes when confidence is low.
- Effect: biased predictions and false confidence. This is risky in a medical domain and undermines reliability assessments.
- Recommendation: return a low-confidence prediction explicitly and mark reliability as low; avoid forcing a disease when evidence is insufficient.
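A minimal sketch of the recommended behavior, with no 0.5 floor and no forced default disease. The threshold value and reliability labels are illustrative assumptions, not from the codebase:

```python
from typing import Dict, Optional

LOW_CONFIDENCE_THRESHOLD = 0.35  # illustrative cutoff

def predict_with_reliability(scores: Dict[str, float]) -> Dict[str, Optional[object]]:
    """Return the top hypothesis as-is, flagging low-evidence cases explicitly."""
    if not scores:
        return {"disease": None, "confidence": 0.0, "reliability": "insufficient_evidence"}
    disease, confidence = max(scores.items(), key=lambda kv: kv[1])
    # Surface the weak result instead of clamping to 0.5 / defaulting to Diabetes.
    reliability = "low" if confidence < LOW_CONFIDENCE_THRESHOLD else "normal"
    return {"disease": disease, "confidence": confidence, "reliability": reliability}
```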
[RESOLVED] Different biomarker naming schemes across extraction modules.
- Both CLI and API now use the shared `src/biomarker_normalization.py` module, with 80+ aliases mapped to 24 canonical names.
- Fix applied: unified normalization in both `scripts/chat.py` and `api/app/services/extraction.py`.
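The shared-utility pattern can be sketched as an alias table plus a lookup. The real module maps 80+ aliases to 24 canonical names; the entries below are illustrative examples only:

```python
# Illustrative subset of an alias table; not the actual contents of
# src/biomarker_normalization.py.
ALIASES = {
    "a1c": "HbA1c",
    "hba1c": "HbA1c",
    "glycated hemoglobin": "HbA1c",
    "fbs": "Fasting Glucose",
    "fasting blood sugar": "Fasting Glucose",
}

def normalize_biomarker(name: str) -> str:
    """Map a free-text biomarker name to its canonical form when known."""
    key = name.strip().lower()
    return ALIASES.get(key, name.strip())
```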
[RESOLVED] Use of console glyphs and non-ASCII prefixes in logs and output.
- Debug prints removed from the CLI; logging suppressed for noisy HuggingFace/transformers output.
- API responses use clean JSON only; the CLI uses UTF-8 emoji only in user-facing output.
- Fix applied: `[DEBUG]` prints removed, the `BertModel LOAD REPORT` suppressed, and HuggingFace deprecation warnings filtered.
Medium Priority Issues
[RESOLVED] Inconsistent model selection between agents.
- All agents now use the centralized `llm_config` configuration (planner, analyzer, explainer, and synthesizer properties).
- Fix applied: `src/llm_config.py` provides an `LLMConfig` singleton with per-role properties.
[RESOLVED] Potential JSON parsing fragility in extraction.
- `_parse_llm_json()` now handles markdown fences, trailing text, and partial JSON recovery.
- Fix applied: robust JSON parser in `api/app/services/extraction.py` with test coverage (`test_json_parsing.py`).
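A fence-tolerant parser in the spirit of `_parse_llm_json()` can be sketched as follows; the production version may recover partial JSON differently:

```python
import json
import re

def parse_llm_json(text: str) -> dict:
    """Extract a JSON object from LLM output that may include fences or prose."""
    # Strip markdown code fences such as ```json ... ```
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Fall back to the outermost {...} span to drop leading/trailing prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start : end + 1])
```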
[RESOLVED] Knowledge base retrieval does not enforce citations.
- The Disease Explainer agent now checks `sop.require_pdf_citations` and returns "insufficient evidence" when no documents are retrieved.
- Fix applied: citation guardrail in `src/agents/disease_explainer.py` with a test (`test_citation_guardrails.py`).
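The guardrail pattern can be sketched as below. The function signature and document shape are stand-ins for the real retriever and SOP objects in `src/agents/disease_explainer.py`:

```python
from typing import Dict, List

def explain_with_citations(query: str, docs: List[Dict[str, str]],
                           require_pdf_citations: bool = True) -> Dict[str, object]:
    """Refuse to answer rather than generate an uncited explanation."""
    if require_pdf_citations and not docs:
        return {"answer": "Insufficient evidence in the knowledge base.",
                "citations": []}
    citations = [d.get("source", "unknown") for d in docs]
    return {"answer": f"Explanation for: {query}", "citations": citations}
```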
Low Priority Issues
[OPEN] Error handling does not preserve original exceptions cleanly in API layer.
- Exceptions are wrapped in `RuntimeError` without detail separation; `RagBotService.analyze()` does not attach contextual hints (e.g., which agent failed).
- Recommendation: wrap exceptions with the agent name and an error classification to improve observability.
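One way to sketch the recommended wrapping; `AgentError` is a hypothetical exception type, not from the codebase:

```python
class AgentError(RuntimeError):
    """RuntimeError subclass that carries the failing agent and error class."""
    def __init__(self, agent: str, category: str, original: Exception):
        super().__init__(f"[{agent}/{category}] {original}")
        self.agent = agent
        self.category = category

def run_agent(agent_name: str, fn, state):
    try:
        return fn(state)
    except Exception as exc:
        # `raise ... from exc` preserves the original traceback for observability.
        raise AgentError(agent_name, type(exc).__name__, exc) from exc
```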
[RESOLVED] Hard-coded expected biomarker count (24) in Confidence Assessor.
- Now uses `BiomarkerValidator().expected_biomarker_count()`, which reads from `config/biomarker_references.json`.
- Test: `test_validator_count.py` verifies the count matches the reference config.
Suggested Improvements (Summary)
- Align workflow output and API schema. [RESOLVED]
- Promote biomarker flags and safety alerts to first-class state fields in the workflow. [OPEN]
- Use a shared normalization utility. [RESOLVED]
- Remove the forced minimum confidence and default disease; permit "low confidence" results. [OPEN]
- Introduce citation enforcement as a guardrail for RAG outputs. [RESOLVED]
- Centralize model selection and logging format. [RESOLVED]
Verification Gaps
The following should be tested once fixes are made:
- Natural language extraction with partial and noisy inputs.
- Workflow run where no abnormal biomarkers are detected.
- API response schema validation for both natural and structured routes.
- Parallel agent execution determinism (state access to biomarker analysis).
- CLI behavior for biomarker names that differ from API normalization.