yipengsun and Claude Opus 4.5 committed
Commit c0fff99 · 0 Parent(s)

Initial commit: Diagnostic Devil's Advocate project

Multi-agent medical diagnosis system using MedGemma, MedSigLIP, and MedASR
with LangGraph orchestration and Gradio UI.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

.gitignore ADDED
@@ -0,0 +1,42 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg-info/
dist/
build/
*.egg

# Virtual environments
.venv/
venv/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo

# Jupyter
.ipynb_checkpoints/

# Environment variables
.env
.env.*

# OS
.DS_Store
Thumbs.db

# Model weights / large files
*.bin
*.pt
*.pth
*.onnx
*.safetensors

# Logs
*.log
logs/
README.md ADDED
@@ -0,0 +1,268 @@
---
title: Diagnostic Devil's Advocate
emoji: 🩺
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: "5.12.0"
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - medgemma
  - medical-imaging
  - multi-agent
  - cognitive-bias
  - radiology
---

<div align="center">

# 🩺 Diagnostic Devil's Advocate

### AI-Powered Cognitive Debiasing for Clinical Diagnosis

**A multi-agent system that challenges medical diagnoses to catch what doctors might miss.**

[![MedGemma](https://img.shields.io/badge/MedGemma-4B%20%7C%2027B-4285F4?style=for-the-badge&logo=google&logoColor=white)](https://huggingface.co/google/medgemma-1.5-4b-it)
[![MedSigLIP](https://img.shields.io/badge/MedSigLIP-448-34A853?style=for-the-badge&logo=google&logoColor=white)](https://huggingface.co/google/medsiglip-448)
[![LangGraph](https://img.shields.io/badge/LangGraph-Agent%20Pipeline-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white)](https://langchain-ai.github.io/langgraph/)
[![Gradio](https://img.shields.io/badge/Gradio-UI-F97316?style=for-the-badge&logo=gradio&logoColor=white)](https://gradio.app)

[Live Demo](#getting-started) &bull; [Architecture](#architecture) &bull; [Demo Cases](#demo-cases) &bull; [Technical Details](#technical-details)

---

</div>

## The Problem

> *Diagnostic errors affect an estimated **12 million** adults annually in the U.S. alone, with cognitive biases — [anchoring](https://en.wikipedia.org/wiki/Anchoring_(cognitive_bias)), [premature closure](https://en.wikipedia.org/wiki/Premature_closure), [confirmation bias](https://en.wikipedia.org/wiki/Confirmation_bias) — implicated in up to **74%** of cases.* ([Singh et al., BMJ Quality & Safety, 2014](https://qualitysafety.bmj.com/content/23/9/727))

Doctors are not wrong because they lack knowledge. They are wrong because the human brain takes shortcuts — and in medicine, shortcuts kill. A physician who sees "young patient + chest pain after trauma" anchors on **rib contusion** and stops looking. The pneumothorax on the X-ray goes unseen. The patient deteriorates.

**Diagnostic Devil's Advocate** is a system that acts as an adversarial second opinion. It does not replace the physician — it challenges them. It asks: *"Have you considered what happens if you're wrong?"*

## How It Works

The system runs a **4-agent pipeline** orchestrated by [LangGraph](https://langchain-ai.github.io/langgraph/) where each agent has a distinct adversarial role. Every agent analyzes **both the medical image and the full clinical context** (history, vitals, labs, exam findings) — because some dangerous conditions (aortic dissection, pulmonary embolism) may show subtle or no imaging signs but have obvious clinical red flags. Critically, the first agent does this **without seeing the doctor's diagnosis**, preventing the AI itself from being [anchored](https://en.wikipedia.org/wiki/Anchoring_(cognitive_bias)).

### The Four Agents

| Agent | Role | Model | Key Design Choice |
|:------|:-----|:------|:------------------|
| **Diagnostician** | Independent image + clinical analysis | [MedGemma 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) (multimodal) | **Blinded** — never sees the doctor's diagnosis. Tags each finding as `imaging`, `clinical`, or `both` to distinguish evidence sources. |
| **Bias Detector** | Compare doctor vs. AI findings | [MedGemma](https://huggingface.co/google/medgemma-1.5-4b-it) 4B/27B + [MedSigLIP](https://huggingface.co/google/medsiglip-448) | Uses **zero-shot image classification** to verify radiological signs. Flags clinical red flags ignored by either assessment. |
| **Devil's Advocate** | Adversarial challenge | [MedGemma](https://huggingface.co/google/medgemma-27b-text-it) 4B/27B | Deliberately contrarian — uses both imaging and clinical evidence to argue for **[must-not-miss diagnoses](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775443/)** |
| **Consultant** | Synthesize final report | [MedGemma](https://huggingface.co/google/medgemma-27b-text-it) 4B/27B | Writes as a **collegial consultant**: *"Have you considered..."* not *"You are wrong."* |
58
+
59
+ ## Architecture
60
+
61
+ The pipeline is orchestrated by [LangGraph](https://langchain-ai.github.io/langgraph/) as a linear `StateGraph`:
62
+
63
+ **Gradio UI** (image upload, diagnosis input, clinical context, [MedASR](https://huggingface.co/google/medasr) voice input)
64
+ → **Diagnostician** — receives image + clinical context but **NOT** the doctor's diagnosis; tags findings by source (`imaging` / `clinical` / `both`)
65
+ → **Bias Detector** — now receives the doctor's diagnosis, compares it against independent findings using image, clinical data, and [MedSigLIP](https://huggingface.co/google/medsiglip-448) sign verification
66
+ → **Devil's Advocate** — challenges the working diagnosis using both imaging and clinical evidence for must-not-miss alternatives
67
+ → **Consultant** — synthesizes a collegial consultation note
68
+ → **Output** (consultation report, alternative diagnoses, recommended workup)
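
The linear flow above can be mimicked in a framework-free sketch. The agent functions here are hypothetical stand-ins for the real modules in `agents/` (which are wired into a LangGraph `StateGraph` in `agents/graph.py`); only the node order and the pass-the-state-forward contract come from this section.

```python
# Framework-free sketch of the four-node linear pipeline. The agent bodies
# are placeholders, not the project's actual implementations.
from typing import Callable

PipelineState = dict  # the project uses a TypedDict (agents/state.py)

def diagnostician(state: PipelineState) -> PipelineState:
    # Blinded: reads image + clinical context only, never the doctor's diagnosis.
    state["diagnostician_output"] = {"findings": "independent findings"}
    return state

def bias_detector(state: PipelineState) -> PipelineState:
    state["bias_detector_output"] = {"identified_biases": []}
    return state

def devils_advocate(state: PipelineState) -> PipelineState:
    state["devils_advocate_output"] = {"must_not_miss": []}
    return state

def consultant(state: PipelineState) -> PipelineState:
    state["consultant_output"] = {"consultation_note": "note"}
    return state

PIPELINE: list[Callable[[PipelineState], PipelineState]] = [
    diagnostician, bias_detector, devils_advocate, consultant,
]

def run_pipeline(state: PipelineState) -> PipelineState:
    # Linear graph: each node's output state feeds the next node.
    for node in PIPELINE:
        state = node(state)
        if state.get("error"):  # short-circuit on agent failure
            break
    return state
```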

### MedSigLIP Sign Verification

The Bias Detector doesn't just rely on text reasoning — it uses [**MedSigLIP-448**](https://huggingface.co/google/medsiglip-448) for objective visual verification. For each radiological sign mentioned by the Diagnostician (e.g., "pleural effusion", "cardiomegaly", "pneumothorax"), MedSigLIP performs [zero-shot binary classification](https://huggingface.co/tasks/zero-shot-image-classification): it compares the logits of `"chest radiograph showing [sign]"` vs `"normal chest radiograph with no [sign]"`. A logit difference > 2 is classified as "likely present", grounding the bias analysis in **visual evidence** rather than pure language reasoning.
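
The decision rule can be isolated from model inference as a small sketch. `build_prompts` and `classify_sign` are hypothetical helper names; the prompt template and the logit margin of 2 come from this section, and the "inconclusive" label mirrors the confidence value used in `agents/bias_detector.py`.

```python
# Zero-shot binary verification rule for one radiological sign.
def build_prompts(sign: str) -> tuple[str, str]:
    """Positive/negative text prompts compared by MedSigLIP."""
    return (
        f"chest radiograph showing {sign}",
        f"normal chest radiograph with no {sign}",
    )

def classify_sign(pos_logit: float, neg_logit: float, margin: float = 2.0) -> str:
    """Logit difference > margin -> 'likely present'; symmetric rule below margin."""
    diff = pos_logit - neg_logit
    if diff > margin:
        return "likely present"
    if diff < -margin:
        return "likely absent"
    return "inconclusive"
```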

## Demo Cases

Three composite clinical scenarios covering the most dangerous diagnostic error patterns:

<table>
<tr>
<td width="33%" valign="top">

### Case 1: Missed Pneumothorax
**🏷️ TRAUMA**

32M, motorcycle collision. Doctor diagnoses **rib contusion**, discharges patient. Supine CXR actually shows a **left pneumothorax** with rib fractures.

**Bias**: [Satisfaction of search](https://radiopaedia.org/articles/satisfaction-of-search) — found the rib fractures, stopped looking.

</td>
<td width="33%" valign="top">

### Case 2: Aortic Dissection → "GERD"
**🏷️ VASCULAR**

58M, hypertensive, tearing chest pain. Doctor diagnoses **acid reflux**, prescribes antacids. Blood pressure asymmetry (178/102 R vs 146/88 L) and D-dimer 4,850 suggest **Stanford type B dissection**.

**Bias**: [Anchoring](https://en.wikipedia.org/wiki/Anchoring_(cognitive_bias)) + [availability heuristic](https://en.wikipedia.org/wiki/Availability_heuristic) — common diagnosis assumed first.

</td>
<td width="33%" valign="top">

### Case 3: Postpartum PE → "Anxiety"
**🏷️ POSTPARTUM**

29F, day 5 post C-section, dyspnea and tachycardia. Doctor orders **psychiatric consult**. SpO2 91%, ABG shows respiratory alkalosis — classic **pulmonary embolism**.

**Bias**: [Premature closure](https://en.wikipedia.org/wiki/Premature_closure) + [framing effect](https://en.wikipedia.org/wiki/Framing_effect_(psychology)) — young woman = anxiety.

</td>
</tr>
</table>

> All cases are educational composites synthesized from published literature. See [`data/demo_cases/SOURCES.md`](data/demo_cases/SOURCES.md) for full citations.

## Technical Details

### Model Stack

| Model | Parameters | Role | Loading |
|:------|:----------|:-----|:--------|
| [MedGemma 1.5 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) | 4B | Multimodal image+text analysis | 4-bit quantized (~4GB VRAM) or BF16 (~8GB) |
| [MedGemma 27B Text-IT](https://huggingface.co/google/medgemma-27b-text-it) | 27B | Advanced clinical reasoning | BF16 (~54GB VRAM), A100 only |
| [MedSigLIP-448](https://huggingface.co/google/medsiglip-448) | 0.9B | Zero-shot sign verification | FP32 (~3GB VRAM) |
| [MedASR](https://huggingface.co/google/medasr) | 105M | Medical speech-to-text | FP32 (~0.5GB VRAM) |

### Hardware Profiles

| Environment | GPU | Configuration | VRAM Usage |
|:------------|:----|:-------------|:-----------|
| **Local dev** | RTX 4070 12GB | 4B 4-bit + MedSigLIP + MedASR | ~7.5 GB |
| **School HPC** | A100 80GB | 4B BF16 + **27B BF16** + MedSigLIP + MedASR | ~66 GB |
| **HF Space** | T4 16GB | 4B 4-bit + MedSigLIP + MedASR | ~7.5 GB |
| **Kaggle** | T4 16GB | 4B 4-bit + MedSigLIP | ~7 GB |

All models load locally via [Transformers](https://huggingface.co/docs/transformers) with optional [4-bit quantization](https://huggingface.co/docs/bitsandbytes) — **zero API costs, fully offline-capable**.
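
For illustration, the quantized load path looks roughly like the standard Transformers + bitsandbytes pattern below. This is a configuration sketch only (it needs a GPU and access to the gated weights), and the exact auto-class and arguments used in `models/medgemma_client.py` may differ.

```python
# Sketch of the optional 4-bit load (assumed standard Transformers API;
# the auto-class name can vary with the installed transformers version).
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

MODEL_ID = "google/medgemma-1.5-4b-it"  # id taken from the table above

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,  # drop this for the BF16 path
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
```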

### Key Technical Decisions

- **Blinded Diagnostician**: The first agent never sees the doctor's diagnosis. This prevents the AI from anchoring on the same conclusion, enabling genuine independent analysis.

- **Dual-source analysis (imaging + clinical)**: All agents analyze both the medical image and the full clinical context (vitals, labs, risk factors). Each Diagnostician finding is tagged with its source (`imaging`, `clinical`, or `both`). This is critical because many must-not-miss diagnoses — aortic dissection (BP asymmetry), pulmonary embolism (low SpO2, elevated D-dimer) — may have subtle or absent imaging signs but glaring clinical red flags.

- **Structured JSON output**: All agents output structured JSON parsed by [`json_repair`](https://github.com/mangiucugna/json_repair), which handles LLM output quirks (missing commas, truncation, markdown wrapping).

- **Thinking token stripping**: MedGemma wraps internal reasoning in `<unused94>...<unused95>` tags ([model card](https://huggingface.co/google/medgemma-27b-text-it#thinking-mode)). These are stripped via regex before display.

- **Adaptive model routing**: `generate_text()` automatically routes to 27B when `USE_27B=true`, else falls back to 4B. `generate_with_image()` always uses 4B (only model with vision).

- **Collegial tone**: The Consultant is prompted to write as a consulting colleague, not a critic. Research shows physicians respond better to [collaborative challenge than confrontation](https://pubmed.ncbi.nlm.nih.gov/28493811/).

- **Prompt Repetition**: All agents use the prompt repetition technique from [*"Prompt Repetition Improves Non-Reasoning LLMs"*](https://arxiv.org/abs/2512.14982) (Google Research, 2025). The user prompt is repeated with a transition phrase (`<query> Let me repeat the request: <query>`), which won **47 out of 70** benchmark-model combinations with **zero losses** — at nearly zero cost (only increases prefill tokens, no extra generation). Controllable via `ENABLE_PROMPT_REPETITION` env var.
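
The repetition trick from the last bullet is small enough to sketch in full; `maybe_repeat` is an illustrative name, not necessarily the project's, but the transition phrase and the env-var gate come from this section.

```python
# Prompt repetition: repeat the user prompt after a transition phrase,
# gated by the ENABLE_PROMPT_REPETITION env var (default: enabled).
import os

TRANSITION = " Let me repeat the request: "

def maybe_repeat(prompt: str) -> str:
    """Return the (possibly repeated) prompt sent to the model."""
    enabled = os.getenv("ENABLE_PROMPT_REPETITION", "true").lower() == "true"
    if not enabled:
        return prompt
    # Only prefill grows; the generation budget is unchanged.
    return f"{prompt}{TRANSITION}{prompt}"
```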

## Getting Started

### Prerequisites

- Python 3.11+
- CUDA-capable GPU (12GB+ VRAM)
- [Hugging Face account](https://huggingface.co) with access to gated models (MedGemma, MedSigLIP, MedASR)

### Installation

```bash
# Clone the repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/diagnostic-devils-advocate
cd diagnostic-devils-advocate

# Install dependencies
pip install -r requirements.txt

# Login to Hugging Face (required for gated models)
huggingface-cli login
```

### Running

```bash
# Standard launch (4B quantized, 12GB GPU)
python app.py

# With 27B reasoning model (A100 80GB required)
USE_27B=true QUANTIZE_4B=false python app.py

# Disable voice input
ENABLE_MEDASR=false python app.py
```

The app launches at `http://localhost:7860`.

### Environment Variables

| Variable | Default | Description |
|:---------|:--------|:------------|
| `USE_27B` | `false` | Enable 27B model for text-only agents |
| `QUANTIZE_4B` | `true` | 4-bit quantize the 4B model |
| `ENABLE_MEDASR` | `true` | Enable voice input via MedASR |
| `HF_TOKEN` | — | Hugging Face token (or use `huggingface-cli login`) |
| `ENABLE_PROMPT_REPETITION` | `true` | [Prompt repetition](https://arxiv.org/abs/2512.14982) for improved output quality |
| `MODEL_LOCAL_DIR` | — | Local directory for pre-downloaded models |
| `DEVICE` | `cuda` | Compute device |
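
A minimal sketch of how `config.py` might parse these flags. The boolean parsing rule is an assumption; only the variable names and defaults come from the table above.

```python
# Hypothetical env parsing along the lines of config.py.
import os

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment ('true'/'1'/'yes' => True)."""
    return os.getenv(name, str(default)).strip().lower() in {"1", "true", "yes"}

USE_27B = env_bool("USE_27B", False)          # route text agents to 27B
QUANTIZE_4B = env_bool("QUANTIZE_4B", True)   # 4-bit load for the 4B model
ENABLE_MEDASR = env_bool("ENABLE_MEDASR", True)
DEVICE = os.getenv("DEVICE", "cuda")
```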

## Project Structure

```
diagnostic-devils-advocate/
├── app.py                    # Gradio entry point
├── config.py                 # Model selection & environment config
├── requirements.txt

├── agents/
│   ├── state.py              # LangGraph TypedDict state definitions
│   ├── prompts.py            # All agent prompt templates
│   ├── graph.py              # LangGraph StateGraph pipeline
│   ├── output_parser.py      # JSON parsing with json_repair
│   ├── diagnostician.py      # Agent 1: Blinded image + clinical analysis
│   ├── bias_detector.py      # Agent 2: Bias detection + MedSigLIP
│   ├── devil_advocate.py     # Agent 3: Adversarial challenge
│   └── consultant.py         # Agent 4: Consultation note synthesis

├── models/
│   ├── medgemma_client.py    # MedGemma 4B/27B inference client
│   ├── medsiglip_client.py   # MedSigLIP zero-shot classification
│   ├── medasr_client.py      # MedASR speech-to-text
│   └── utils.py              # Image preprocessing, token stripping

├── ui/
│   ├── components.py         # Gradio layout & progress visualization
│   ├── callbacks.py          # UI event handlers & pipeline integration
│   └── css.py                # Custom styling (responsive design)

├── data/
│   └── demo_cases/           # 3 composite clinical scenarios
│       └── SOURCES.md        # Full literature citations

└── tests/
    ├── test_smoke.py         # Import & build verification
    ├── test_output_parser.py # JSON repair tests
    └── test_pipeline_mock.py # Integration tests with mocked models
```

## Testing

```bash
python -m pytest tests/ -v
```

## Disclaimer

> **This is a research prototype built for the MedGemma Impact Challenge. It is NOT intended for clinical decision-making.** All demo cases are educational composites. Medical images are sourced from the University of Saskatchewan Teaching Collection (CC-BY-NC-SA 4.0).

## References

- Singh H, et al. "The frequency of diagnostic errors in outpatient care." [*BMJ Quality & Safety*, 2014](https://qualitysafety.bmj.com/content/23/9/727)
- Graber ML, et al. "Cognitive interventions to reduce diagnostic error." [*BMJ Quality & Safety*, 2012](https://qualitysafety.bmj.com/content/21/7/535)
- Croskerry P. "The importance of cognitive errors in diagnosis." [*Academic Medicine*, 2003](https://pubmed.ncbi.nlm.nih.gov/12915371/)
- Ball CG, et al. "Incidence, risk factors, and outcomes for occult pneumothoraces." [*J Trauma*, 2005](https://pubmed.ncbi.nlm.nih.gov/16374282/)
- Hansen MS, et al. "Frequency of misdiagnosis of acute aortic dissection." [*Am J Cardiol*, 2007](https://pubmed.ncbi.nlm.nih.gov/17350380/)
- Ivgi M, et al. "Prompt Repetition Improves Non-Reasoning LLMs." [*arXiv:2512.14982*](https://arxiv.org/abs/2512.14982), Google Research, 2025
- Google Health AI. [Health AI Developer Foundations (HAI-DEF)](https://developers.google.com/health-ai)
- Yang J, et al. [MedGemma: Medical AI model](https://huggingface.co/collections/google/health-ai-developer-foundations-68544906f8a0a10f7d30ade8) — Hugging Face Collection

---

<div align="center">

Built with [Google Health AI Developer Foundations](https://developers.google.com/health-ai) for the [MedGemma Impact Challenge](https://www.kaggle.com/competitions/medgemma-impact-challenge)

</div>
agents/__init__.py ADDED
File without changes
agents/bias_detector.py ADDED
@@ -0,0 +1,135 @@
"""
Bias Detector agent: compares doctor's diagnosis with independent analysis to identify cognitive biases.
Runs MedSigLIP sign verification on imaging findings mentioned by the Diagnostician.
Outputs structured JSON.
"""

import re
import logging

from agents.state import PipelineState
from agents.prompts import BIAS_DETECTOR_SYSTEM, BIAS_DETECTOR_USER
from agents.output_parser import parse_json_response
from models import medgemma_client, medsiglip_client

logger = logging.getLogger(__name__)

# Common imaging signs that SigLIP can meaningfully evaluate on chest X-ray.
# These are visual patterns, not abstract diagnoses.
_KNOWN_SIGNS = [
    "pleural effusion", "consolidation", "infiltrates", "pneumothorax",
    "widened mediastinum", "cardiomegaly", "pulmonary edema", "atelectasis",
    "rib fracture", "subcutaneous emphysema", "hilar enlargement",
    "hyperinflation", "pleural thickening", "lung opacity", "air bronchogram",
    "mediastinal shift", "tracheal deviation", "cephalization",
]


def _extract_signs(findings: object) -> list[str]:
    """Extract imaging signs mentioned in the Diagnostician's findings.

    Matches against known radiological signs rather than parsing diagnoses.
    """
    if isinstance(findings, list):
        chunks: list[str] = []
        for item in findings:
            if isinstance(item, dict):
                chunks.append(str(item.get("finding", "")))
                chunks.append(str(item.get("description", "")))
            else:
                chunks.append(str(item))
        findings_text = "\n".join(chunks)
    else:
        findings_text = str(findings)

    findings_lower = findings_text.lower()
    found = []
    for sign in _KNOWN_SIGNS:
        if sign in findings_lower:
            found.append(sign)

    # Also extract any explicit "abnormal" findings with simple patterns,
    # e.g., "visible pleural line", "blunted costophrenic angle".
    extra_patterns = [
        r'(?:visible|subtle|small|large|bilateral|unilateral|left|right)\s+([\w\s]{5,30}?)(?:\.|,|;|\n)',
    ]
    for pat in extra_patterns:
        for m in re.findall(pat, findings_lower):
            cleaned = m.strip()
            if cleaned not in found and len(cleaned) > 5:
                found.append(cleaned)

    # Deduplicate, limit to 8.
    seen = set()
    unique = []
    for s in found:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique[:8]


def run(state: PipelineState) -> PipelineState:
    """Run the Bias Detector agent."""
    state["current_step"] = "bias_detector"
    clinical = state["clinical_input"]
    diag_out = state.get("diagnostician_output")

    if diag_out is None:
        state["error"] = "Diagnostician output missing."
        return state

    try:
        # 1. MedSigLIP: verify imaging signs mentioned in findings.
        sign_verification = []
        image = clinical.get("image")
        if image is not None:
            signs = _extract_signs(diag_out.get("findings_list") or diag_out.get("findings", ""))
            logger.info("Extracted signs for SigLIP verification: %s", signs)
            if signs:
                sign_verification = medsiglip_client.verify_findings(
                    image,
                    signs,
                    modality=clinical.get("modality"),
                )

        # 2. MedGemma: cognitive bias analysis (with image if available).
        diagnostician_analysis = diag_out.get("analysis") or diag_out.get("findings", "")
        prompt = BIAS_DETECTOR_USER.format(
            doctor_diagnosis=clinical["doctor_diagnosis"],
            clinical_context=clinical["clinical_context"],
            diagnostician_findings=diagnostician_analysis,
            consistency_check=_format_sign_verification(sign_verification),
        )
        if image is not None:
            raw = medgemma_client.generate_with_image(prompt, image, system_prompt=BIAS_DETECTOR_SYSTEM)
        else:
            raw = medgemma_client.generate_text(prompt, system_prompt=BIAS_DETECTOR_SYSTEM)
        parsed = parse_json_response(raw)
        state["bias_detector_output"] = {
            "identified_biases": parsed.get("identified_biases", []),
            "discrepancy_summary": parsed.get("discrepancy_summary", ""),
            "missed_findings": parsed.get("missed_findings", []),
            "consistency_check": sign_verification,
        }

    except Exception as e:
        logger.exception("Bias Detector agent failed")
        state["error"] = f"Bias Detector error: {e}"

    return state


def _format_sign_verification(results: list[dict]) -> str:
    """Format sign verification results as text for the MedGemma prompt."""
    if not results:
        return "No image verification available."

    # Only include non-inconclusive results.
    meaningful = [r for r in results if r.get("confidence") != "inconclusive"]
    if not meaningful:
        return "Image verification inconclusive for all findings."

    lines = ["Image sign verification (MedSigLIP):"]
    for r in meaningful:
        lines.append(f"- {r['sign']}: {r['confidence']}")
    return "\n".join(lines)
agents/consultant.py ADDED
@@ -0,0 +1,94 @@
"""
Consultant agent: synthesizes all upstream outputs into a collegial debiasing report.
Outputs structured JSON.
"""

import json
import logging

from agents.state import PipelineState
from agents.prompts import CONSULTANT_SYSTEM, CONSULTANT_USER
from agents.output_parser import parse_json_response
from models import medgemma_client

logger = logging.getLogger(__name__)


def _format_bias_report(bias_out: dict) -> str:
    """Format bias detector output as text for the Consultant prompt."""
    parts = []
    if bias_out.get("discrepancy_summary"):
        parts.append(f"Discrepancy: {bias_out['discrepancy_summary']}")
    for b in bias_out.get("identified_biases", []):
        parts.append(f"- [{b.get('severity', '?').upper()}] {b.get('type', '')}: {b.get('evidence', '')}")
    if bias_out.get("missed_findings"):
        parts.append(f"Missed: {', '.join(bias_out['missed_findings'])}")
    return "\n".join(parts) if parts else "No bias data."


def _format_da_report(da_out: dict) -> str:
    """Format devil's advocate output as text for the Consultant prompt."""
    parts = []
    for c in da_out.get("challenges", []):
        parts.append(f"Challenge: {c.get('claim', '')} → {c.get('counter_evidence', '')}")
    for m in da_out.get("must_not_miss", []):
        parts.append(f"MUST-NOT-MISS: {m.get('diagnosis', '')} — {m.get('why_dangerous', '')}")
    if da_out.get("recommended_workup"):
        items = [str(w) if not isinstance(w, dict) else w.get("test", str(w)) for w in da_out["recommended_workup"]]
        parts.append("Workup: " + ", ".join(items))
    return "\n".join(parts) if parts else "No challenges raised."


def run(state: PipelineState) -> PipelineState:
    """Run the Consultant agent."""
    state["current_step"] = "consultant"
    clinical = state["clinical_input"]
    diag_out = state.get("diagnostician_output")
    bias_out = state.get("bias_detector_output")
    da_out = state.get("devils_advocate_output")

    if diag_out is None or bias_out is None or da_out is None:
        state["error"] = "Missing upstream agent outputs."
        return state

    try:
        diagnostician_analysis = diag_out.get("analysis") or diag_out.get("findings", "")
        prompt = CONSULTANT_USER.format(
            doctor_diagnosis=clinical["doctor_diagnosis"],
            clinical_context=clinical["clinical_context"],
            diagnostician_findings=diagnostician_analysis,
            bias_report=_format_bias_report(bias_out),
            devil_advocate_report=_format_da_report(da_out),
            similar_cases="Not available.",
        )
        raw = medgemma_client.generate_text(prompt, system_prompt=CONSULTANT_SYSTEM)
        parsed = parse_json_response(raw)

        alternative_diagnoses = parsed.get("alternative_diagnoses", [])
        if isinstance(alternative_diagnoses, str):
            try:
                alternative_diagnoses = json.loads(alternative_diagnoses)
            except json.JSONDecodeError:
                alternative_diagnoses = []
        if not isinstance(alternative_diagnoses, list):
            alternative_diagnoses = []

        immediate_actions = parsed.get("immediate_actions", [])
        if isinstance(immediate_actions, str):
            immediate_actions = [immediate_actions]
        if not isinstance(immediate_actions, list):
            immediate_actions = []
        immediate_actions = [str(x).strip() for x in immediate_actions if str(x).strip()]

        state["consultant_output"] = {
            "consultation_note": parsed.get("consultation_note", ""),
            "alternative_diagnoses": alternative_diagnoses,
            "immediate_actions": immediate_actions,
            "confidence_note": parsed.get("confidence_note", ""),
        }

    except Exception as e:
        logger.exception("Consultant agent failed")
        state["error"] = f"Consultant error: {e}"

    return state
agents/devil_advocate.py ADDED
@@ -0,0 +1,313 @@
"""
Devil's Advocate agent: adversarial challenge to the working diagnosis.
Deliberately contrarian — focuses on must-not-miss diagnoses.
Uses MedGemma 4B (multimodal) to independently examine the image.
Outputs structured JSON.
"""

import json
import logging
from collections.abc import Mapping

from agents.state import PipelineState
from agents.prompts import DEVIL_ADVOCATE_SYSTEM, DEVIL_ADVOCATE_USER
from agents.output_parser import parse_json_response
from models import medgemma_client

logger = logging.getLogger(__name__)


_DA_SCHEMA_KEYS = ("challenges", "must_not_miss", "recommended_workup")
_DA_WRAPPER_KEYS = (
    "devils_advocate_output",
    "devil_advocate_output",
    "devil_advocate",
    "output",
    "response",
    "result",
    "data",
)
_DA_SYNONYMS: dict[str, str] = {
    # must-not-miss
    "must_not_miss_diagnoses": "must_not_miss",
    "must_not_miss_differentials": "must_not_miss",
    "dangerous_alternatives": "must_not_miss",
    "critical_differentials": "must_not_miss",
    # workup
    "workup": "recommended_workup",
    "recommended_tests": "recommended_workup",
    "recommended_actions": "recommended_workup",
    "next_steps": "recommended_workup",
    # challenges
    "challenge": "challenges",
    "concerns": "challenges",
    "counterarguments": "challenges",
}


def _format_bias_summary(bias_out: dict) -> str:
    """Format bias detector output for the Devil's Advocate prompt."""
    parts = []
    if bias_out.get("discrepancy_summary"):
        parts.append(bias_out["discrepancy_summary"])
    for b in bias_out.get("identified_biases", []):
        parts.append(f"- {b.get('type', 'unknown')}: {b.get('evidence', '')} (severity: {b.get('severity', '?')})")
    if bias_out.get("missed_findings"):
        parts.append("Missed findings: " + ", ".join(bias_out["missed_findings"]))
    return "\n".join(parts) if parts else "No bias analysis available."


def _unwrap_da_payload(parsed: dict) -> dict:
    """Unwrap common container shapes: {"output": {...}}, {"result": {...}}, etc."""
    if any(k in parsed for k in _DA_SCHEMA_KEYS):
        return parsed

    for key in _DA_WRAPPER_KEYS:
        inner = parsed.get(key)
        if isinstance(inner, Mapping) and any(k in inner for k in _DA_SCHEMA_KEYS):
            return dict(inner)

    # If there's a single nested object, unwrap it if it contains DA keys.
    if len(parsed) == 1:
        only_value = next(iter(parsed.values()))
        if isinstance(only_value, Mapping) and any(k in only_value for k in _DA_SCHEMA_KEYS):
            return dict(only_value)

    # One-level scan for any nested object that contains DA keys.
    for value in parsed.values():
        if isinstance(value, Mapping) and any(k in value for k in _DA_SCHEMA_KEYS):
            return dict(value)

    return parsed


def _coerce_da_schema(parsed: dict) -> dict:
    """Best-effort normalization when the model returns an unexpected top-level JSON shape."""
    if not isinstance(parsed, dict):
        return {}

    parsed = _unwrap_da_payload(parsed)
    if not isinstance(parsed, dict):
        return {}

    # Map common synonym keys onto the expected schema.
    coerced = dict(parsed)
    for src, dst in _DA_SYNONYMS.items():
        if src in coerced and dst not in coerced:
            coerced[dst] = coerced[src]

    if any(k in coerced for k in _DA_SCHEMA_KEYS):
        return coerced

    items = coerced.get("items")
    if not isinstance(items, list) or not items:
        return coerced

    # If the model returned just a list of strings, treat it as a workup list.
    if all(isinstance(x, str) for x in items):
        return {"recommended_workup": items}

    dict_items = [x for x in items if isinstance(x, dict)]
    if len(dict_items) != len(items):
        return parsed

    keys: set[str] = set()
    for d in dict_items[:5]:
        keys.update(d.keys())

    if "claim" in keys or "counter_evidence" in keys:
        return {"challenges": dict_items}
    if {"why_dangerous", "supporting_signs", "rule_out_test"} & keys or "diagnosis" in keys:
        return {"must_not_miss": dict_items}

    return coerced


def _normalize_challenges(value: object) -> list[dict[str, str]]:
    if value is None:
        return []

    items = [value] if isinstance(value, Mapping) else value
    if isinstance(items, str):
        s = items.strip()
        return [{"claim": s, "counter_evidence": ""}] if s else []
    if not isinstance(items, list):
        return []

    out: list[dict[str, str]] = []
    for item in items:
        if item is None:
            continue
        if isinstance(item, Mapping):
            d = dict(item)
            claim = str(d.get("claim") or d.get("challenge") or d.get("concern") or "").strip()
            counter = str(
                d.get("counter_evidence")
                or d.get("counterevidence")
                or d.get("counter_argument")
                or d.get("counterargument")
                or d.get("counter")
                or d.get("evidence_against")
                or ""
            ).strip()
            if claim or counter:
                out.append({"claim": claim, "counter_evidence": counter})
            continue

        s = str(item).strip()
        if s:
            out.append({"claim": s, "counter_evidence": ""})

    return out


def _normalize_must_not_miss(value: object) -> list[dict[str, str]]:
    if value is None:
        return []

    items = [value] if isinstance(value, Mapping) else value
    if isinstance(items, str):
        s = items.strip()
        return [{"diagnosis": s}] if s else []
    if not isinstance(items, list):
        return []

    out: list[dict[str, str]] = []
    for item in items:
        if item is None:
            continue
        if isinstance(item, Mapping):
179
+ d = dict(item)
180
+ diagnosis = str(d.get("diagnosis") or d.get("dx") or d.get("differential") or "").strip()
181
+ why = str(d.get("why_dangerous") or d.get("why") or d.get("danger") or "").strip()
182
+ signs = str(d.get("supporting_signs") or d.get("evidence") or d.get("support") or "").strip()
183
+ test = str(d.get("rule_out_test") or d.get("test") or d.get("rule_out") or "").strip()
184
+ if diagnosis or why or signs or test:
185
+ out.append(
186
+ {
187
+ "diagnosis": diagnosis,
188
+ "why_dangerous": why,
189
+ "supporting_signs": signs,
190
+ "rule_out_test": test,
191
+ }
192
+ )
193
+ continue
194
+
195
+ s = str(item).strip()
196
+ if s:
197
+ out.append({"diagnosis": s})
198
+
199
+ return out
200
+
201
+
202
+ def run(state: PipelineState) -> PipelineState:
203
+ """Run the Devil's Advocate agent."""
204
+ state["current_step"] = "devil_advocate"
205
+ clinical = state["clinical_input"]
206
+ diag_out = state.get("diagnostician_output")
207
+ bias_out = state.get("bias_detector_output")
208
+
209
+ image = clinical.get("image")
210
+
211
+ if diag_out is None or bias_out is None:
212
+ state["error"] = "Missing upstream agent outputs."
213
+ return state
214
+
215
+ if image is None:
216
+ state["error"] = "No image provided for Devil's Advocate."
217
+ return state
218
+
219
+ try:
220
+ diagnostician_analysis = diag_out.get("analysis") or diag_out.get("findings", "")
221
+ prompt = DEVIL_ADVOCATE_USER.format(
222
+ doctor_diagnosis=clinical["doctor_diagnosis"],
223
+ clinical_context=clinical["clinical_context"],
224
+ diagnostician_findings=diagnostician_analysis,
225
+ bias_summary=_format_bias_summary(bias_out),
226
+ )
227
+ system_prompt = DEVIL_ADVOCATE_SYSTEM
228
+ raw = medgemma_client.generate_with_image(prompt, image, system_prompt=system_prompt)
229
+ parsed = _coerce_da_schema(parse_json_response(raw))
230
+
231
+ challenges = _normalize_challenges(parsed.get("challenges"))
232
+ must_not_miss = _normalize_must_not_miss(parsed.get("must_not_miss"))
233
+ workup_raw = parsed.get("recommended_workup", [])
234
+ normalized_workup: list[str] = []
235
+ if isinstance(workup_raw, str):
236
+ # Split a single workup string into bullet-like entries.
237
+ workup_raw = [x.strip(" -\t") for x in workup_raw.replace(";", "\n").splitlines()]
238
+ if isinstance(workup_raw, Mapping):
239
+ workup_raw = [dict(workup_raw)]
240
+ if isinstance(workup_raw, list):
241
+ for item in workup_raw:
242
+ if item is None:
243
+ continue
244
+ if isinstance(item, str):
245
+ s = item.strip()
246
+ elif isinstance(item, dict):
247
+ s = str(
248
+ item.get("test")
249
+ or item.get("name")
250
+ or item.get("action")
251
+ or item.get("workup")
252
+ or ""
253
+ ).strip()
254
+ if not s:
255
+ s = json.dumps(item, ensure_ascii=False)
256
+ else:
257
+ s = str(item).strip()
258
+ if s:
259
+ normalized_workup.append(s)
260
+ # Deduplicate while preserving order.
261
+ normalized_workup = list(dict.fromkeys(normalized_workup))
262
+
263
+ # If the model returned an empty schema, retry once with a stricter instruction.
264
+ if not (challenges or must_not_miss or normalized_workup):
265
+ logger.warning("Devil's Advocate produced empty structured output; retrying once.")
266
+ strict_system = (
267
+ DEVIL_ADVOCATE_SYSTEM
268
+ + "\n\nIMPORTANT: Do not return empty arrays. Provide at least 1 item in each list, "
269
+ + "even if you must express uncertainty and suggest rule-out testing."
270
+ )
271
+ raw_retry = medgemma_client.generate_with_image(prompt, image, system_prompt=strict_system)
272
+ parsed_retry = _coerce_da_schema(parse_json_response(raw_retry))
273
+ challenges = _normalize_challenges(parsed_retry.get("challenges"))
274
+ must_not_miss = _normalize_must_not_miss(parsed_retry.get("must_not_miss"))
275
+ workup_retry = parsed_retry.get("recommended_workup", [])
276
+ normalized_workup = []
277
+ if isinstance(workup_retry, str):
278
+ workup_retry = [x.strip(" -\t") for x in workup_retry.replace(";", "\n").splitlines()]
279
+ if isinstance(workup_retry, Mapping):
280
+ workup_retry = [dict(workup_retry)]
281
+ if isinstance(workup_retry, list):
282
+ for item in workup_retry:
283
+ if item is None:
284
+ continue
285
+ if isinstance(item, str):
286
+ s = item.strip()
287
+ elif isinstance(item, dict):
288
+ s = str(
289
+ item.get("test")
290
+ or item.get("name")
291
+ or item.get("action")
292
+ or item.get("workup")
293
+ or ""
294
+ ).strip()
295
+ if not s:
296
+ s = json.dumps(item, ensure_ascii=False)
297
+ else:
298
+ s = str(item).strip()
299
+ if s:
300
+ normalized_workup.append(s)
301
+ normalized_workup = list(dict.fromkeys(normalized_workup))
302
+
303
+ state["devils_advocate_output"] = {
304
+ "challenges": challenges,
305
+ "must_not_miss": must_not_miss,
306
+ "recommended_workup": normalized_workup,
307
+ }
308
+
309
+ except Exception as e:
310
+ logger.exception("Devil's Advocate agent failed")
311
+ state["error"] = f"Devil's Advocate error: {e}"
312
+
313
+ return state
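The synonym mapping in `_coerce_da_schema` only fills in a schema key when it is absent, so a well-formed response is never clobbered. The core move, sketched in isolation with a hypothetical synonym table (the real module defines `_DA_SYNONYMS` and `_DA_SCHEMA_KEYS`, which are not shown in this diff):

```python
# Illustrative only — not the module's actual synonym table.
SYNONYMS = {"workup": "recommended_workup", "red_flags": "must_not_miss"}


def coerce(parsed: dict) -> dict:
    """Map synonym keys onto the expected schema without clobbering existing keys."""
    out = dict(parsed)
    for src, dst in SYNONYMS.items():
        if src in out and dst not in out:
            out[dst] = out[src]
    return out


result = coerce({"workup": ["CT angiogram"], "recommended_workup": ["D-dimer"]})
print(result["recommended_workup"])  # ['D-dimer'] — the existing schema key wins
```

Because the guard checks `dst not in out`, a response that already uses the canonical key passes through untouched.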
agents/diagnostician.py ADDED
@@ -0,0 +1,91 @@
"""
Diagnostician agent: independent image analysis WITHOUT seeing the doctor's diagnosis.
Uses MedGemma 4B (multimodal) for detailed radiological analysis.
Outputs structured JSON.
"""

import logging
from agents.state import PipelineState
from agents.prompts import DIAGNOSTICIAN_SYSTEM, DIAGNOSTICIAN_USER
from agents.output_parser import parse_json_response
from models import medgemma_client

logger = logging.getLogger(__name__)


def run(state: PipelineState) -> PipelineState:
    """Run the Diagnostician agent."""
    state["current_step"] = "diagnostician"
    clinical = state["clinical_input"]
    image = clinical.get("image")

    if image is None:
        state["error"] = "No image provided."
        return state

    try:
        prompt = DIAGNOSTICIAN_USER.format(clinical_context=clinical["clinical_context"])
        raw = medgemma_client.generate_with_image(prompt, image, system_prompt=DIAGNOSTICIAN_SYSTEM)
        parsed = parse_json_response(raw)
        findings = parsed.get("findings", [])
        differentials = parsed.get("differential_diagnoses", [])
        if not isinstance(findings, list):
            findings = [findings] if findings else []
        if not isinstance(differentials, list):
            differentials = [differentials] if differentials else []

        findings_lines: list[str] = []
        for f in findings:
            if isinstance(f, dict):
                name = str(f.get("finding", "")).strip()
                desc = str(f.get("description", "")).strip()
                source = str(f.get("source", "")).strip()
                source_tag = f" [{source}]" if source else ""
                if name and desc:
                    findings_lines.append(f"- {name}{source_tag}: {desc}")
                elif name:
                    findings_lines.append(f"- {name}{source_tag}")
                elif desc:
                    findings_lines.append(f"- {desc}")
            else:
                s = str(f).strip()
                if s:
                    findings_lines.append(f"- {s}")

        differential_lines: list[str] = []
        for d in differentials:
            if isinstance(d, dict):
                name = str(d.get("diagnosis", "")).strip()
                reasoning = str(d.get("reasoning", "")).strip()
                if name and reasoning:
                    differential_lines.append(f"- {name}: {reasoning}")
                elif name:
                    differential_lines.append(f"- {name}")
                elif reasoning:
                    differential_lines.append(f"- {reasoning}")
            else:
                s = str(d).strip()
                if s:
                    differential_lines.append(f"- {s}")

        findings_text = "\n".join(findings_lines)
        differentials_text = "\n".join(differential_lines)
        analysis_parts: list[str] = []
        if findings_text:
            analysis_parts.append("Findings:\n" + findings_text)
        if differentials_text:
            analysis_parts.append("Differential diagnoses:\n" + differentials_text)
        analysis_text = "\n\n".join(analysis_parts).strip()
        state["diagnostician_output"] = {
            "analysis": analysis_text,
            "findings": findings_text,
            "findings_list": findings,
            "differential_diagnoses": differentials,
            "differentials_text": differentials_text,
        }

    except Exception as e:
        logger.exception("Diagnostician agent failed")
        state["error"] = f"Diagnostician error: {e}"

    return state
agents/graph.py ADDED
@@ -0,0 +1,145 @@
"""
LangGraph pipeline: linear flow through 4 diagnostic agents.

START → diagnostician → bias_detector → devil_advocate → consultant → END
"""

import logging
import threading

from agents.state import PipelineState
from agents import diagnostician, bias_detector, devil_advocate, consultant

logger = logging.getLogger(__name__)

try:
    from langgraph.graph import StateGraph, START, END

    _LANGGRAPH_AVAILABLE = True
except ModuleNotFoundError:
    StateGraph = None  # type: ignore[assignment]
    START = END = None
    _LANGGRAPH_AVAILABLE = False


def _check_error(state: PipelineState) -> str:
    """Route to END if an error occurred, otherwise continue."""
    if state.get("error"):
        return "end"
    return "continue"


class _FallbackGraph:
    def invoke(self, initial_state: PipelineState) -> PipelineState:
        state = initial_state
        for fn in (diagnostician.run, bias_detector.run, devil_advocate.run, consultant.run):
            state = fn(state)
            if state.get("error"):
                break
        return state

    def stream(self, initial_state: PipelineState, stream_mode: str = "updates"):
        state = initial_state
        for name, fn in (
            ("diagnostician", diagnostician.run),
            ("bias_detector", bias_detector.run),
            ("devil_advocate", devil_advocate.run),
            ("consultant", consultant.run),
        ):
            state = fn(state)
            yield {name: dict(state)}
            if state.get("error"):
                break


def build_graph():
    """Build and compile the diagnostic debiasing pipeline."""
    if not _LANGGRAPH_AVAILABLE:
        logger.warning("langgraph is not installed; falling back to a simple sequential pipeline.")
        return _FallbackGraph()

    graph = StateGraph(PipelineState)

    # Add nodes
    graph.add_node("diagnostician", diagnostician.run)
    graph.add_node("bias_detector", bias_detector.run)
    graph.add_node("devil_advocate", devil_advocate.run)
    graph.add_node("consultant", consultant.run)

    # Linear flow with error checking
    graph.add_edge(START, "diagnostician")
    graph.add_conditional_edges("diagnostician", _check_error, {"continue": "bias_detector", "end": END})
    graph.add_conditional_edges("bias_detector", _check_error, {"continue": "devil_advocate", "end": END})
    graph.add_conditional_edges("devil_advocate", _check_error, {"continue": "consultant", "end": END})
    graph.add_edge("consultant", END)

    return graph.compile()


# Singleton compiled graph
_compiled_graph = None
_compiled_graph_lock = threading.Lock()


def get_graph():
    """Get or create the compiled pipeline graph."""
    global _compiled_graph
    if _compiled_graph is not None:
        return _compiled_graph
    with _compiled_graph_lock:
        if _compiled_graph is None:
            _compiled_graph = build_graph()
        return _compiled_graph


def _make_initial_state(
    image,
    doctor_diagnosis: str,
    clinical_context: str,
    modality: str | None = None,
) -> PipelineState:
    return {
        "clinical_input": {
            "image": image,
            "doctor_diagnosis": doctor_diagnosis,
            "clinical_context": clinical_context,
            "modality": modality or "CXR",
        },
        "diagnostician_output": None,
        "bias_detector_output": None,
        "devils_advocate_output": None,
        "consultant_output": None,
        "current_step": "start",
        "error": None,
    }


def run_pipeline(
    image,
    doctor_diagnosis: str,
    clinical_context: str,
    modality: str | None = None,
) -> PipelineState:
    """Run the full debiasing pipeline (blocking)."""
    graph = get_graph()
    initial_state = _make_initial_state(image, doctor_diagnosis, clinical_context, modality=modality)
    return graph.invoke(initial_state)


def stream_pipeline(
    image,
    doctor_diagnosis: str,
    clinical_context: str,
    modality: str | None = None,
):
    """
    Stream the pipeline, yielding (node_name, state) after each agent completes.
    Use this for progressive UI updates.
    """
    graph = get_graph()
    initial_state = _make_initial_state(image, doctor_diagnosis, clinical_context, modality=modality)

    for event in graph.stream(initial_state, stream_mode="updates"):
        # event is {node_name: state_update}
        for node_name, state_update in event.items():
            yield node_name, state_update
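When langgraph is unavailable, `_FallbackGraph` degrades the pipeline to a plain sequential loop that threads one mutable state dict through the agents and halts at the first error. A minimal self-contained sketch of that pattern, with stub agents standing in for the real model-backed ones:

```python
from typing import Callable

State = dict


def diagnostician(state: State) -> State:
    state["diagnostician_output"] = "findings"
    return state


def bias_detector(state: State) -> State:
    state["error"] = "model unavailable"  # simulate a mid-pipeline failure
    return state


def devil_advocate(state: State) -> State:
    state["devils_advocate_output"] = "challenges"
    return state


def run_sequential(state: State, agents: list[Callable[[State], State]]) -> State:
    # Mirrors _FallbackGraph.invoke: run agents in order, stop on the first error.
    for agent in agents:
        state = agent(state)
        if state.get("error"):
            break
    return state


final = run_sequential({"error": None}, [diagnostician, bias_detector, devil_advocate])
print(final.get("diagnostician_output"))        # findings
print("devils_advocate_output" in final)        # False — pipeline stopped at the error
```

The error check after each step is what lets the conditional edges in the real LangGraph build and this fallback behave identically from the caller's point of view.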
agents/output_parser.py ADDED
@@ -0,0 +1,124 @@
"""
JSON output parser for LLM responses.
Uses json_repair to handle malformed JSON (missing commas, truncation, extra text, etc.).
"""

import logging
from collections.abc import Mapping
from json_repair import repair_json

logger = logging.getLogger(__name__)

_TOP_LEVEL_KEYS = {
    # Diagnostician
    "findings",
    "differential_diagnoses",
    # Bias detector
    "discrepancy_summary",
    "identified_biases",
    "missed_findings",
    "agreement_points",
    # Devil's advocate
    "challenges",
    "must_not_miss",
    "recommended_workup",
    # Consultant
    "consultation_note",
    "alternative_diagnoses",
    "immediate_actions",
    "confidence_note",
}


def parse_json_response(text: str) -> dict:
    """
    Extract and repair JSON from an LLM response.
    Handles: raw JSON, ```json blocks, missing commas, truncated output, etc.
    Returns parsed dict. Raises ValueError if repair fails completely.
    """
    result = repair_json(text, return_objects=True)

    # Typical (desired) case: top-level object.
    if isinstance(result, Mapping):
        return dict(result)

    # Some model outputs come back as a top-level array. Coerce to a dict so
    # downstream code can continue, while preserving the payload for callers to
    # interpret (via 'items') when schema keys are missing.
    if isinstance(result, list):
        return _coerce_list_root(result)

    raise ValueError(
        f"Could not parse JSON from LLM output (got {type(result).__name__}, length={len(text)})"
    )


def _coerce_list_root(items: list) -> dict:
    if not items:
        return {"items": []}

    mapping_items = [x for x in items if isinstance(x, Mapping)]
    if not mapping_items:
        return {"items": items}

    merged: dict = {}
    contains_top_level_key = False
    for m in mapping_items:
        d = dict(m)
        contains_top_level_key = contains_top_level_key or bool(_TOP_LEVEL_KEYS.intersection(d.keys()))
        merged.update(d)

    # If the extracted objects already contain known top-level schema keys, it's
    # likely a wrapped/duplicated object (or multiple partial objects). Merge.
    if contains_top_level_key:
        return merged

    all_mappings = len(mapping_items) == len(items)
    if all_mappings:
        # Distinguish between (a) a true list of repeated schema items, vs (b)
        # multiple standalone JSON objects extracted from a noisy response.
        key_sets = [set(dict(m).keys()) for m in mapping_items[:10]]
        union = set().union(*key_sets)
        intersection = set(key_sets[0]).intersection(*key_sets[1:]) if len(key_sets) > 1 else set(key_sets[0])
        overlap_ratio = (len(intersection) / len(union)) if union else 0.0

        if len(items) == 1 or overlap_ratio >= 0.35:
            inferred_key = _infer_list_container_key(mapping_items)
            if inferred_key:
                return {inferred_key: [dict(m) for m in mapping_items]}
            return {"items": [dict(m) for m in mapping_items]}

        # Low overlap between objects: treat as multiple extracted JSON objects.
        return merged

    # Mixed list: preserve non-mapping items, but coerce mappings to dict.
    coerced = [dict(x) if isinstance(x, Mapping) else x for x in items]
    return {"items": coerced}


def _infer_list_container_key(items: list[Mapping]) -> str | None:
    keys: set[str] = set()
    for item in items[:5]:
        keys.update(str(k) for k in item.keys())

    # Diagnostician
    if {"finding", "description"} & keys:
        return "findings"
    if "reasoning" in keys:
        return "differential_diagnoses"

    # Bias detector
    if {"type", "severity"} <= keys:
        return "identified_biases"

    # Devil's advocate
    if "claim" in keys or "counter_evidence" in keys:
        return "challenges"
    if {"why_dangerous", "rule_out_test", "supporting_signs"} & keys:
        return "must_not_miss"

    # Consultant
    if {"urgency", "next_step"} & keys:
        return "alternative_diagnoses"

    return None
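`_coerce_list_root` distinguishes a homogeneous list of schema items from several unrelated objects by the ratio of shared keys to all keys, with 0.35 as the cut-off. The heuristic in isolation (stdlib only, no json_repair needed):

```python
def key_overlap_ratio(dicts: list[dict]) -> float:
    """Intersection-over-union of the key sets of up to ten dicts."""
    key_sets = [set(d.keys()) for d in dicts[:10]]
    union = set().union(*key_sets)
    inter = set(key_sets[0]).intersection(*key_sets[1:]) if len(key_sets) > 1 else set(key_sets[0])
    return len(inter) / len(union) if union else 0.0


# Repeated schema items share keys → high ratio → keep as a list.
homogeneous = [
    {"claim": "no consolidation", "counter_evidence": "retrocardiac opacity"},
    {"claim": "stable vitals", "counter_evidence": "tachycardia at triage"},
]
# Unrelated partial objects share nothing → low ratio → merge into one dict.
mixed = [{"claim": "no consolidation"}, {"urgency": "high", "next_step": "CT"}]

print(key_overlap_ratio(homogeneous))  # 1.0
print(key_overlap_ratio(mixed))        # 0.0
```

A single malformed response that splits one logical object across two fragments scores low and gets merged; a genuine array of challenge items scores high and survives as a list.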
agents/prompts.py ADDED
@@ -0,0 +1,142 @@
"""
Prompt templates for each agent in the debiasing pipeline.
All downstream agents (Bias Detector, Devil's Advocate, Consultant) use JSON output format.
"""

# ---------------------------------------------------------------------------
# Diagnostician: independent image analysis (MUST NOT see doctor's diagnosis)
# ---------------------------------------------------------------------------
DIAGNOSTICIAN_SYSTEM = """\
You are a radiologist performing an independent case review. Analyze BOTH the medical image AND the clinical context (history, vitals, labs, exam findings). Do not assume any prior diagnosis.
Some dangerous conditions may show subtle or no imaging signs but have obvious clinical red flags — you must catch these.
Respond with valid JSON only — no markdown, no text outside the JSON.
Top-level JSON must be a single object (not an array)."""

DIAGNOSTICIAN_USER = """\
Patient clinical context: {clinical_context}

Analyze this medical image together with the clinical context above. Report ALL findings — both imaging findings and clinical red flags from the context (abnormal vitals, labs, risk factors). Respond with JSON:

{{
  "findings": [
    {{
      "finding": "name of finding",
      "source": "imaging | clinical | both",
      "description": "location/appearance for imaging findings, or value/significance for clinical findings"
    }}
  ],
  "differential_diagnoses": [
    {{
      "diagnosis": "diagnosis name",
      "reasoning": "combined evidence from imaging AND clinical context"
    }}
  ]
}}"""

# ---------------------------------------------------------------------------
# Bias Detector: compare doctor's diagnosis with independent analysis
# Output: structured JSON
# ---------------------------------------------------------------------------
BIAS_DETECTOR_SYSTEM = """\
You are a clinical reasoning expert specializing in cognitive bias detection. You have direct access to the medical image AND the full clinical context (history, vitals, labs, exam findings).
You are given two independent assessments of the same case: the treating physician's diagnosis and an AI-generated analysis. Neither is assumed to be correct — both may contain errors or omissions.
Examine the image yourself AND carefully review the clinical context. Compare both assessments against what you see in the image AND what the clinical data shows. Some dangerous conditions have subtle imaging but obvious clinical red flags — flag these if either assessment ignored them.
Respond with valid JSON only — no markdown, no text outside the JSON.
Top-level JSON must be a single object (not an array)."""

BIAS_DETECTOR_USER = """\
Doctor's diagnosis: "{doctor_diagnosis}"
Clinical context: {clinical_context}
AI independent analysis (blinded, may also contain errors): {diagnostician_findings}
Image–diagnosis consistency (MedSigLIP verification): {consistency_check}

Compare both assessments objectively. Neither is assumed correct. Respond with JSON:

{{
  "discrepancy_summary": "how the two assessments differ — note which points are uncertain",
  "identified_biases": [
    {{
      "source": "doctor | AI | both",
      "type": "bias type",
      "evidence": "why you suspect this bias",
      "severity": "choose from LOW | MEDIUM | HIGH"
    }}
  ],
  "missed_findings": ["finding not accounted for by either assessment"],
  "agreement_points": ["findings where both agree"]
}}"""

# ---------------------------------------------------------------------------
# Devil's Advocate: adversarial challenge (deliberately contrarian)
# Output: structured JSON
# ---------------------------------------------------------------------------
DEVIL_ADVOCATE_SYSTEM = """\
You are a Devil's Advocate in a clinical case review. You have direct access to the medical image AND the full clinical context.
Your sole purpose is to challenge the working diagnosis — especially for dangerous must-not-miss diagnoses.
Examine the image yourself AND scrutinize the clinical data (vitals, labs, risk factors). Many must-not-miss diagnoses have subtle imaging but glaring clinical signs — use both sources of evidence.
Do not simply repeat earlier findings — look for anything that may have been overlooked.
Respond with valid JSON only — no markdown, no text outside the JSON.
Top-level JSON must be a single object (not an array)."""

DEVIL_ADVOCATE_USER = """\
Working diagnosis: "{doctor_diagnosis}"
Clinical context: {clinical_context}
Prior independent analysis (for reference only — form your own opinion from the image and clinical data): {diagnostician_findings}
Detected biases: {bias_summary}

Examine the attached medical image AND the clinical context. Challenge the working diagnosis using evidence from both imaging and clinical data.
IMPORTANT: Do NOT return empty lists — provide at least 1 item in each list. If evidence is weak, state uncertainty and suggest a rule-out test.
Respond with JSON:

{{
  "challenges": [
    {{
      "claim": "aspect being challenged",
      "counter_evidence": "why it may be wrong"
    }}
  ],
  "must_not_miss": [
    {{
      "diagnosis": "dangerous alternative",
      "why_dangerous": "consequence if missed",
      "supporting_signs": "evidence from this case",
      "rule_out_test": "best test to confirm or exclude"
    }}
  ],
  "recommended_workup": ["test 1", "test 2"]
}}"""

# ---------------------------------------------------------------------------
# Consultant: synthesize debiasing report
# Output: structured JSON
# ---------------------------------------------------------------------------
CONSULTANT_SYSTEM = """\
You are a senior clinician writing a consultation note. Address the reader as "you" and the sick person as "the patient".
Tone: collegial, direct — "Have you considered..." style.
Never mention cognitive bias names. Never use brackets or placeholders.
Respond with valid JSON only — no markdown, no text outside the JSON.
Top-level JSON must be a single object (not an array)."""

CONSULTANT_USER = """\
Original diagnosis: "{doctor_diagnosis}"
Clinical context: {clinical_context}
Independent analysis: {diagnostician_findings}
Bias analysis: {bias_report}
Devil's advocate challenges: {devil_advocate_report}
Similar cases: {similar_cases}

Write a 2-4 paragraph consultation note. Call the reader "you" and the sick person "the patient". Start the note directly with clinical content (e.g., "I reviewed the imaging and..."). Respond with JSON:

{{
  "consultation_note": "2-4 paragraphs. Address the reader as you. Call the sick person the patient. Start directly with clinical content.",
  "alternative_diagnoses": [
    {{
      "diagnosis": "name",
      "urgency": "MUST be one of: critical, high, moderate",
      "evidence": "supporting evidence from this case",
      "next_step": "specific action to confirm or rule out"
    }}
  ],
  "immediate_actions": ["concrete next step 1", "step 2"],
  "confidence_note": "confidence level and limitations"
}}"""
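The JSON skeletons inside these templates double every brace (`{{` / `}}`) so that `str.format()` treats them as literal braces rather than replacement fields. A tiny hypothetical template demonstrating the escaping (the real templates live above in `agents/prompts.py`):

```python
# Hypothetical mini-template, not one of the module's actual prompts.
TEMPLATE = """\
Patient clinical context: {clinical_context}

Respond with JSON:

{{
  "findings": []
}}"""

prompt = TEMPLATE.format(clinical_context="58M, acute chest pain")
print(prompt)
```

After formatting, `{clinical_context}` is substituted while each `{{`/`}}` pair collapses to a single literal brace, so the model receives a valid JSON skeleton. An unescaped `{` in a template would instead raise `KeyError` (or `ValueError` for malformed fields) at `.format()` time.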
agents/state.py ADDED
@@ -0,0 +1,80 @@
"""
LangGraph state definition for the Diagnostic Devil's Advocate pipeline.
"""

from typing import Any, Optional
from typing_extensions import NotRequired, TypedDict
from PIL import Image


class ClinicalInput(TypedDict):
    """Raw input from the user."""
    image: Optional[Image.Image]
    doctor_diagnosis: str
    clinical_context: str  # age, sex, symptoms, history, etc.
    modality: NotRequired[str]  # "CXR" | "CT" | "Other"


class Finding(TypedDict, total=False):
    finding: str
    description: str


class DifferentialDiagnosis(TypedDict, total=False):
    diagnosis: str
    reasoning: str


class DiagnosticianOutput(TypedDict):
    """Independent analysis from the Diagnostician agent (does NOT see doctor's diagnosis)."""
    analysis: str  # formatted text for downstream agents
    findings: str  # findings-only text
    findings_list: list[Finding]  # structured findings
    differential_diagnoses: list[DifferentialDiagnosis]  # top differentials
    differentials_text: NotRequired[str]


class BiasDetectorOutput(TypedDict):
    """Bias analysis comparing doctor's diagnosis vs independent analysis."""
    identified_biases: list[dict[str, Any]]  # [{"type": str, "evidence": str, "severity": str}]
    discrepancy_summary: str
    missed_findings: list[str]
    consistency_check: list[dict[str, Any]]  # MedSigLIP sign verification results


class DevilsAdvocateOutput(TypedDict):
    """Adversarial challenge to the working diagnosis."""
    challenges: list[dict[str, Any]]  # [{"claim": str, "counter_evidence": str}]
    must_not_miss: list[dict[str, Any]]  # [{"diagnosis": str, "why_dangerous": str, "supporting_signs": str}]
    recommended_workup: list[str]


class AlternativeDiagnosis(TypedDict, total=False):
    diagnosis: str
    urgency: str  # "critical" | "high" | "moderate"
    evidence: str
    next_step: str


class ConsultantOutput(TypedDict):
    """Final synthesized consultation note."""
    consultation_note: str
    alternative_diagnoses: list[AlternativeDiagnosis]
    immediate_actions: list[str]
    confidence_note: str


class PipelineState(TypedDict):
    """Full state passed through the LangGraph pipeline."""
    # Input
    clinical_input: ClinicalInput

    # Agent outputs (populated as pipeline progresses)
    diagnostician_output: Optional[DiagnosticianOutput]
    bias_detector_output: Optional[BiasDetectorOutput]
    devils_advocate_output: Optional[DevilsAdvocateOutput]
    consultant_output: Optional[ConsultantOutput]

    # Metadata
    current_step: str
    error: Optional[str]
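These TypedDicts are static-only contracts: at runtime the state is a plain dict, which is why the agents can mutate it in place and LangGraph can merge updates freely. A trimmed sketch of that behavior (using stdlib `typing.TypedDict` rather than the `typing_extensions` import above, and only two of the fields):

```python
from typing import Optional, TypedDict


class ClinicalInput(TypedDict):
    doctor_diagnosis: str
    clinical_context: str


class PipelineState(TypedDict):
    clinical_input: ClinicalInput
    error: Optional[str]


state: PipelineState = {
    "clinical_input": {
        "doctor_diagnosis": "community-acquired pneumonia",
        "clinical_context": "58M, fever, productive cough",
    },
    "error": None,
}

# TypedDict is erased at runtime: `state` is just a dict, so agents can mutate
# it in place while static checkers still enforce the declared schema.
print(type(state) is dict)  # True
```

The trade-off is that schema violations (a missing key, a wrong value type) only surface under a type checker such as mypy or pyright, never at runtime.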
app.py ADDED
@@ -0,0 +1,41 @@
"""
Diagnostic Devil's Advocate — Main entry point.
A multi-agent AI system that challenges clinical diagnoses to prevent cognitive bias errors.
"""

import logging
import sys
import os

# Add project root to path for imports
sys.path.insert(0, os.path.dirname(__file__))

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
)

import gradio as gr  # noqa: E402

from config import ENABLE_MEDASR  # noqa: E402
from ui.components import build_ui  # noqa: E402
from ui.callbacks import analyze_streaming, load_demo, transcribe_audio  # noqa: E402
from ui.css import CUSTOM_CSS  # noqa: E402


def main():
    demo = build_ui(
        analyze_fn=analyze_streaming,
        load_demo_fn=load_demo,
        transcribe_fn=transcribe_audio if ENABLE_MEDASR else None,
    )
    # NOTE: Blocks.launch() does not accept `css` or `theme`; those are
    # gr.Blocks() constructor options and belong inside build_ui
    # (e.g. gr.Blocks(css=CUSTOM_CSS, theme=gr.themes.Soft())).
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
    )


if __name__ == "__main__":
    main()
config.py ADDED
@@ -0,0 +1,73 @@
+ """
+ Configuration for Diagnostic Devil's Advocate.
+ Controls model loading, quantization, and environment-specific settings.
+
+ Model loading priority:
+ 1. Local path (MODEL_LOCAL_DIR env var) — fully offline
+ 2. HF cache (auto-downloaded via huggingface-cli download) — offline after first download
+ 3. HF Hub (requires HF_TOKEN for gated models) — online fallback
+ """
+
+ import os
+ from huggingface_hub import try_to_load_from_cache
+
+ # --- Model Selection ---
+ USE_27B = os.environ.get("USE_27B", "false").lower() == "true"
+ QUANTIZE_4B = os.environ.get("QUANTIZE_4B", "true").lower() == "true"
+ ENABLE_MEDASR = os.environ.get("ENABLE_MEDASR", "true").lower() == "true"
+
+ # --- Prompt Repetition (arXiv:2512.14982) ---
+ # Repeating the user prompt improves non-reasoning LLM performance (47 wins, 0 losses
+ # across 70 benchmark-model combos). Only increases prefill tokens, no extra generation.
+ ENABLE_PROMPT_REPETITION = os.environ.get("ENABLE_PROMPT_REPETITION", "true").lower() == "true"
+
+ # --- HF Token (for gated models) ---
+ # Loaded from: env var > huggingface-cli login stored token (auto)
+ HF_TOKEN = os.environ.get("HF_TOKEN", None)
+
+ # --- Model IDs (HF Hub) ---
+ _MEDGEMMA_4B_HUB_ID = "google/medgemma-1.5-4b-it"
+ _MEDGEMMA_27B_HUB_ID = "google/medgemma-27b-text-it"
+ _MEDSIGLIP_HUB_ID = "google/medsiglip-448"
+ _MEDASR_HUB_ID = "google/medasr"
+
+ # --- Optional local model directories (override HF Hub) ---
+ # Set these env vars to point to a local directory containing model weights.
+ # If not set, models load from HF cache (downloaded via `huggingface-cli download`).
+ MODEL_LOCAL_DIR = os.environ.get("MODEL_LOCAL_DIR", None)
+
+ def _resolve_model_path(hub_id: str, local_subdir: str | None = None) -> str:
+     """Resolve model path: local dir > HF cache > HF Hub ID."""
+     # 1. Explicit local directory
+     if MODEL_LOCAL_DIR:
+         local_path = os.path.join(MODEL_LOCAL_DIR, local_subdir or hub_id.split("/")[-1])
+         if os.path.isdir(local_path):
+             return local_path
+     # 2. HF cache (already downloaded via huggingface-cli download)
+     try:
+         cached = try_to_load_from_cache(hub_id, "config.json")
+     except Exception:
+         cached = None
+     if cached is not None and isinstance(cached, str):
+         # Return the repo snapshot directory (parent of config.json)
+         return os.path.dirname(cached)
+     # 3. Fallback to Hub ID (will download on first use)
+     return hub_id
+
+ MEDGEMMA_4B_MODEL_ID = _resolve_model_path(_MEDGEMMA_4B_HUB_ID, "medgemma-4b")
+ MEDGEMMA_27B_MODEL_ID = _resolve_model_path(_MEDGEMMA_27B_HUB_ID, "medgemma-27b")
+ MEDSIGLIP_MODEL_ID = _resolve_model_path(_MEDSIGLIP_HUB_ID, "medsiglip-448")
+ MEDASR_MODEL_ID = _resolve_model_path(_MEDASR_HUB_ID, "medasr")
+
+ # --- Generation Parameters ---
+ MAX_NEW_TOKENS_4B = 4096
+ MAX_NEW_TOKENS_27B = 6000
+ TEMPERATURE = 0.0
+ REPETITION_PENALTY = 1.2  # Prevent greedy decoding repetition loops
+
+ # --- Device ---
+ DEVICE = os.environ.get("DEVICE", "cuda")
+
+ # --- Demo Cases Directory ---
+ DATA_DIR = os.path.join(os.path.dirname(__file__), "data")
+ DEMO_CASES_DIR = os.path.join(DATA_DIR, "demo_cases")
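The three-step priority implemented by `_resolve_model_path` can be illustrated with a minimal standalone sketch. The `resolve_model_path` helper below is hypothetical: the cache lookup is injected as a callable so no `huggingface_hub` install is needed, but the fall-through order matches the function above.

```python
import os
import tempfile

def resolve_model_path(hub_id, model_local_dir, cache_lookup):
    """Sketch of config.py's priority: explicit local dir > HF cache > Hub ID."""
    if model_local_dir:
        local_path = os.path.join(model_local_dir, hub_id.split("/")[-1])
        if os.path.isdir(local_path):
            return local_path
    cached = cache_lookup(hub_id)  # stand-in for try_to_load_from_cache(hub_id, "config.json")
    if isinstance(cached, str):
        return os.path.dirname(cached)  # snapshot dir containing config.json
    return hub_id

# No local dir, cache miss: falls through to the Hub ID (downloads on first use)
assert resolve_model_path("google/medsiglip-448", None, lambda _: None) == "google/medsiglip-448"

# Cache hit: returns the snapshot directory holding config.json
assert resolve_model_path(
    "google/medsiglip-448", None, lambda _: "/cache/snap/abc/config.json"
) == "/cache/snap/abc"

# An existing local directory wins over the cache
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "medsiglip-448"))
    assert resolve_model_path(
        "google/medsiglip-448", root, lambda _: "/cache/snap/abc/config.json"
    ) == os.path.join(root, "medsiglip-448")
```

Note that the real function also swallows cache-lookup exceptions, which this sketch omits.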
data/demo_cases/SOURCES.md ADDED
@@ -0,0 +1,105 @@
+ # Demo Case Clinical Data Sources
+
+ The clinical scenarios in this project are **composite cases** constructed from published medical literature on diagnostic errors. They are not direct copies of any single patient case. Each scenario synthesizes common presentation patterns, vitals, labs, and misdiagnosis trajectories documented across multiple peer-reviewed sources.
+
+ ---
+
+ ## Case 1: Missed Pneumothorax (32M, Motorcycle Collision)
+
+ **Misdiagnosis pattern**: Traumatic pneumothorax missed on supine AP chest X-ray, discharged as rib contusion.
+
+ ### Key References
+
+ - **Ball CG, Kirkpatrick AW, Laupland KB, et al.** "Incidence, risk factors, and outcomes for occult pneumothoraces in victims of major trauma." *J Trauma.* 2005;59(4):917-924.
+   - Documents occult pneumothorax rates of 29-72% in trauma patients; supine CXR misses a significant proportion.
+
+ - **Soldati G, Testa A, Sher S, et al.** "Occult traumatic pneumothorax: diagnostic accuracy of lung ultrasonography in the emergency department." *Chest.* 2008;133(1):204-211.
+
+ - **Omar HR, Abdelmalak H, Mangar D, Rashad R.** "Occult pneumothorax, revisited." *J Trauma Manag Outcomes.* 2010;4:12.
+   - PMC2984474 — Reviews occult pneumothorax prevalence (3.7% to 64%), risk factors (subcutaneous emphysema OR 5.47, rib fractures OR 2.65).
+
+ - **Defined A, et al.** "Anteroposterior chest radiograph vs. chest CT scan in early detection of pneumothorax in trauma patients." *J Cardiothorac Surg.* 2011;6:74.
+   - PMC3195099 — Case series including 42M and 24M MVA patients with CXR-negative, CT-positive pneumothorax.
+
+ - **Del Cura JL, et al.** "Commonly Missed Findings on Chest Radiographs: Causes and Consequences." *Chest.* 2023;163(3):650-661.
+   - PMC10154905 — Systematic review of perceptual errors in CXR interpretation.
+
+ ### Clinical Data Basis
+ - Vitals (HR 104, SpO2 96%, BP 132/84) reflect typical blunt chest trauma presentation from trauma registry data.
+ - Labs (WBC 11.2, Lactate 1.8) are within ranges reported for minor trauma without hemorrhagic shock.
+ - Supine AP film reading pattern based on documented false-negative scenarios in the cited studies.
+
+ ---
+
+ ## Case 2: Aortic Dissection Misdiagnosed as GERD (58M, Hypertensive)
+
+ **Misdiagnosis pattern**: Acute aortic dissection attributed to acid reflux/esophageal spasm, sent home with antacids.
+
+ ### Key References
+
+ - **Defined A, et al.** "Acute aortic dissection: a missed diagnosis." *BMJ Case Rep.* 2018;2018:bcr2018226586.
+   - PMC6203039 — 60M with untreated hypertension, sudden chest pain radiating to back, initially misdiagnosed as indigestion. CT angiography revealed Stanford type B dissection.
+
+ - **Hansen MS, Nogareda GJ, Hutchison SJ.** "Frequency of and inappropriate treatment of misdiagnosis of acute aortic dissection." *Am J Cardiol.* 2007;99(6):852-856.
+   - Overall misdiagnosis rate of 33.8% for aortic dissection.
+
+ - **Defined A, et al.** "Misdiagnosis of aortic dissection: experience of 361 patients." *J Clin Hypertens.* 2012;14(4):256-260.
+   - PubMed 22458748 — Large series documenting misdiagnosis factors including GI-like symptoms.
+
+ - **Defined A, et al.** "Acute aortic dissection: be aware of misdiagnosis." *BMC Res Notes.* 2009;2:25.
+   - Vitals: BP 210/135, HR 126, RR 40, SpO2 95% on O2.
+
+ - **MLMIC Insurance Company.** "Case Study: Failure to Diagnose Dissection of Ascending Thoracic Aorta Results in Settlement."
+   - Real malpractice case: patient prescribed Prilosec for presumed GERD, died same evening from undiagnosed ascending aortic dissection with cardiac tamponade.
+
+ - **CBS News / Mayo Clinic.** "He thought he had severe acid reflux. Doctors found a much different problem."
+   - Patient with prolonged GERD misdiagnosis, eventually found to have 7cm aortic aneurysm with bicuspid aortic valve.
+
+ ### Clinical Data Basis
+ - Blood pressure asymmetry (178/102 R arm vs 146/88 L arm) is a classic dissection sign documented in IRAD registry data.
+ - D-dimer 4,850 ng/mL reflects typical elevation in acute dissection (sensitivity >95% per meta-analyses).
+ - Serial negative troponins ruling out ACS before GERD attribution matches the documented diagnostic pathway in the cited cases.
+
+ ---
+
+ ## Case 3: Postpartum Pulmonary Embolism Misdiagnosed as Anxiety (29F, Post C-section)
+
+ **Misdiagnosis pattern**: Postpartum PE symptoms attributed to anxiety/hyperventilation, psychiatric consult ordered instead of CTPA.
+
+ ### Key References
+
+ - **Defined A, et al.** "Pulmonary embolism masked by symptoms of mental disorders." *Psychiatr Pol.* 2023;57(5):1121-1136.
+   - PMC10683049 — 21F postpartum patient on duloxetine, repeated "panic attacks" with tachycardia (123 bpm) and hyperventilation (RR 20-24), symptoms attributed to anxiety. Died from PE. Autopsy confirmed pulmonary embolism as cause of death.
+
+ - **Defined A, et al.** "Pulmonary Embolism in the Setting of Panic Attacks." In: *Pulmonary Embolism.* Springer, 2017.
+   - Discusses overlap between PE symptoms (dyspnea, tachycardia, chest pain) and panic attacks; concept of "diagnostic overshadowing."
+
+ - **Defined A.** "My Symptoms Were Misdiagnosed as Anxiety: Tamara's Story." *StopTheClot.org / National Blood Clot Alliance.*
+   - Patient narrative of PE misdiagnosed as anxiety.
+
+ - **Defined A.** "'Organic Anxiety' in a Middle-aged Man Presenting with Dyspnoea: a Case Report." *East Asian Arch Psychiatry.* 2019;29(3):97.
+   - PE presenting as anxiety disorder, eventually diagnosed after high index of suspicion.
+
+ - **Royal College of Obstetricians and Gynaecologists.** "Thromboembolic Disease in Pregnancy and the Puerperium: Acute Management." Green-top Guideline No. 37b.
+   - Half of pregnancy-related VTE occurs postpartum; PE is a leading cause of maternal death.
+
+ - **Defined A, et al.** "Postpartum Pulmonary Embolism in a Grand Multiparous: A Case Report." *Cureus.* 2023;15(6):e40777.
+   - PMC10291952 — Broad differential including anxiety and PE in postpartum dyspnea.
+
+ ### Clinical Data Basis
+ - Vitals (HR 118, SpO2 91%, RR 28) reflect typical submassive PE presentation from PIOPED II data.
+ - ABG (pH 7.48, pO2 68, pCO2 29) shows respiratory alkalosis with hypoxemia, classic PE pattern.
+ - D-dimer 3,200 ng/mL is elevated but often dismissed postpartum due to physiologically raised baseline.
+ - Right calf tenderness as DVT source matches the documented PE-DVT association (>90% of PE from lower extremity DVT).
+
+ ---
+
+ ## Medical Images
+
+ The chest X-ray images used in the demo cases are sourced from the **University of Saskatchewan Teaching Collection** (CC-BY-NC-SA 4.0 license) and are representative radiographs, not from the specific patients described in the composite clinical scenarios above.
+
+ ---
+
+ ## Disclaimer
+
+ These demo cases are **educational composites** designed to illustrate common diagnostic error patterns. They do not represent any individual patient. This tool is a research prototype for the MedGemma Impact Challenge and is **not intended for clinical decision-making**.
data/demo_cases/case1_pneumothorax.png ADDED
data/demo_cases/case2_aortic_dissection.png ADDED
data/demo_cases/case3_pulmonary_embolism.png ADDED
models/__init__.py ADDED
File without changes
models/medasr_client.py ADDED
@@ -0,0 +1,111 @@
+ """
+ MedASR client: medical speech-to-text transcription.
+ Uses CTC decoding with proper blank-token collapse.
+ """
+
+ from __future__ import annotations
+
+ import logging
+ import os
+ import threading
+ import warnings
+
+ from config import MEDASR_MODEL_ID, HF_TOKEN, DEVICE, ENABLE_MEDASR
+
+ logger = logging.getLogger(__name__)
+
+ _model = None
+ _processor = None
+ _load_lock = threading.Lock()
+
+
+ def _token_arg() -> dict:
+     if os.path.isdir(MEDASR_MODEL_ID):
+         return {}
+     return {"token": HF_TOKEN}
+
+
+ def load():
+     """Load MedASR model and processor."""
+     global _model, _processor
+     if _model is not None:
+         return _model, _processor
+     if not ENABLE_MEDASR:
+         raise RuntimeError("MedASR is disabled via ENABLE_MEDASR=false")
+
+     with _load_lock:
+         if _model is not None:
+             return _model, _processor
+
+         import torch
+         from transformers import AutoModelForCTC, AutoProcessor
+
+         logger.info("Loading MedASR from %s...", "local" if os.path.isdir(MEDASR_MODEL_ID) else "HF Hub")
+         _processor = AutoProcessor.from_pretrained(MEDASR_MODEL_ID, **_token_arg())
+         _model = AutoModelForCTC.from_pretrained(
+             MEDASR_MODEL_ID, **_token_arg(), dtype=torch.float32,
+         ).to(DEVICE)
+         _model.eval()
+         logger.info("MedASR loaded.")
+         return _model, _processor
+
+
+ def _ctc_greedy_decode(logits, processor) -> str:
+     """
+     Proper CTC greedy decode:
+     1. argmax to get predicted token IDs
+     2. Collapse consecutive duplicate IDs
+     3. Remove blank token IDs
+     4. Decode remaining IDs to text
+     """
+     import torch
+
+     predicted_ids = torch.argmax(logits, dim=-1)[0]  # (seq_len,)
+
+     # Determine blank token ID
+     blank_id = getattr(processor.tokenizer, "pad_token_id", None)
+     if blank_id is None:
+         blank_id = 0  # CTC blank is typically ID 0
+
+     # Collapse consecutive duplicates, then remove blanks
+     collapsed = []
+     prev_id = -1
+     for token_id in predicted_ids.tolist():
+         if token_id != prev_id:
+             if token_id != blank_id:
+                 collapsed.append(token_id)
+         prev_id = token_id
+
+     if not collapsed:
+         return ""
+
+     # Decode token IDs to text
+     text = processor.tokenizer.decode(collapsed, skip_special_tokens=True)
+     return text.strip()
+
+
+ def transcribe(audio_array, sampling_rate: int = 16000) -> str:
+     """
+     Transcribe audio to text using CTC greedy decoding.
+
+     Args:
+         audio_array: numpy array of audio samples (mono, float32).
+         sampling_rate: audio sample rate (MedASR expects 16kHz).
+
+     Returns:
+         Transcribed text string.
+     """
+     model, processor = load()
+     import torch
+
+     inputs = processor(
+         audio_array, sampling_rate=sampling_rate, return_tensors="pt",
+     ).to(model.device)
+
+     with torch.inference_mode():
+         # Suppress the harmless padding='same' convolution warning
+         with warnings.catch_warnings():
+             warnings.filterwarnings("ignore", message=".*padding='same'.*")
+             logits = model(**inputs).logits
+
+     return _ctc_greedy_decode(logits, processor)
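The collapse-then-drop-blanks order in `_ctc_greedy_decode` matters: a blank between two identical IDs keeps both occurrences, whereas dropping blanks first would merge them. A tiny pure-Python sketch of the same rule (hypothetical `ctc_collapse`, operating on a plain ID list instead of logits):

```python
def ctc_collapse(predicted_ids, blank_id=0):
    """Collapse consecutive duplicate IDs, then drop blanks (CTC greedy rule)."""
    out, prev = [], None
    for tid in predicted_ids:
        # A new ID (different from the previous frame) survives unless it is blank
        if tid != prev and tid != blank_id:
            out.append(tid)
        prev = tid
    return out

# The blank (0) between the two 7s separates a genuine repeat, so both survive;
# the consecutive 7s and 4s collapse to one each.
assert ctc_collapse([7, 7, 0, 7, 4, 4, 0, 0, 5]) == [7, 7, 4, 5]
```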
models/medgemma_client.py ADDED
@@ -0,0 +1,222 @@
+ """
+ MedGemma client: unified interface for 4B (multimodal) and 27B (text-only) models.
+ Loads locally via transformers with optional 4-bit quantization.
+ """
+
+ from __future__ import annotations
+
+ import logging
+ import os
+ import threading
+ from PIL import Image
+
+ from config import (
+     USE_27B, QUANTIZE_4B, HF_TOKEN, DEVICE,
+     MEDGEMMA_4B_MODEL_ID, MEDGEMMA_27B_MODEL_ID,
+     MAX_NEW_TOKENS_4B, MAX_NEW_TOKENS_27B, TEMPERATURE, REPETITION_PENALTY,
+ )
+ from models.utils import strip_thinking_tokens, resize_for_medgemma, apply_prompt_repetition
+
+ logger = logging.getLogger(__name__)
+
+ _model_4b = None
+ _processor_4b = None
+ _model_27b = None
+ _tokenizer_27b = None
+ _load_4b_lock = threading.Lock()
+ _load_27b_lock = threading.Lock()
+
+
+ def _is_local_path(model_id: str) -> bool:
+     """Check if model_id is a local directory path."""
+     return os.path.isdir(model_id)
+
+
+ def _token_arg(model_id: str) -> dict:
+     """Return token kwarg only when loading from HF Hub (not local path)."""
+     if _is_local_path(model_id):
+         return {}
+     return {"token": HF_TOKEN}
+
+
+ def _get_quantization_config():
+     """Return BitsAndBytesConfig for 4-bit quantization."""
+     import torch
+     from transformers import BitsAndBytesConfig
+     return BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_compute_dtype=torch.bfloat16,
+         bnb_4bit_quant_type="nf4",
+     )
+
+
+ def load_4b():
+     """Load MedGemma 4B-IT (multimodal) model and processor."""
+     global _model_4b, _processor_4b
+     if _model_4b is not None:
+         return _model_4b, _processor_4b
+
+     with _load_4b_lock:
+         if _model_4b is not None:
+             return _model_4b, _processor_4b
+
+         import torch
+         from transformers import AutoModelForImageTextToText, AutoProcessor
+
+         is_local = _is_local_path(MEDGEMMA_4B_MODEL_ID)
+         logger.info(
+             "Loading MedGemma 4B-IT (%s) from %s...",
+             "4-bit" if QUANTIZE_4B else "bf16",
+             "local" if is_local else "HF Hub",
+         )
+
+         # BitsAndBytes quantization requires device_map="auto", not "cuda"
+         device_map = "auto" if QUANTIZE_4B else DEVICE
+         kwargs = {**_token_arg(MEDGEMMA_4B_MODEL_ID), "device_map": device_map}
+         if QUANTIZE_4B:
+             kwargs["quantization_config"] = _get_quantization_config()
+         else:
+             kwargs["dtype"] = torch.bfloat16
+
+         _processor_4b = AutoProcessor.from_pretrained(MEDGEMMA_4B_MODEL_ID, **_token_arg(MEDGEMMA_4B_MODEL_ID))
+         _model_4b = AutoModelForImageTextToText.from_pretrained(MEDGEMMA_4B_MODEL_ID, **kwargs)
+         _model_4b.eval()
+         logger.info("MedGemma 4B loaded.")
+         return _model_4b, _processor_4b
+
+
+ def load_27b():
+     """Load MedGemma 27B Text-IT model and tokenizer (A100 only)."""
+     global _model_27b, _tokenizer_27b
+     if _model_27b is not None:
+         return _model_27b, _tokenizer_27b
+
+     with _load_27b_lock:
+         if _model_27b is not None:
+             return _model_27b, _tokenizer_27b
+
+         import torch
+         from transformers import AutoModelForCausalLM, AutoTokenizer
+
+         is_local = _is_local_path(MEDGEMMA_27B_MODEL_ID)
+         logger.info(
+             "Loading MedGemma 27B Text-IT (bf16) from %s...",
+             "local" if is_local else "HF Hub",
+         )
+
+         _tokenizer_27b = AutoTokenizer.from_pretrained(MEDGEMMA_27B_MODEL_ID, **_token_arg(MEDGEMMA_27B_MODEL_ID))
+         _model_27b = AutoModelForCausalLM.from_pretrained(
+             MEDGEMMA_27B_MODEL_ID,
+             **_token_arg(MEDGEMMA_27B_MODEL_ID),
+             dtype=torch.bfloat16,
+             device_map="auto",
+         )
+         _model_27b.eval()
+         logger.info("MedGemma 27B loaded.")
+         return _model_27b, _tokenizer_27b
+
+
+ def generate_with_image(prompt: str, image: Image.Image, system_prompt: str = "") -> str:
+     """Generate text from image + text prompt using MedGemma 4B."""
+     model, processor = load_4b()
+     image = resize_for_medgemma(image)
+     prompt = apply_prompt_repetition(prompt)
+
+     messages = []
+     if system_prompt:
+         messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]})
+     messages.append({
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": prompt},
+         ],
+     })
+
+     inputs = processor.apply_chat_template(
+         messages, add_generation_prompt=True, tokenize=True,
+         return_dict=True, return_tensors="pt",
+     ).to(model.device)
+
+     import torch
+
+     with torch.inference_mode():
+         output_ids = model.generate(
+             **inputs,
+             max_new_tokens=MAX_NEW_TOKENS_4B,
+             do_sample=TEMPERATURE > 0,
+             repetition_penalty=REPETITION_PENALTY,
+             **({"temperature": TEMPERATURE} if TEMPERATURE > 0 else {}),
+         )
+
+     # Decode only the new tokens
+     new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
+     text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
+     return strip_thinking_tokens(text)
+
+
+ def generate_text(prompt: str, system_prompt: str = "") -> str:
+     """Generate text from a text-only prompt. Uses the 27B model when USE_27B is set, else the 4B."""
+     if USE_27B:
+         return _generate_text_27b(prompt, system_prompt)
+     return _generate_text_4b(prompt, system_prompt)
+
+
+ def _generate_text_4b(prompt: str, system_prompt: str = "") -> str:
+     """Text-only generation with 4B model."""
+     model, processor = load_4b()
+     prompt = apply_prompt_repetition(prompt)
+
+     messages = []
+     if system_prompt:
+         messages.append({"role": "system", "content": [{"type": "text", "text": system_prompt}]})
+     messages.append({"role": "user", "content": [{"type": "text", "text": prompt}]})
+
+     inputs = processor.apply_chat_template(
+         messages, add_generation_prompt=True, tokenize=True,
+         return_dict=True, return_tensors="pt",
+     ).to(model.device)
+
+     import torch
+
+     with torch.inference_mode():
+         output_ids = model.generate(
+             **inputs,
+             max_new_tokens=MAX_NEW_TOKENS_4B,
+             do_sample=TEMPERATURE > 0,
+             repetition_penalty=REPETITION_PENALTY,
+             **({"temperature": TEMPERATURE} if TEMPERATURE > 0 else {}),
+         )
+
+     new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
+     text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
+     return strip_thinking_tokens(text)
+
+
+ def _generate_text_27b(prompt: str, system_prompt: str = "") -> str:
+     """Text-only generation with 27B model (thinking mode)."""
+     model, tokenizer = load_27b()
+     prompt = apply_prompt_repetition(prompt)
+
+     messages = []
+     if system_prompt:
+         messages.append({"role": "system", "content": system_prompt})
+     messages.append({"role": "user", "content": prompt})
+
+     input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+     inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+
+     import torch
+
+     with torch.inference_mode():
+         output_ids = model.generate(
+             **inputs,
+             max_new_tokens=MAX_NEW_TOKENS_27B,
+             do_sample=TEMPERATURE > 0,
+             repetition_penalty=REPETITION_PENALTY,
+             **({"temperature": TEMPERATURE} if TEMPERATURE > 0 else {}),
+         )
+
+     new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
+     text = tokenizer.decode(new_tokens, skip_special_tokens=True)
+     return strip_thinking_tokens(text)
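All three generate paths build their `generate()` kwargs the same way: greedy decoding when `TEMPERATURE == 0` (no `temperature` kwarg passed, since it is unused with `do_sample=False`), sampling otherwise. A small sketch of that pattern (hypothetical `generation_kwargs` helper, not part of the module):

```python
def generation_kwargs(temperature, max_new_tokens, repetition_penalty=1.2):
    """Build generate() kwargs: greedy when temperature == 0, sampled otherwise."""
    kwargs = {
        "max_new_tokens": max_new_tokens,
        "do_sample": temperature > 0,
        "repetition_penalty": repetition_penalty,
    }
    if temperature > 0:
        # Only meaningful when sampling; passing it alongside do_sample=False
        # triggers a transformers warning, so it is omitted for greedy decoding.
        kwargs["temperature"] = temperature
    return kwargs

greedy = generation_kwargs(0.0, 4096)
assert greedy["do_sample"] is False and "temperature" not in greedy

sampled = generation_kwargs(0.7, 4096)
assert sampled["do_sample"] is True and sampled["temperature"] == 0.7
```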
models/medsiglip_client.py ADDED
@@ -0,0 +1,164 @@
+ """
+ MedSigLIP client: zero-shot medical image classification and embedding extraction.
+ Uses AutoProcessor following the official Google-Health/medsiglip notebook.
+ """
+
+ from __future__ import annotations
+
+ import logging
+ import os
+ import threading
+
+ from PIL import Image
+
+ from config import MEDSIGLIP_MODEL_ID, HF_TOKEN, DEVICE
+
+ logger = logging.getLogger(__name__)
+
+ _model = None
+ _processor = None
+ _load_lock = threading.Lock()
+
+
+ def _token_arg() -> dict:
+     if os.path.isdir(MEDSIGLIP_MODEL_ID):
+         return {}
+     return {"token": HF_TOKEN}
+
+
+ def load():
+     """Load MedSigLIP model and processor."""
+     global _model, _processor
+     if _model is not None:
+         return _model, _processor
+
+     with _load_lock:
+         if _model is not None:
+             return _model, _processor
+
+         import torch
+         from transformers import AutoModel, AutoProcessor
+
+         logger.info("Loading MedSigLIP from %s...", "local" if os.path.isdir(MEDSIGLIP_MODEL_ID) else "HF Hub")
+         _processor = AutoProcessor.from_pretrained(MEDSIGLIP_MODEL_ID, **_token_arg())
+         _model = AutoModel.from_pretrained(
+             MEDSIGLIP_MODEL_ID, **_token_arg(), dtype=torch.float32,
+         ).to(DEVICE)
+         _model.eval()
+         logger.info("MedSigLIP loaded.")
+         return _model, _processor
+
+
+ def classify(image: Image.Image, candidate_labels: list) -> list[dict]:
+     """
+     Zero-shot classification of a medical image.
+
+     Args:
+         candidate_labels: list of str OR list of (short_label, descriptive_prompt) tuples.
+
+     Returns list of {"label": str, "score": float} sorted by descending score.
+     Scores are raw logits (not sigmoid/softmax) — higher = better match.
+     """
+     if candidate_labels and isinstance(candidate_labels[0], (list, tuple)):
+         display_labels = [c[0] for c in candidate_labels]
+         text_prompts = [c[1] for c in candidate_labels]
+     else:
+         display_labels = candidate_labels
+         text_prompts = candidate_labels
+
+     model, processor = load()
+     # Official usage: single processor call with padding="max_length"
+     inputs = processor(
+         text=text_prompts, images=image,
+         padding="max_length", return_tensors="pt",
+     ).to(model.device)
+
+     import torch
+
+     with torch.inference_mode():
+         outputs = model(**inputs)
+
+     # Use raw logits — official notebook uses argmax on logits_per_image directly
+     logits = outputs.logits_per_image[0].cpu().tolist()
+
+     results = [{"label": label, "score": score} for label, score in zip(display_labels, logits)]
+     results.sort(key=lambda x: x["score"], reverse=True)
+     return results
+
+
+ def _normalize_modality(modality: str | None) -> str:
+     m = (modality or "").strip().lower()
+     if m in {"cxr", "x-ray", "xray", "chest x-ray", "chest xray", "chest radiograph", "radiograph"}:
+         return "cxr"
+     if m in {"ct", "ct scan", "computed tomography"}:
+         return "ct"
+     return "other"
+
+
+ def _verification_prompts(sign: str, modality: str | None) -> tuple[str, str]:
+     sign_l = sign.lower()
+     m = _normalize_modality(modality)
+     if m == "ct":
+         positive = f"a CT scan showing {sign_l}"
+         negative = f"a CT scan showing no evidence of {sign_l}"
+     elif m == "other":
+         positive = f"a medical image showing {sign_l}"
+         negative = f"a medical image showing no evidence of {sign_l}"
+     else:
+         positive = f"a chest radiograph showing {sign_l}"
+         negative = f"a normal chest radiograph with no {sign_l}"
+     return positive, negative
+
+
+ def verify_sign(image: Image.Image, sign: str, modality: str | None = None) -> dict:
+     """
+     Binary verification: does the image show this finding/sign?
+     Compares "showing X" vs "no X" — matches official MedSigLIP usage pattern.
+
+     Returns confidence level based on logit difference:
+       diff > 2  → "likely present"
+       diff > 0  → "possibly present"
+       diff > -2 → "inconclusive"
+       else      → "likely absent"
+     """
+     positive, negative = _verification_prompts(sign, modality)
+
+     results = classify(image, [
+         ("positive", positive),
+         ("negative", negative),
+     ])
+
+     pos = next(r for r in results if r["label"] == "positive")
+     neg = next(r for r in results if r["label"] == "negative")
+     diff = pos["score"] - neg["score"]
+
+     if diff > 2:
+         confidence = "likely present"
+     elif diff > 0:
+         confidence = "possibly present"
+     elif diff > -2:
+         confidence = "inconclusive"
+     else:
+         confidence = "likely absent"
+
+     return {
+         "sign": sign,
+         "modality": _normalize_modality(modality),
+         "positive_logit": pos["score"],
+         "negative_logit": neg["score"],
+         "diff": diff,
+         "confidence": confidence,
+     }
+
+
+ def verify_findings(
+     image: Image.Image,
+     signs: list[str],
+     modality: str | None = None,
+ ) -> list[dict]:
+     """
+     Verify a list of imaging signs against the image.
+     Returns one result per sign, including inconclusive ones; callers can filter on "confidence".
+     """
+     results = [verify_sign(image, sign, modality=modality) for sign in signs]
+     return results
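The logit-difference banding used by `verify_sign` can be exercised standalone. The `band_confidence` helper below is a hypothetical sketch mirroring the thresholds above, not part of the module:

```python
def band_confidence(pos_logit, neg_logit):
    """Map the positive-vs-negative logit gap to the four confidence bands."""
    diff = pos_logit - neg_logit
    if diff > 2:
        return "likely present"
    if diff > 0:
        return "possibly present"
    if diff > -2:
        return "inconclusive"
    return "likely absent"

assert band_confidence(5.1, 2.0) == "likely present"     # gap 3.1
assert band_confidence(1.0, 0.5) == "possibly present"   # gap 0.5
assert band_confidence(-0.5, 0.5) == "inconclusive"      # gap -1.0
assert band_confidence(-3.0, 0.0) == "likely absent"     # gap -3.0
```

Because the bands compare a difference of raw (unnormalized) logits, they are a heuristic calibrated for this prompt pair rather than a probability.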
models/utils.py ADDED
@@ -0,0 +1,49 @@
+ """
+ Utility functions for model outputs: thinking token stripping, image encoding,
+ prompt repetition, etc.
+ """
+
+ import re
+ import base64
+ from io import BytesIO
+ from PIL import Image
+
+ from config import ENABLE_PROMPT_REPETITION
+
+ # MedGemma wraps internal reasoning in <unused94>...<unused95> tags
+ THINKING_PATTERN = re.compile(r"<unused94>.*?<unused95>", re.DOTALL)
+
+
+ def strip_thinking_tokens(text: str) -> str:
+     """Remove MedGemma's internal thinking tokens from output."""
+     return THINKING_PATTERN.sub("", text).strip()
+
+
+ def image_to_base64(image: Image.Image, fmt: str = "PNG") -> str:
+     """Convert PIL Image to base64 data URL string."""
+     buffer = BytesIO()
+     image.save(buffer, format=fmt)
+     encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
+     return f"data:image/{fmt.lower()};base64,{encoded}"
+
+
+ def apply_prompt_repetition(prompt: str) -> str:
+     """Repeat the user prompt to improve LLM output quality.
+
+     Based on "Prompt Repetition Improves Non-Reasoning LLMs"
+     (arXiv:2512.14982, Google Research 2025): repeating the input prompt
+     wins 47/70 benchmark-model combos with 0 losses. Uses the verbose
+     variant with a transition phrase for clarity.
+     """
+     if not ENABLE_PROMPT_REPETITION:
+         return prompt
+     return f"{prompt}\n\nLet me repeat the request:\n\n{prompt}"
+
+
+ def resize_for_medgemma(image: Image.Image, max_size: int = 896) -> Image.Image:
+     """Downscale so the longer side fits MedGemma's expected input resolution (896)."""
+     if max(image.size) <= max_size:
+         return image
+     ratio = max_size / max(image.size)
+     new_size = (int(image.size[0] * ratio), int(image.size[1] * ratio))
+     return image.resize(new_size, Image.LANCZOS)
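Two of these utilities are easy to sanity-check in isolation. The sketch below mirrors the thinking-token regex and the aspect-preserving resize arithmetic with hypothetical helper names (`strip_thinking`, `resized_dims`), operating on plain tuples so no PIL image is needed:

```python
import re

# Same pattern as models/utils.py: drop <unused94>...<unused95> reasoning spans
THINKING = re.compile(r"<unused94>.*?<unused95>", re.DOTALL)

def strip_thinking(text: str) -> str:
    return THINKING.sub("", text).strip()

def resized_dims(w: int, h: int, max_size: int = 896) -> tuple:
    """Target dimensions: untouched if it fits, else longer side scaled to max_size."""
    if max(w, h) <= max_size:
        return (w, h)
    ratio = max_size / max(w, h)
    return (int(w * ratio), int(h * ratio))

assert strip_thinking("<unused94>internal reasoning<unused95>Final answer.") == "Final answer."
assert resized_dims(800, 600) == (800, 600)    # already fits, returned as-is
assert resized_dims(1792, 1024) == (896, 512)  # longer side clamped to 896
```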
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ torch>=2.1.0
+ transformers>=4.50.0
+ accelerate>=0.26.0
+ bitsandbytes>=0.42.0
+ langgraph>=0.2.0
+ gradio==5.12.0
+ Pillow>=10.0.0
+ numpy>=1.24.0
+ scipy>=1.10.0
+ json-repair>=0.30.0
tests/__init__.py ADDED
File without changes
tests/test_output_parser.py ADDED
@@ -0,0 +1,24 @@
+ import sys
+ import os
+
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+ from agents.output_parser import parse_json_response
+
+
+ def test_parse_json_response_returns_dict():
+     parsed = parse_json_response('{"challenges":[{"claim":"x","counter_evidence":"y"}]}')
+     assert parsed["challenges"][0]["claim"] == "x"
+
+
+ def test_parse_json_response_coerces_top_level_list_of_strings():
+     parsed = parse_json_response('["CT angiogram","D-dimer"]')
+     assert parsed["items"] == ["CT angiogram", "D-dimer"]
+
+
+ def test_parse_json_response_infers_container_key_for_da_items():
+     parsed = parse_json_response(
+         '[{"diagnosis":"Aortic dissection","why_dangerous":"High mortality","supporting_signs":"Pain radiating to back","rule_out_test":"CTA chest"}]'
+     )
+     assert parsed["must_not_miss"][0]["diagnosis"] == "Aortic dissection"
+
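The coercion behavior these tests pin down could look roughly like the sketch below. The `coerce_to_dict` helper and its key-inference rule are hypothetical reconstructions from the assertions above; the real `parse_json_response` also repairs malformed JSON (via `json-repair`), which is omitted here:

```python
from json import loads

def coerce_to_dict(raw):
    """Wrap a top-level JSON list in a dict, inferring a container key from its shape."""
    data = loads(raw)
    if isinstance(data, dict):
        return data
    if isinstance(data, list):
        # A list of devil's-advocate entries is recognizable by its "diagnosis" key
        if data and isinstance(data[0], dict) and "diagnosis" in data[0]:
            return {"must_not_miss": data}
        return {"items": data}  # generic fallback for lists of strings
    return {"value": data}

assert coerce_to_dict('["CT angiogram","D-dimer"]') == {"items": ["CT angiogram", "D-dimer"]}
assert "must_not_miss" in coerce_to_dict('[{"diagnosis":"Aortic dissection"}]')
```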
tests/test_pipeline_mock.py ADDED
@@ -0,0 +1,124 @@
1
+ """End-to-end pipeline test with mocked model calls (no GPU required)."""
2
+
3
+ import os
4
+ import sys
5
+ from unittest.mock import patch
6
+
7
+ from PIL import Image
8
+
9
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
10
+
11
+
12
+ def test_pipeline_end_to_end_with_mocks():
13
+ from agents.graph import run_pipeline
14
+ from agents.prompts import (
15
+ DIAGNOSTICIAN_SYSTEM,
16
+ BIAS_DETECTOR_SYSTEM,
17
+ DEVIL_ADVOCATE_SYSTEM,
18
+ CONSULTANT_SYSTEM,
19
+ )
20
+
21
+ dummy_image = Image.new("RGB", (512, 512), color="gray")
22
+
23
+ diagnostician_json = """
24
+ {
25
+ "findings": [
26
+ {"finding": "Pneumothorax", "source": "imaging", "description": "Left apical pleural line with absent peripheral markings."},
27
+ {"finding": "Rib fracture", "source": "imaging", "description": "Possible fracture of the left 5th rib."},
28
+ {"finding": "Tachycardia", "source": "clinical", "description": "HR 104, consistent with pain or hemodynamic compromise."}
29
+ ],
30
+ "differential_diagnoses": [
31
+ {"diagnosis": "Pneumothorax", "reasoning": "Visible pleural line on imaging combined with tachycardia and dyspnea from clinical context."}
32
+ ]
33
+ }
34
+ """.strip()
35
+
36
+ bias_detector_json = """
37
+ {
38
+ "discrepancy_summary": "Doctor focused on rib pain; image suggests pneumothorax.",
39
+ "identified_biases": [
40
+ {"type": "Anchoring", "evidence": "Trauma mechanism overweighted", "severity": "HIGH"}
41
+ ],
42
+ "missed_findings": ["Pneumothorax"],
43
+ "agreement_points": ["Rib pain consistent with trauma"]
44
+ }
45
+ """.strip()
46
+
47
+ devil_advocate_json = """
48
+ {
49
+ "challenges": [
50
+ {"claim": "Rib contusion explains symptoms", "counter_evidence": "Dyspnea can reflect pneumothorax severity."}
51
+ ],
52
+ "must_not_miss": [
53
+ {
54
+ "diagnosis": "Tension pneumothorax",
55
+ "why_dangerous": "Rapid hemodynamic collapse if untreated",
56
+ "supporting_signs": "Worsening dyspnea and pleural line",
57
+ "rule_out_test": "Bedside ultrasound or repeat upright CXR"
58
+ }
59
+ ],
60
+ "recommended_workup": ["Repeat upright chest radiograph", "Point-of-care ultrasound"]
61
+ }
62
+ """.strip()
63
+
64
+ consultant_json = """
65
+ {
66
+ "consultation_note": "Have you considered pneumothorax given the pleural line?\\n\\nI would re-image upright and consider bedside ultrasound.",
67
+ "alternative_diagnoses": [
68
+ {
69
+ "diagnosis": "Pneumothorax",
70
+ "urgency": "high",
71
+ "evidence": "Pleural line and absent peripheral markings",
72
+ "next_step": "Repeat upright CXR or POCUS"
73
+ }
74
+ ],
75
+ "immediate_actions": ["Repeat upright CXR", "POCUS"],
76
+ "confidence_note": "Based on a single image; clinical correlation required."
77
+ }
78
+ """.strip()
79
+
80
+ def fake_generate_with_image(_prompt: str, _image, system_prompt: str = "") -> str:
81
+ if system_prompt == DIAGNOSTICIAN_SYSTEM:
82
+ return diagnostician_json
83
+ if system_prompt == BIAS_DETECTOR_SYSTEM:
84
+ return bias_detector_json
85
+ if system_prompt.startswith(DEVIL_ADVOCATE_SYSTEM):
86
+ return devil_advocate_json
87
+ raise AssertionError(f"Unexpected system_prompt (with image): {system_prompt!r}")
88
+
89
+ def fake_generate_text(_prompt: str, system_prompt: str = "") -> str:
90
+ if system_prompt == CONSULTANT_SYSTEM:
91
+ return consultant_json
92
+ raise AssertionError(f"Unexpected system_prompt: {system_prompt!r}")
93
+
94
+ with patch("models.medgemma_client.generate_with_image", side_effect=fake_generate_with_image), patch(
95
+ "models.medgemma_client.generate_text",
96
+ side_effect=fake_generate_text,
97
+ ), patch(
98
+ "models.medsiglip_client.verify_findings",
99
+ return_value=[{"sign": "pneumothorax", "confidence": "likely present"}],
100
+ ):
101
+ result = run_pipeline(
102
+ image=dummy_image,
103
+ doctor_diagnosis="Rib contusion",
104
+ clinical_context="32M, trauma, left chest pain, mild dyspnea.",
105
+ modality="CXR",
106
+ )
107
+
108
+ assert result.get("error") is None
109
+
110
+ diag = result.get("diagnostician_output") or {}
111
+ assert diag.get("findings_list"), "Diagnostician findings_list missing"
112
+ assert diag.get("analysis"), "Diagnostician analysis missing"
113
+
114
+ bias = result.get("bias_detector_output") or {}
115
+ assert bias.get("discrepancy_summary")
116
+ assert bias.get("identified_biases"), "Bias detector identified_biases missing"
117
+
118
+ da = result.get("devils_advocate_output") or {}
119
+ assert da.get("must_not_miss"), "Devil's advocate must_not_miss missing"
120
+ assert all(isinstance(x, str) for x in da.get("recommended_workup", []))
121
+
122
+ ref = result.get("consultant_output") or {}
123
+ assert ref.get("consultation_note")
124
+ assert isinstance(ref.get("alternative_diagnoses"), list)
tests/test_smoke.py ADDED
@@ -0,0 +1,35 @@
+ """Quick smoke tests: imports, graph build, demo loading, utils."""
+
+ import os
+ import sys
+
+ from PIL import Image
+
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+
+ def test_smoke_graph_builds():
+     from agents.graph import build_graph
+
+     graph = build_graph()
+     assert graph is not None
+
+
+ def test_smoke_demo_loader_returns_expected_shape():
+     from ui.callbacks import load_demo
+
+     img, diagnosis, context, modality = load_demo("Case 1: Missed Pneumothorax")
+     assert isinstance(diagnosis, str) and diagnosis
+     assert isinstance(context, str) and context
+     assert modality in {"CXR", "CT", "Other"}
+     assert img is None or hasattr(img, "size")
+
+
+ def test_utils_strip_and_resize():
+     from models.utils import resize_for_medgemma, strip_thinking_tokens
+
+     assert strip_thinking_tokens("<unused94>t<unused95>Real answer") == "Real answer"
+     big = Image.new("RGB", (2000, 2000), color="gray")
+     resized = resize_for_medgemma(big)
+     assert max(resized.size) <= 896
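A plausible implementation consistent with the `strip_thinking_tokens` smoke test would drop the span between the `<unused94>`/`<unused95>` markers; this is a sketch inferred from the test, not necessarily the code in `models/utils.py`:

```python
import re


def strip_thinking_tokens(text: str) -> str:
    # Drop any <unused94>...<unused95> "thinking" span, keep only the visible answer.
    # (Sketch based on the smoke test; the repo's implementation may differ.)
    return re.sub(r"<unused94>.*?<unused95>", "", text, flags=re.DOTALL).strip()


print(strip_thinking_tokens("<unused94>t<unused95>Real answer"))  # Real answer
```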
ui/__init__.py ADDED
File without changes
ui/callbacks.py ADDED
@@ -0,0 +1,459 @@
+ """Callbacks connecting UI events to the LangGraph pipeline."""
+
+ import json
+ import logging
+ import os
+
+ import numpy as np
+ from PIL import Image
+
+ from agents.graph import stream_pipeline
+ from config import DEMO_CASES_DIR, ENABLE_MEDASR
+
+ logger = logging.getLogger(__name__)
+
+ # Agent display info
+ AGENT_INFO = {
+     "diagnostician": ("Diagnostician", "Analyzing image independently..."),
+     "bias_detector": ("Bias Detector", "Scanning for cognitive biases..."),
+     "devil_advocate": ("Devil's Advocate", "Challenging the diagnosis..."),
+     "consultant": ("Consultant", "Synthesizing consultation report..."),
+ }
+
+ # Demo case definitions — based on published case reports and clinical literature.
+ # References:
+ #   Case 1: PMC3195099 (AP CXR vs CT in trauma pneumothorax detection)
+ #   Case 2: PMC6203039 (Acute aortic dissection: a missed diagnosis)
+ #   Case 3: PMC10683049 (PE masked by symptoms of mental disorders)
+ DEMO_CASES = {
+     "Case 1: Missed Pneumothorax": {
+         "diagnosis": "Left rib contusion with musculoskeletal chest wall pain",
+         "context": (
+             "32-year-old male, presented to ED after a motorcycle collision at ~40 mph. "
+             "Helmet worn, no LOC. Chief complaint: left-sided chest pain worse with deep "
+             "inspiration.\n\n"
+             "Vitals: HR 104 bpm, BP 132/84 mmHg, RR 22/min, SpO2 96% on room air, "
+             "Temp 37.1 C.\n\n"
+             "Exam: Tenderness over left 4th-6th ribs, no crepitus, no subcutaneous "
+             "emphysema palpated. Breath sounds reportedly equal bilaterally (noisy ED). "
+             "Mild dyspnea attributed to pain.\n\n"
+             "Labs: WBC 11.2, Hgb 14.1, Lactate 1.8 mmol/L.\n\n"
+             "ED physician ordered AP chest X-ray (supine) — read as 'no acute "
+             "cardiopulmonary abnormality, possible left rib fracture.' Patient was given "
+             "ibuprofen and discharged with rib fracture precautions."
+         ),
+         "image_file": "case1_pneumothorax.png",
+         "modality": "CXR",
+     },
+     "Case 2: Aortic Dissection": {
+         "diagnosis": "Acute gastroesophageal reflux / esophageal spasm",
+         "context": (
+             "58-year-old male with a 15-year history of hypertension (poorly controlled, "
+             "non-compliant with amlodipine). Presented to ED with sudden-onset severe "
+             "retrosternal chest pain radiating to the interscapular back region, starting "
+             "30 minutes ago.\n\n"
+             "Vitals: BP 178/102 mmHg (right arm), 146/88 mmHg (left arm), HR 92 bpm, "
+             "RR 20/min, SpO2 97%, Temp 37.0 C.\n\n"
+             "Exam: Diaphoretic, visibly distressed. Abdomen soft, mild epigastric "
+             "tenderness. Heart sounds normal, no murmur. Peripheral pulses intact but "
+             "radial pulse asymmetry noted.\n\n"
+             "Labs: Troponin I <0.01 (negative x2 at 0h and 3h), D-dimer 4,850 ng/mL "
+             "(markedly elevated), WBC 13.4, Creatinine 1.3.\n\n"
+             "ECG: Sinus tachycardia, nonspecific ST changes. Initial CXR ordered. "
+             "ED physician considered ACS (ruled out by troponin), then attributed symptoms "
+             "to acid reflux; prescribed IV pantoprazole and GI cocktail. Pain not relieved."
+         ),
+         "image_file": "case2_aortic_dissection.png",
+         "modality": "CXR",
+     },
+     "Case 3: Pulmonary Embolism": {
+         "diagnosis": "Postpartum anxiety with hyperventilation syndrome",
+         "context": (
+             "29-year-old female, G2P2, day 5 after emergency cesarean section (prolonged "
+             "labor, general anesthesia). Presented with acute onset dyspnea and chest "
+             "tightness at rest. Reports feeling of 'impending doom' and inability to catch "
+             "breath.\n\n"
+             "Vitals: HR 118 bpm, BP 108/72 mmHg, RR 28/min, SpO2 91% on room air "
+             "(improved to 95% on 4L NC), Temp 37.3 C.\n\n"
+             "Exam: Anxious-appearing, tachypneic. Lungs clear to auscultation. Mild "
+             "right-sided pleuritic chest pain. Right calf tenderness and mild swelling "
+             "noted but attributed to post-surgical immobility. No Homans sign.\n\n"
+             "Labs: D-dimer 3,200 ng/mL (elevated, but 'expected postpartum'), "
+             "WBC 10.8, Hgb 10.2, ABG on RA: pH 7.48, pO2 68 mmHg, pCO2 29 mmHg.\n\n"
+             "OB team attributed symptoms to postpartum anxiety, prescribed lorazepam "
+             "0.5 mg PRN. Psychiatry consult requested. No CTPA ordered initially."
+         ),
+         "image_file": "case3_pulmonary_embolism.png",
+         "modality": "CXR",
+     },
+ }
+
+
+ def analyze_streaming(image: Image.Image | None, diagnosis: str, context: str, modality: str):
+     """
+     Generator: run pipeline and yield single HTML output after each agent step.
+     Each agent's output appears inline below its progress header.
+     """
+     if image is None:
+         yield '<div class="pipeline-error">Please upload a medical image.</div>'
+         return
+     if not diagnosis.strip():
+         yield '<div class="pipeline-error">Please enter the doctor\'s working diagnosis.</div>'
+         return
+     if not context.strip():
+         context = "No additional clinical context provided."
+     if not isinstance(modality, str) or not modality.strip():
+         modality = "CXR"
+
+     completed = {}
+     agent_outputs = {}
+     all_agents = ["diagnostician", "bias_detector", "devil_advocate", "consultant"]
+
+     try:
+         yield _build_pipeline(all_agents, completed, agent_outputs, active="diagnostician")
+
+         accumulated_state = {}
+         for node_name, state_update in stream_pipeline(image, diagnosis.strip(), context.strip(), modality.strip()):
+             completed[node_name] = True
+             accumulated_state.update(state_update)
+
+             if state_update.get("error"):
+                 agent_outputs[node_name] = f'<div class="pipeline-error">{_esc(state_update.get("error"))}</div>'
+                 yield _build_pipeline(all_agents, completed, agent_outputs, error=node_name)
+                 return
+
+             # Generate this agent's HTML output
+             agent_outputs[node_name] = _format_agent_output(node_name, accumulated_state)
+
+             idx = all_agents.index(node_name) if node_name in all_agents else -1
+             next_active = all_agents[idx + 1] if idx + 1 < len(all_agents) else None
+
+             yield _build_pipeline(all_agents, completed, agent_outputs, active=next_active)
+
+     except Exception as e:
+         logger.exception("Pipeline failed")
+         yield f'<div class="pipeline-error">Pipeline error: {_esc(e)}</div>'
+
+
+ def _build_pipeline(all_agents, completed, agent_outputs, active=None, error=None) -> str:
+     """Build combined progress + inline output HTML."""
+     from ui.components import _build_progress_html
+
+     return _build_progress_html(
+         completed=list(completed.keys()),
+         active=active,
+         error=error,
+         agent_outputs=agent_outputs,
+     )
+
+
+ def _format_agent_output(agent_id: str, state: dict) -> str:
+     """Generate HTML content for a specific agent's output."""
+     if agent_id == "diagnostician":
+         return _format_diagnostician(state)
+     if agent_id == "bias_detector":
+         return _format_bias_detector(state)
+     if agent_id == "devil_advocate":
+         return _format_devil_advocate(state)
+     if agent_id == "consultant":
+         return _format_consultant(state)
+     return ""
+
+
+ def _esc(text: object) -> str:
+     """Escape HTML special characters."""
+     return str(text).replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
+
+
+ def _format_diagnostician(state: dict) -> str:
+     diag = state.get("diagnostician_output") or {}
+     parts = []
+
+     # Structured findings
+     findings_list = diag.get("findings_list", [])
+     if findings_list:
+         items = []
+         for f in findings_list:
+             if isinstance(f, dict):
+                 name = _esc(f.get("finding", ""))
+                 desc = _esc(f.get("description", ""))
+                 source = f.get("source", "").strip().lower()
+                 source_tag = ""
+                 if source in ("imaging", "clinical", "both"):
+                     source_tag = f' <span class="source-tag source-{source}">{_esc(source)}</span>'
+                 line = f"<li><strong>{name}</strong>{source_tag}: {desc}</li>" if desc else f"<li>{name}{source_tag}</li>"
+                 items.append(line)
+             else:
+                 items.append(f"<li>{_esc(str(f))}</li>")
+         parts.append(f'<div class="findings-section"><strong>Findings</strong><ul>{"".join(items)}</ul></div>')
+
+     # Differential diagnoses
+     differentials = diag.get("differential_diagnoses", [])
+     if differentials:
+         items = []
+         for d in differentials:
+             if isinstance(d, dict):
+                 name = _esc(d.get("diagnosis", ""))
+                 reason = _esc(d.get("reasoning", ""))
+                 items.append(f"<li><strong>{name}</strong>: {reason}</li>" if reason else f"<li>{name}</li>")
+             else:
+                 items.append(f"<li>{_esc(str(d))}</li>")
+         parts.append(f'<div class="differentials-section"><strong>Differential Diagnoses</strong><ol>{"".join(items)}</ol></div>')
+
+     # Fallback: raw text if no structured data
+     if not parts:
+         raw = diag.get("findings", "")
+         if raw:
+             parts.append(f'<div class="agent-text">{_esc(raw).replace(chr(10), "<br>")}</div>')
+
+     return "".join(parts)
+
+
+ def _format_bias_detector(state: dict) -> str:
+     bias_out = state.get("bias_detector_output") or {}
+     parts = []
+
+     # Discrepancy summary (always show if present)
+     disc = bias_out.get("discrepancy_summary", "")
+     if disc:
+         parts.append(f'<div class="discrepancy-summary">{_esc(disc)}</div>')
+
+     # Biases
+     biases = bias_out.get("identified_biases", [])
+     for b in biases:
+         severity = b.get("severity", "").strip().lower()
+         bias_type = _esc(b.get("type", "Unknown"))
+         evidence = _esc(b.get("evidence", ""))
+         source = b.get("source", "").strip().lower()
+         if severity in ("low", "medium", "high"):
+             sev_tag = f'<span class="severity-tag severity-{severity}">{severity.upper()}</span>'
+         else:
+             sev_tag = ""
+         if source in ("doctor", "ai", "both"):
+             src_tag = f'<span class="source-tag source-{source}">{source.upper()}</span>'
+         else:
+             src_tag = ""
+         parts.append(
+             f'<div class="bias-item">'
+             f'<div class="bias-title">{sev_tag} {src_tag} {bias_type}</div>'
+             f'<div class="bias-evidence">{evidence}</div>'
+             f'</div>'
+         )
+
+     # Missed findings
+     missed = bias_out.get("missed_findings", [])
+     if missed:
+         items = "".join(f"<li>{_esc(f)}</li>" for f in missed)
+         parts.append(f'<div class="missed-findings"><strong>Missed Findings</strong><ul>{items}</ul></div>')
+
+     # SigLIP sign verification
+     sign_results = bias_out.get("consistency_check", [])
+     if isinstance(sign_results, list) and sign_results:
+         meaningful = [r for r in sign_results if r.get("confidence") != "inconclusive"]
+         if meaningful:
+             items = []
+             for r in meaningful:
+                 conf = r.get("confidence", "?")
+                 sign = _esc(r.get("sign", "?"))
+                 css_cls = "sign-present" if "present" in conf else "sign-absent"
+                 items.append(f'<li class="{css_cls}"><strong>{sign}</strong> — {conf}</li>')
+             parts.append(
+                 f'<div class="siglip-section">'
+                 f'<strong>Image Verification (MedSigLIP)</strong>'
+                 f'<ul>{"".join(items)}</ul>'
+                 f'</div>'
+             )
+
+     return "".join(parts)
+
+
+ def _format_devil_advocate(state: dict) -> str:
+     da_out = state.get("devils_advocate_output") or {}
+     parts = []
+
+     # Must-not-miss
+     mnm = da_out.get("must_not_miss", [])
+     for m in mnm:
+         dx = _esc(m.get("diagnosis", "?"))
+         why = _esc(m.get("why_dangerous", ""))
+         signs = _esc(m.get("supporting_signs", ""))
+         test = _esc(m.get("rule_out_test", ""))
+         details = ""
+         if why:
+             details += f"<li><strong>Why dangerous:</strong> {why}</li>"
+         if signs:
+             details += f"<li><strong>Supporting signs:</strong> {signs}</li>"
+         if test:
+             details += f"<li><strong>Rule-out test:</strong> {test}</li>"
+         parts.append(
+             f'<div class="mnm-item">'
+             f'<div class="mnm-title">{dx}</div>'
+             f'<ul>{details}</ul>'
+             f'</div>'
+         )
+
+     # Challenges
+     challenges = da_out.get("challenges", [])
+     if challenges:
+         for c in challenges:
+             claim = _esc(c.get("claim", ""))
+             counter = _esc(c.get("counter_evidence", ""))
+             parts.append(
+                 f'<div class="challenge-item">'
+                 f'<div class="challenge-claim">{claim}</div>'
+                 f'<div class="challenge-counter">{counter}</div>'
+                 f'</div>'
+             )
+
+     # Recommended workup
+     workup = da_out.get("recommended_workup", [])
+     if workup:
+         items = "".join(f"<li>{_esc(str(w))}</li>" for w in workup)
+         parts.append(f'<div class="workup-section"><strong>Recommended Workup</strong><ul>{items}</ul></div>')
+
+     # Fallback: ensure non-empty so the collapsible block can expand
+     if not parts:
+         parts.append('<div class="agent-text">No structured challenges parsed.</div>')
+
+     return "".join(parts)
+
+
+ def _format_consultant(state: dict) -> str:
+     ref = state.get("consultant_output") or {}
+     da_out = state.get("devils_advocate_output") or {}
+     parts = []
+
+     # Consultation note — the main human-readable report
+     note = ref.get("consultation_note", "")
+     if note:
+         paragraphs = _esc(note).split("\n")
+         formatted = "".join(f"<p>{p.strip()}</p>" for p in paragraphs if p.strip())
+         parts.append(f'<div class="consultation-note">{formatted}</div>')
+
+     # Alternative diagnoses to consider
+     alt_raw = ref.get("alternative_diagnoses", "")
+     if alt_raw:
+         try:
+             alts = json.loads(alt_raw) if isinstance(alt_raw, str) else alt_raw
+             if not isinstance(alts, list):
+                 alts = []
+             if alts:
+                 items = []
+                 for a in alts:
+                     urgency_raw = str(a.get("urgency", "")).strip().lower()
+                     urgency = urgency_raw if urgency_raw in {"critical", "high", "moderate"} else "moderate"
+                     urgency_label = urgency.upper()
+                     dx = _esc(a.get("diagnosis", "?"))
+                     ev = _esc(a.get("evidence", ""))
+                     ns = _esc(a.get("next_step", ""))
+                     detail = f" — {ev}" if ev else ""
+                     step = f"<br><em>Next step: {ns}</em>" if ns else ""
+                     items.append(
+                         f'<li><span class="urgency-tag urgency-{urgency}">{urgency_label}</span> '
+                         f"<strong>{dx}</strong>{detail}{step}</li>"
+                     )
+                 parts.append(f'<div class="alt-diagnoses"><strong>Consider</strong><ul>{"".join(items)}</ul></div>')
+         except (json.JSONDecodeError, TypeError):
+             pass
+
+     # Immediate actions (merged from Devil's Advocate + Consultant)
+     workup = da_out.get("recommended_workup", []) if isinstance(da_out, dict) else []
+     actions = ref.get("immediate_actions", [])
+     safe_workup = [str(x).strip() for x in workup if str(x).strip()]
+     safe_actions = [str(x).strip() for x in actions if str(x).strip()]
+     all_items = list(dict.fromkeys(safe_workup + safe_actions))
+     if all_items:
+         items = "".join(f"<li>{_esc(item)}</li>" for item in all_items)
+         parts.append(f'<div class="next-steps"><strong>Recommended Actions</strong><ul>{items}</ul></div>')
+
+     # Confidence note
+     if ref.get("confidence_note"):
+         parts.append(f'<div class="confidence-note"><em>{_esc(ref["confidence_note"])}</em></div>')
+
+     return "".join(parts)
+
+
+ def transcribe_audio(audio, existing_context: str = ""):
+     """
+     Transcribe audio input using MedASR.
+
+     Generator that yields (context_text, status_html) for streaming UI feedback.
+     Appends transcribed text to any existing context.
+     """
+     def _status_html(cls: str, text: str) -> str:
+         return f'<div class="voice-status {cls}">{text}</div>'
+
+     if audio is None:
+         yield existing_context, _status_html("voice-idle", "No audio recorded. Click the microphone to start.")
+         return
+
+     if not ENABLE_MEDASR:
+         yield existing_context, _status_html("voice-error", "MedASR is disabled (set ENABLE_MEDASR=true)")
+         return
+
+     # Step 1: Show processing state
+     sr, audio_data = audio
+     duration = len(audio_data) / sr if sr > 0 else 0
+     yield existing_context, _status_html(
+         "voice-processing",
+         f'<span class="pulse-dot"></span> Transcribing {duration:.1f}s of audio with MedASR...'
+     )
+
+     try:
+         from models import medasr_client
+
+         # Convert to float32 mono
+         if audio_data.dtype != np.float32:
+             if np.issubdtype(audio_data.dtype, np.integer):
+                 audio_data = audio_data.astype(np.float32) / np.iinfo(audio_data.dtype).max
+             else:
+                 audio_data = audio_data.astype(np.float32)
+         if audio_data.ndim > 1:
+             audio_data = audio_data.mean(axis=1)
+
+         # Resample to 16 kHz if needed (MedASR expects 16000 Hz)
+         target_sr = 16000
+         if sr != target_sr:
+             from scipy.signal import resample
+
+             num_samples = int(len(audio_data) * target_sr / sr)
+             audio_data = resample(audio_data, num_samples).astype(np.float32)
+             sr = target_sr
+
+         # Step 2: Run transcription
+         text = medasr_client.transcribe(audio_data, sampling_rate=sr)
+
+         if not text.strip():
+             yield existing_context, _status_html("voice-error", "No speech detected. Please try again.")
+             return
+
+         # Step 3: Append to existing context
+         if existing_context.strip():
+             new_context = existing_context.rstrip() + "\n\n" + text
+         else:
+             new_context = text
+
+         word_count = len(text.split())
+         yield new_context, _status_html(
+             "voice-success",
+             f'✓ Transcribed {word_count} words ({duration:.1f}s) — text added to context above'
+         )
+
+     except Exception as e:
+         logger.exception("MedASR transcription failed")
+         yield existing_context, _status_html("voice-error", f"Transcription failed: {e}")
+
+
+ def load_demo(demo_name: str | None):
+     """Load a demo case into the UI inputs."""
+     if demo_name is None or demo_name not in DEMO_CASES:
+         return None, "", "", "CXR"
+
+     case = DEMO_CASES[demo_name]
+     image_path = os.path.join(DEMO_CASES_DIR, case["image_file"])
+
+     image = None
+     if os.path.exists(image_path):
+         image = Image.open(image_path)
+     else:
+         logger.warning("Demo image not found: %s", image_path)
+
+     modality = case.get("modality") or "CXR"
+     return image, case["diagnosis"], case["context"], modality
ui/components.py ADDED
@@ -0,0 +1,317 @@
+ """Gradio UI layout for Diagnostic Devil's Advocate."""
+
+ import gradio as gr
+
+ from config import ENABLE_MEDASR
+
+
+ def build_ui(analyze_fn, load_demo_fn, transcribe_fn=None):
+     """
+     Build the Gradio Blocks UI.
+
+     Args:
+         analyze_fn: generator(image, diagnosis, context, modality) -> yields HTML
+         load_demo_fn: callback(demo_name) -> (image, diagnosis, context, modality)
+         transcribe_fn: callback(audio, existing_context) -> yields (context, status_html) (optional)
+     """
+     with gr.Blocks(title="Diagnostic Devil's Advocate") as demo:
+
+         # ── Hero Banner ──
+         gr.HTML(
+             """
+             <div class="hero-banner">
+                 <div class="hero-badge">MedGemma Impact Challenge</div>
+                 <h1>Diagnostic Devil's Advocate</h1>
+                 <p class="hero-sub">AI-Powered Cognitive Debiasing for Medical Image Interpretation</p>
+                 <p class="hero-desc">Upload a medical image with the working diagnosis.
+                 Four AI agents will independently analyze it, detect cognitive biases,
+                 challenge the diagnosis, and synthesize a debiasing report.</p>
+                 <div class="hero-models">
+                     <span class="model-chip">MedGemma 4B</span>
+                     <span class="model-chip">MedSigLIP</span>
+                     <span class="model-chip">LangGraph</span>
+                     <span class="model-chip">MedASR</span>
+                 </div>
+             </div>
+             """
+         )
+
+         # ── Demo Cases Row (3 clickable cards) ──
+         gr.HTML('<div class="section-label">SELECT A DEMO CASE</div>')
+
+         with gr.Row(elem_classes=["case-row"]):
+             demo_btn_1 = gr.Button(
+                 value="",
+                 elem_id="case-btn-1",
+                 elem_classes=["case-card-btn"],
+             )
+             demo_btn_2 = gr.Button(
+                 value="",
+                 elem_id="case-btn-2",
+                 elem_classes=["case-card-btn"],
+             )
+             demo_btn_3 = gr.Button(
+                 value="",
+                 elem_id="case-btn-3",
+                 elem_classes=["case-card-btn"],
+             )
+
+         # Overlay HTML on top of buttons for card visuals
+         gr.HTML("""
+             <div class="case-cards-overlay">
+                 <div class="case-card case-card-pneumo" onclick="document.querySelector('#case-btn-1').click()">
+                     <div class="card-top">
+                         <span class="case-icon">🫁</span>
+                         <span class="case-tag tag-blue">TRAUMA</span>
+                     </div>
+                     <div class="case-title">Missed Pneumothorax</div>
+                     <div class="case-meta">32-year-old Male</div>
+                     <div class="case-desc">Motorcycle collision · Left chest pain · HR 104 · SpO₂ 96%</div>
+                     <div class="case-misdiag">
+                         <span class="misdiag-label">Initial Dx:</span>
+                         <span class="misdiag-value">Rib contusion</span>
+                     </div>
+                 </div>
+                 <div class="case-card case-card-aorta" onclick="document.querySelector('#case-btn-2').click()">
+                     <div class="card-top">
+                         <span class="case-icon">🫀</span>
+                         <span class="case-tag tag-red">VASCULAR</span>
+                     </div>
+                     <div class="case-title">Aortic Dissection</div>
+                     <div class="case-meta">58-year-old Male</div>
+                     <div class="case-desc">Sudden chest→back pain · BP asymmetry 32mmHg · D-dimer 4850</div>
+                     <div class="case-misdiag">
+                         <span class="misdiag-label">Initial Dx:</span>
+                         <span class="misdiag-value">GERD / Reflux</span>
+                     </div>
+                 </div>
+                 <div class="case-card case-card-pe" onclick="document.querySelector('#case-btn-3').click()">
+                     <div class="card-top">
+                         <span class="case-icon">🩸</span>
+                         <span class="case-tag tag-purple">POSTPARTUM</span>
+                     </div>
+                     <div class="case-title">Pulmonary Embolism</div>
+                     <div class="case-meta">29-year-old Female</div>
+                     <div class="case-desc">5 days post C-section · HR 118 · SpO₂ 91% · pO₂ 68</div>
+                     <div class="case-misdiag">
+                         <span class="misdiag-label">Initial Dx:</span>
+                         <span class="misdiag-value">Postpartum anxiety</span>
+                     </div>
+                 </div>
+             </div>
+         """)
+
+         # ── Main Content: Input + Output ──
+         with gr.Row(equal_height=False):
+
+             # ═══════════ Left Column: Input ═══════════
+             with gr.Column(scale=4, min_width=340):
+                 gr.HTML('<div class="section-label">CLINICAL INPUT</div>')
+
+                 image_input = gr.Image(
+                     type="pil",
+                     label="Medical Image",
+                     height=240,
+                 )
+                 modality_input = gr.Radio(
+                     choices=["CXR", "CT", "Other"],
+                     value="CXR",
+                     label="Imaging Modality",
+                 )
+                 diagnosis_input = gr.Textbox(
+                     label="Doctor's Working Diagnosis",
+                     placeholder="e.g., Left rib contusion with musculoskeletal chest wall pain",
+                 )
+                 context_input = gr.Textbox(
+                     label="Clinical Context (history, vitals, labs, exam)",
+                     placeholder=(
+                         "e.g., 32M, motorcycle accident, left-sided chest pain, "
+                         "HR 104, SpO2 96%, WBC 11.2..."
+                     ),
+                     lines=5,
+                 )
+
+                 # ── Voice Input (MedASR) ──
+                 if ENABLE_MEDASR and transcribe_fn:
+                     gr.HTML("""
+                         <div class="voice-section">
+                             <div class="voice-header">
+                                 <span class="voice-icon">🎙️</span>
+                                 <span class="voice-title">Voice Input</span>
+                                 <span class="voice-badge">MedASR</span>
+                             </div>
+                             <div class="voice-hint">Record clinical context with your microphone.
+                             Text will be appended to the context field above.</div>
+                         </div>
+                     """)
+                     with gr.Row(elem_classes=["voice-row"]):
+                         audio_input = gr.Audio(
+                             sources=["microphone"],
+                             type="numpy",
+                             label="",
+                             show_label=False,
+                             elem_classes=["voice-audio"],
+                         )
+                         with gr.Column(scale=1, min_width=160):
+                             transcribe_btn = gr.Button(
+                                 "Transcribe",
+                                 size="sm",
+                                 elem_classes=["transcribe-btn"],
+                             )
+                             voice_status = gr.HTML(
+                                 value='<div class="voice-status voice-idle">Ready to record</div>',
+                             )
+                 else:
+                     gr.HTML(
+                         '<div class="voice-status voice-idle">Voice input disabled (MedASR)</div>'
+                     )
+
+                 analyze_btn = gr.Button(
+                     "Analyze & Challenge Diagnosis",
+                     variant="primary",
+                     size="lg",
+                     elem_classes=["analyze-btn"],
+                 )
+
+             # ═══════════ Right Column: Pipeline Output ═══════════
+             with gr.Column(scale=6, min_width=500):
+
+                 gr.HTML('<div class="section-label">PIPELINE OUTPUT</div>')
+                 pipeline_output = gr.HTML(
+                     value=_initial_progress_html(),
+                 )
+
+         # ── Footer ──
+         gr.HTML(
+             """
+             <div class="footer-text">
+                 <span>Built with</span>
+                 <span class="footer-chip">MedGemma</span>
+                 <span class="footer-chip">MedSigLIP</span>
+                 <span class="footer-chip">LangGraph</span>
+                 <span class="footer-chip">Gradio</span>
+                 <span class="footer-sep">|</span>
+                 <span>MedGemma Impact Challenge 2025</span>
+                 <span class="footer-sep">|</span>
+                 <span>Research & educational use only</span>
+             </div>
+             """
+         )
+
+         # ═══════════ Wire Callbacks ═══════════
+
+         analyze_btn.click(
+             fn=analyze_fn,
+             inputs=[image_input, diagnosis_input, context_input, modality_input],
+             outputs=[pipeline_output],
+         )
+
+         demo_btn_1.click(
+             fn=lambda: load_demo_fn("Case 1: Missed Pneumothorax"),
+             inputs=[],
+             outputs=[image_input, diagnosis_input, context_input, modality_input],
+         )
+         demo_btn_2.click(
+             fn=lambda: load_demo_fn("Case 2: Aortic Dissection"),
+             inputs=[],
+             outputs=[image_input, diagnosis_input, context_input, modality_input],
+         )
+         demo_btn_3.click(
+             fn=lambda: load_demo_fn("Case 3: Pulmonary Embolism"),
+             inputs=[],
+             outputs=[image_input, diagnosis_input, context_input, modality_input],
+         )
+
+         # Voice transcription — outputs to context field + status indicator
+         if ENABLE_MEDASR and transcribe_fn:
+             transcribe_btn.click(
+                 fn=transcribe_fn,
+                 inputs=[audio_input, context_input],
+                 outputs=[context_input, voice_status],
230
+ )
231
+
232
+ return demo
233
+
234
+
235
+ def _initial_progress_html() -> str:
236
+ """Static initial progress bar HTML."""
237
+ return _build_progress_html([], None, None, {})
238
+
239
+
240
+ def _build_progress_html(
241
+ completed: list[str],
242
+ active: str | None,
243
+ error: str | None,
244
+ agent_outputs: dict[str, str] | None = None,
245
+ ) -> str:
246
+ """Build pipeline output: progress bar + each agent's result inline.
247
+
248
+ Args:
249
+ agent_outputs: {agent_id: html_content} for completed agents.
250
+ """
251
+ if agent_outputs is None:
252
+ agent_outputs = {}
253
+
254
+ agents = [
255
+ ("diagnostician", "Diagnostician", "Independent image analysis"),
256
+ ("bias_detector", "Bias Detector", "Cognitive bias identification"),
257
+ ("devil_advocate", "Devil's Advocate", "Adversarial challenge"),
258
+ ("consultant", "Consultant", "Consultation synthesis"),
259
+ ]
260
+
261
+ n_done = len(completed)
262
+ pct = int(n_done / len(agents) * 100)
263
+
264
+ bar_color = "#ef4444" if error else "#3b82f6"
265
+ html = f"""
266
+ <div class="progress-container">
267
+ <div class="progress-bar-track">
268
+ <div class="progress-bar-fill" style="width:{pct}%;background:{bar_color};">
269
+ {pct}%
270
+ </div>
271
+ </div>
272
+ <div class="pipeline-agents">
273
+ """
274
+
275
+ for agent_id, name, desc in agents:
276
+ if agent_id == error:
277
+ cls = "step-error"
278
+ icon = "✗"
279
+ status = "Failed"
280
+ elif agent_id in completed:
281
+ cls = "step-done"
282
+ icon = "✓"
283
+ status = "Complete"
284
+ elif agent_id == active:
285
+ cls = "step-active"
286
+ icon = "⟳"
287
+ status = desc
288
+ else:
289
+ cls = "step-waiting"
290
+ icon = "○"
291
+ status = "Waiting"
292
+
293
+ content = agent_outputs.get(agent_id, "")
294
+ if content:
295
+ # Collapsible: <details open> with header as <summary>
296
+ html += f"""
297
+ <details class="agent-block {cls}" open>
298
+ <summary class="agent-header">
299
+ <span class="step-icon">{icon}</span>
300
+ <span class="step-name">{name}</span>
301
+ <span class="step-status">{status}</span>
302
+ </summary>
303
+ <div class="agent-output">{content}</div>
304
+ </details>"""
305
+ else:
306
+ # No output yet — just show the header (not collapsible)
307
+ html += f"""
308
+ <div class="agent-block {cls}">
309
+ <div class="agent-header">
310
+ <span class="step-icon">{icon}</span>
311
+ <span class="step-name">{name}</span>
312
+ <span class="step-status">{status}</span>
313
+ </div>
314
+ </div>"""
315
+
316
+ html += "</div></div>"
317
+ return html
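
The per-agent branching in `_build_progress_html` above (error wins, then completed, then active, else waiting) can be sketched as a standalone helper; `agent_state` is a hypothetical name for illustration, not part of the project:

```python
# Minimal sketch of the state classification used by _build_progress_html:
# a failed agent takes precedence, then completed, then the active one,
# and anything else is still waiting.
def agent_state(agent_id, completed, active, error):
    if agent_id == error:
        return "step-error"
    if agent_id in completed:
        return "step-done"
    if agent_id == active:
        return "step-active"
    return "step-waiting"

agents = ["diagnostician", "bias_detector", "devil_advocate", "consultant"]
states = [agent_state(a, ["diagnostician"], "bias_detector", None) for a in agents]
print(states)  # ['step-done', 'step-active', 'step-waiting', 'step-waiting']
```

Because the error check comes first, an agent listed in both `completed` and `error` still renders as failed, which matches the branch order in the function above.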
ui/css.py ADDED
@@ -0,0 +1,726 @@
+ """Custom CSS for the Diagnostic Devil's Advocate UI."""
+
+ CUSTOM_CSS = """
+ /* ===== Global ===== */
+ .gradio-container {
+   max-width: 1320px !important;
+   margin: 0 auto !important;
+   font-family: 'Inter', 'Segoe UI', system-ui, -apple-system, sans-serif !important;
+   width: 100% !important;
+   box-sizing: border-box !important;
+   overflow-x: hidden;
+ }
+
+ /* ===== Hero Banner ===== */
+ .hero-banner {
+   text-align: center;
+   padding: 36px 28px 24px;
+   border-radius: 16px;
+   background: linear-gradient(135deg, #0f172a 0%, #1e293b 40%, #0f3460 100%);
+   color: #fff;
+   margin-bottom: 22px;
+   box-shadow: 0 4px 28px rgba(0, 0, 0, 0.2);
+   position: relative;
+   overflow: hidden;
+ }
+ .hero-banner::after {
+   content: '';
+   position: absolute;
+   top: -50%;
+   right: -20%;
+   width: 400px;
+   height: 400px;
+   background: radial-gradient(circle, rgba(59,130,246,0.08) 0%, transparent 70%);
+   pointer-events: none;
+ }
+ .hero-badge {
+   display: inline-block;
+   padding: 4px 14px;
+   border-radius: 20px;
+   background: rgba(59, 130, 246, 0.2);
+   border: 1px solid rgba(59, 130, 246, 0.3);
+   color: #93c5fd;
+   font-size: 0.72rem;
+   font-weight: 600;
+   letter-spacing: 0.5px;
+   text-transform: uppercase;
+   margin-bottom: 12px;
+ }
+ .hero-banner h1 {
+   margin: 0 0 8px;
+   font-size: 2.2rem;
+   font-weight: 800;
+   letter-spacing: -0.8px;
+   color: #fff !important;
+   border: none !important;
+ }
+ .hero-sub {
+   margin: 0 0 6px;
+   font-size: 1.02rem;
+   color: #93c5fd;
+   font-weight: 500;
+ }
+ .hero-desc {
+   margin: 0 auto;
+   font-size: 0.86rem;
+   color: #94a3b8;
+   max-width: 660px;
+   line-height: 1.55;
+ }
+ .hero-models {
+   margin-top: 16px;
+   display: flex;
+   justify-content: center;
+   gap: 8px;
+   flex-wrap: wrap;
+ }
+ .model-chip {
+   display: inline-block;
+   padding: 3px 12px;
+   border-radius: 6px;
+   background: rgba(255,255,255,0.08);
+   border: 1px solid rgba(255,255,255,0.12);
+   color: #cbd5e1;
+   font-size: 0.72rem;
+   font-weight: 600;
+ }
+
+ /* ===== Section Label ===== */
+ .section-label {
+   font-size: 0.7rem;
+   font-weight: 700;
+   color: #64748b;
+   letter-spacing: 1.2px;
+   text-transform: uppercase;
+   margin: 16px 0 10px;
+   padding-left: 2px;
+ }
+
+ /* ===== Demo Case: Hidden buttons ===== */
+ .case-row {
+   height: 0 !important;
+   overflow: hidden !important;
+   margin: 0 !important;
+   padding: 0 !important;
+   gap: 0 !important;
+ }
+ .case-card-btn {
+   opacity: 0 !important;
+   position: absolute !important;
+   pointer-events: none !important;
+   height: 0 !important;
+   padding: 0 !important;
+   margin: 0 !important;
+ }
+
+ /* ===== Demo Case Cards (overlay, clickable) ===== */
+ .case-cards-overlay {
+   display: grid;
+   grid-template-columns: 1fr 1fr 1fr;
+   gap: 14px;
+   margin-bottom: 20px;
+ }
+ .case-card {
+   border-radius: 14px;
+   padding: 18px 16px 14px;
+   background: #fff;
+   border: 1.5px solid #e2e8f0;
+   cursor: pointer;
+   transition: all 0.22s ease;
+   position: relative;
+   overflow: hidden;
+ }
+ .case-card:hover {
+   border-color: #93c5fd;
+   box-shadow: 0 6px 24px rgba(59, 130, 246, 0.12);
+   transform: translateY(-3px);
+ }
+ .case-card:active {
+   transform: translateY(0);
+   box-shadow: 0 2px 8px rgba(59, 130, 246, 0.15);
+ }
+ .case-card::before {
+   content: '';
+   position: absolute;
+   top: 0;
+   left: 0;
+   right: 0;
+   height: 4px;
+ }
+ .case-card-pneumo::before { background: linear-gradient(90deg, #3b82f6, #60a5fa); }
+ .case-card-aorta::before { background: linear-gradient(90deg, #ef4444, #f87171); }
+ .case-card-pe::before { background: linear-gradient(90deg, #8b5cf6, #a78bfa); }
+
+ .card-top {
+   display: flex;
+   align-items: center;
+   justify-content: space-between;
+   margin-bottom: 10px;
+ }
+ .case-icon { font-size: 1.8rem; }
+ .case-tag {
+   font-size: 0.6rem;
+   font-weight: 700;
+   letter-spacing: 0.8px;
+   padding: 3px 8px;
+   border-radius: 5px;
+   text-transform: uppercase;
+ }
+ .tag-blue { background: #dbeafe; color: #1d4ed8; }
+ .tag-red { background: #fee2e2; color: #b91c1c; }
+ .tag-purple { background: #f3e8ff; color: #7c3aed; }
+
+ .case-title {
+   font-size: 1rem;
+   font-weight: 700;
+   color: #1e293b;
+   margin-bottom: 2px;
+ }
+ .case-meta {
+   font-size: 0.78rem;
+   color: #64748b;
+   margin-bottom: 6px;
+ }
+ .case-desc {
+   font-size: 0.74rem;
+   color: #94a3b8;
+   line-height: 1.4;
+   margin-bottom: 10px;
+ }
+ .case-misdiag {
+   padding-top: 8px;
+   border-top: 1px solid #f1f5f9;
+   font-size: 0.74rem;
+ }
+ .misdiag-label {
+   color: #94a3b8;
+ }
+ .misdiag-value {
+   color: #dc2626;
+   font-weight: 700;
+ }
+
+ /* ===== Voice Input Section ===== */
+ .voice-section {
+   margin-top: 12px;
+   padding: 12px 14px 8px;
+   background: linear-gradient(135deg, #fafbff 0%, #f0f4ff 100%);
+   border: 1.5px solid #dbeafe;
+   border-radius: 12px 12px 0 0;
+   border-bottom: none;
+ }
+ .voice-header {
+   display: flex;
+   align-items: center;
+   gap: 8px;
+   margin-bottom: 4px;
+ }
+ .voice-icon { font-size: 1.2rem; }
+ .voice-title {
+   font-size: 0.88rem;
+   font-weight: 700;
+   color: #1e293b;
+ }
+ .voice-badge {
+   font-size: 0.6rem;
+   font-weight: 700;
+   padding: 2px 8px;
+   border-radius: 4px;
+   background: #dbeafe;
+   color: #1d4ed8;
+   letter-spacing: 0.5px;
+   text-transform: uppercase;
+ }
+ .voice-hint {
+   font-size: 0.74rem;
+   color: #94a3b8;
+   line-height: 1.4;
+ }
+ .voice-row {
+   gap: 10px !important;
+   align-items: stretch !important;
+   margin-bottom: 6px !important;
+ }
+ .voice-audio {
+   border-radius: 0 0 0 12px !important;
+ }
+ .transcribe-btn {
+   border-radius: 8px !important;
+   font-weight: 600 !important;
+   border: 1.5px solid #3b82f6 !important;
+   background: #eff6ff !important;
+   color: #1d4ed8 !important;
+   transition: all 0.15s ease !important;
+ }
+ .transcribe-btn:hover {
+   background: #dbeafe !important;
+   box-shadow: 0 2px 8px rgba(59, 130, 246, 0.15) !important;
+ }
+
+ /* Voice status indicators */
+ .voice-status {
+   font-size: 0.76rem;
+   padding: 6px 10px;
+   border-radius: 8px;
+   margin-top: 6px;
+   text-align: center;
+   line-height: 1.4;
+ }
+ .voice-idle {
+   background: #f8fafc;
+   color: #94a3b8;
+   border: 1px solid #e2e8f0;
+ }
+ .voice-processing {
+   background: #eff6ff;
+   color: #1d4ed8;
+   border: 1px solid #bfdbfe;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+   gap: 6px;
+ }
+ .voice-success {
+   background: #f0fdf4;
+   color: #166534;
+   border: 1px solid #bbf7d0;
+   font-weight: 600;
+ }
+ .voice-error {
+   background: #fef2f2;
+   color: #991b1b;
+   border: 1px solid #fecaca;
+ }
+
+ /* Pulsing dot for processing state */
+ .pulse-dot {
+   display: inline-block;
+   width: 8px;
+   height: 8px;
+   border-radius: 50%;
+   background: #3b82f6;
+   animation: pulse-dot-anim 1s ease-in-out infinite;
+ }
+ @keyframes pulse-dot-anim {
+   0%, 100% { opacity: 1; transform: scale(1); }
+   50% { opacity: 0.4; transform: scale(0.7); }
+ }
+
+ /* ===== Analyze Button ===== */
+ .analyze-btn {
+   width: 100% !important;
+   border-radius: 12px !important;
+   font-size: 1.05rem !important;
+   font-weight: 700 !important;
+   padding: 14px !important;
+   background: linear-gradient(135deg, #2563eb 0%, #1d4ed8 100%) !important;
+   box-shadow: 0 4px 14px rgba(37, 99, 235, 0.3) !important;
+   transition: all 0.2s ease !important;
+   margin-top: 10px !important;
+   letter-spacing: -0.2px !important;
+ }
+ .analyze-btn:hover {
+   box-shadow: 0 6px 24px rgba(37, 99, 235, 0.4) !important;
+   transform: translateY(-2px) !important;
+ }
+
+ /* ===== Progress Bar ===== */
+ .progress-container {
+   margin: 4px 0 8px;
+ }
+ .progress-bar-track {
+   height: 28px;
+   background: #f1f5f9;
+   border-radius: 14px;
+   overflow: hidden;
+   margin-bottom: 14px;
+   border: 1px solid #e2e8f0;
+ }
+ .progress-bar-fill {
+   height: 100%;
+   border-radius: 14px;
+   color: #fff;
+   font-size: 0.75rem;
+   font-weight: 700;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+   transition: width 0.6s ease;
+   min-width: 0;
+ }
+
+ /* ===== Pipeline Agent Blocks ===== */
+ .pipeline-agents {
+   display: flex;
+   flex-direction: column;
+   gap: 8px;
+ }
+ details.agent-block,
+ div.agent-block {
+   border-radius: 12px;
+   border: 1px solid transparent;
+   box-sizing: border-box;
+   width: 100%;
+   min-width: 0;
+   overflow-x: hidden;
+ }
+ details.agent-block > summary {
+   list-style: none;
+   cursor: pointer;
+   user-select: none;
+ }
+ details.agent-block > summary::-webkit-details-marker {
+   display: none;
+ }
+ details.agent-block > summary::after {
+   content: '▾';
+   font-size: 0.7rem;
+   color: #94a3b8;
+   margin-left: 6px;
+   transition: transform 0.2s ease;
+ }
+ details.agent-block:not([open]) > summary::after {
+   transform: rotate(-90deg);
+ }
+ .agent-header {
+   display: flex;
+   align-items: center;
+   gap: 10px;
+   padding: 10px 14px;
+ }
+ div.agent-block .agent-header {
+   cursor: default;
+ }
+ .step-icon {
+   width: 24px;
+   height: 24px;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+   border-radius: 50%;
+   font-size: 0.82rem;
+   font-weight: 700;
+   flex-shrink: 0;
+ }
+ .step-name {
+   font-size: 0.88rem;
+   font-weight: 700;
+ }
+ .step-status {
+   font-size: 0.72rem;
+   margin-left: auto;
+ }
+
+ /* Agent output area */
+ .agent-output {
+   padding: 4px 14px 14px;
+   font-size: 0.84rem;
+   line-height: 1.6;
+   color: #334155;
+   border-top: 1px solid rgba(0,0,0,0.06);
+   overflow: hidden;
+   overflow-wrap: break-word;
+   overflow-wrap: anywhere;
+   word-break: break-word;
+ }
+ .agent-output * {
+   max-width: 100%;
+   box-sizing: border-box;
+ }
+ .agent-output pre,
+ .agent-output code {
+   max-width: 100%;
+   white-space: pre-wrap;
+   word-break: break-word;
+ }
+ .agent-output ul {
+   margin: 6px 0;
+   padding-left: 18px;
+ }
+ .agent-output li {
+   margin-bottom: 4px;
+ }
+ .agent-text {
+   overflow-wrap: break-word;
+   word-break: break-word;
+ }
+ .agent-output details summary {
+   cursor: pointer;
+   color: #475569;
+ }
+ .findings-section,
+ .differentials-section {
+   margin-bottom: 6px;
+ }
+ .findings-section strong,
+ .differentials-section strong {
+   color: #1e293b;
+ }
+ .differentials-section ol {
+   margin: 6px 0;
+   padding-left: 22px;
+ }
+
+ /* Step states */
+ .step-done {
+   background: #f0fdf4;
+   border-color: #bbf7d0;
+ }
+ .step-done .step-icon {
+   background: #22c55e;
+   color: #fff;
+ }
+ .step-done .step-name { color: #166534; }
+ .step-done .step-status { color: #4ade80; }
+
+ .step-active {
+   background: #eff6ff;
+   border-color: #bfdbfe;
+   animation: pulse-border 2s ease-in-out infinite;
+ }
+ .step-active .step-icon {
+   background: #3b82f6;
+   color: #fff;
+   animation: spin 1.2s linear infinite;
+ }
+ .step-active .step-name { color: #1d4ed8; }
+ .step-active .step-status { color: #60a5fa; }
+
+ .step-waiting {
+   background: #f8fafc;
+   border-color: #f1f5f9;
+ }
+ .step-waiting .step-icon {
+   background: #e2e8f0;
+   color: #94a3b8;
+ }
+ .step-waiting .step-name { color: #94a3b8; }
+ .step-waiting .step-status { color: #cbd5e1; }
+
+ .step-error {
+   background: #fef2f2;
+   border-color: #fecaca;
+ }
+ .step-error .step-icon {
+   background: #ef4444;
+   color: #fff;
+ }
+ .step-error .step-name { color: #991b1b; }
+ .step-error .step-status { color: #f87171; }
+
+ @keyframes pulse-border {
+   0%, 100% { border-color: #bfdbfe; }
+   50% { border-color: #60a5fa; }
+ }
+ @keyframes spin {
+   from { transform: rotate(0deg); }
+   to { transform: rotate(360deg); }
+ }
+
+ /* ===== Agent Output Styling ===== */
+
+ /* Bias Detector */
+ .discrepancy-summary {
+   padding: 8px 12px;
+   margin-bottom: 10px;
+   background: #fff7ed;
+   border-radius: 8px;
+   border: 1px solid #fed7aa;
+   color: #9a3412;
+   font-size: 0.84rem;
+   line-height: 1.5;
+ }
+ .bias-item {
+   margin-bottom: 10px;
+   padding: 8px 12px;
+   background: #fffbeb;
+   border-left: 3px solid #f59e0b;
+   border-radius: 0 8px 8px 0;
+ }
+ .bias-title {
+   font-weight: 700;
+   color: #92400e;
+   margin-bottom: 4px;
+ }
+ .bias-evidence {
+   color: #78716c;
+   font-size: 0.82rem;
+ }
+ .severity-tag {
+   display: inline-block;
+   padding: 1px 6px;
+   border-radius: 4px;
+   font-size: 0.65rem;
+   font-weight: 800;
+   letter-spacing: 0.5px;
+   margin-right: 4px;
+   vertical-align: middle;
+ }
+ .severity-high { background: #fee2e2; color: #dc2626; }
+ .severity-medium { background: #fff7ed; color: #ea580c; }
+ .severity-low { background: #fefce8; color: #ca8a04; }
+
+ .source-tag {
+   display: inline-block;
+   padding: 1px 6px;
+   border-radius: 4px;
+   font-size: 0.65rem;
+   font-weight: 800;
+   letter-spacing: 0.5px;
+   margin-right: 4px;
+   vertical-align: middle;
+ }
+ .source-doctor { background: #dbeafe; color: #1d4ed8; }
+ .source-ai { background: #ede9fe; color: #7c3aed; }
+ .source-both { background: #e0e7ff; color: #4338ca; }
+ .source-imaging { background: #dbeafe; color: #1d4ed8; }
+ .source-clinical { background: #fef3c7; color: #b45309; }
+
+ .missed-findings {
+   margin-top: 8px;
+   padding: 8px 12px;
+   background: #fef2f2;
+   border-radius: 8px;
+ }
+ .missed-findings strong {
+   color: #991b1b;
+ }
+
+ /* SigLIP */
+ .siglip-section {
+   margin-top: 8px;
+   padding: 8px 12px;
+   background: #f0f9ff;
+   border-radius: 8px;
+ }
+ .siglip-section strong {
+   color: #0369a1;
+ }
+ .sign-present {
+   color: #166534;
+ }
+ .sign-absent {
+   color: #94a3b8;
+ }
+
+ /* Devil's Advocate */
+ .mnm-item {
+   margin-bottom: 10px;
+   padding: 8px 12px;
+   background: #fef2f2;
+   border-left: 3px solid #ef4444;
+   border-radius: 0 8px 8px 0;
+ }
+ .mnm-title {
+   font-weight: 700;
+   color: #991b1b;
+   font-size: 0.92rem;
+ }
+ .mnm-item ul {
+   margin-top: 4px;
+ }
+ .challenge-item {
+   margin-bottom: 8px;
+   padding: 8px 12px;
+   background: #faf5ff;
+   border-left: 3px solid #8b5cf6;
+   border-radius: 0 8px 8px 0;
+ }
+ .challenge-claim {
+   font-weight: 700;
+   color: #5b21b6;
+   margin-bottom: 2px;
+ }
+ .challenge-counter {
+   color: #6b7280;
+   font-size: 0.82rem;
+ }
+
+ /* Consultant */
+ .consultation-note {
+   padding: 12px 16px;
+   background: linear-gradient(135deg, #f8fafc, #f0f9ff);
+   border-radius: 10px;
+   border: 1px solid #e2e8f0;
+   color: #1e293b;
+   font-size: 0.86rem;
+   line-height: 1.65;
+   margin-bottom: 10px;
+ }
+ .consultation-note p {
+   margin: 0 0 8px;
+ }
+ .consultation-note p:last-child {
+   margin-bottom: 0;
+ }
+ .alt-diagnoses {
+   margin-bottom: 8px;
+ }
+ .urgency-tag {
+   display: inline-block;
+   padding: 1px 6px;
+   border-radius: 4px;
+   font-size: 0.65rem;
+   font-weight: 800;
+   letter-spacing: 0.5px;
+   vertical-align: middle;
+ }
+ .urgency-critical { background: #fee2e2; color: #dc2626; }
+ .urgency-high { background: #fff7ed; color: #ea580c; }
+ .urgency-moderate { background: #fefce8; color: #ca8a04; }
+ .urgency-unknown { background: #e2e8f0; color: #475569; }
+ .next-steps {
+   margin-bottom: 8px;
+ }
+ .confidence-note {
+   padding: 8px 12px;
+   background: #f8fafc;
+   border-radius: 8px;
+   color: #64748b;
+   font-size: 0.8rem;
+ }
+
+ /* Pipeline error */
+ .pipeline-error {
+   padding: 14px;
+   background: #fef2f2;
+   border: 1px solid #fecaca;
+   border-radius: 10px;
+   color: #991b1b;
+   font-weight: 600;
+ }
+
+ /* ===== Footer ===== */
+ .footer-text {
+   text-align: center;
+   padding: 18px;
+   margin-top: 24px;
+   font-size: 0.76rem;
+   color: #94a3b8;
+   border-top: 1px solid #e2e8f0;
+   display: flex;
+   align-items: center;
+   justify-content: center;
+   gap: 6px;
+   flex-wrap: wrap;
+ }
+ .footer-chip {
+   display: inline-block;
+   padding: 1px 8px;
+   border-radius: 4px;
+   background: #f1f5f9;
+   color: #475569;
+   font-weight: 600;
+   font-size: 0.72rem;
+ }
+ .footer-sep {
+   color: #cbd5e1;
+   margin: 0 2px;
+ }
+
+ /* ===== Responsive ===== */
+ @media (max-width: 768px) {
+   .gradio-container { max-width: 100% !important; }
+   .hero-banner h1 { font-size: 1.5rem; }
+   .case-cards-overlay { grid-template-columns: 1fr; }
+ }
+ """