---
title: Diagnostic Devil's Advocate
emoji: "🩺"
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: "6.4.0"
app_file: app.py
pinned: false
license: cc-by-4.0
tags:
- medgemma
- medical-imaging
- multi-agent
- cognitive-bias
- radiology
---
<div align="center">
# Diagnostic Devil's Advocate
**AI-Powered Cognitive Debiasing for Medical Image Interpretation**
[Hugging Face Space](https://huggingface.co/spaces/yipengsun/diagnostic-devils-advocate)
[MedGemma 1.5 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it)
[MedSigLIP-448](https://huggingface.co/google/medsiglip-448)
[LangGraph](https://langchain-ai.github.io/langgraph/)
[Gradio](https://gradio.app)
[License](LICENSE)
</div>
---
## Why This Exists
> Diagnostic errors affect an estimated **12 million** adults annually in the U.S., with cognitive biases implicated in up to **74%** of cases. ([Singh et al., 2014](https://pubmed.ncbi.nlm.nih.gov/24742777/))
Doctors are not wrong because they lack knowledge -- they are wrong because the human brain takes shortcuts. A physician who sees "young patient + chest pain after trauma" anchors on **rib contusion** and stops looking. The pneumothorax goes unseen. The patient deteriorates.
**Diagnostic Devil's Advocate** acts as an adversarial second opinion. It does not replace the physician -- it challenges them: *"Have you considered what happens if you're wrong?"*
---
## Pipeline
Four agents, each with a distinct adversarial role, orchestrated by [LangGraph](https://langchain-ai.github.io/langgraph/) as a linear `StateGraph`:
<div align="center">
<img src="assets/workflow.jpg" alt="Workflow Diagram" width="100%">
<br>
<em>Figure 1: Multi-agent pipeline for cognitive debiasing in medical image interpretation.</em>
<br>
<sub>Diagram generated with <a href="https://gemini.google/overview/image-generation/">Nano Banana Pro</a></sub>
</div>
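
The linear flow can be sketched without any dependencies. This is an illustrative stand-in for the LangGraph `StateGraph`, not the project's `agents/graph.py`: the state keys (`image`, `clinical_context`, `doctor_diagnosis`, and so on) and the placeholder agent bodies are assumptions. It shows two things: each node returns a partial update that is merged into a shared state, and the first node reads only the image and clinical context, so it stays blinded to the doctor's diagnosis.

```python
from typing import Callable, Dict, List

State = Dict[str, object]  # shared state passed from node to node (keys are hypothetical)


def diagnostician(state: State) -> State:
    # Blinded: reads only the image and clinical context, never the doctor's diagnosis.
    visible = {key: state[key] for key in ("image", "clinical_context")}
    return {"independent_findings": f"findings from {sorted(visible)}"}


def bias_detector(state: State) -> State:
    # First node allowed to see the doctor's diagnosis; compares it with the blinded findings.
    return {"bias_report": f"compare '{state['doctor_diagnosis']}' with independent findings"}


def devil_advocate(state: State) -> State:
    # Placeholder challenge; a real agent would argue for alternative diagnoses.
    return {"challenges": ["What if the working diagnosis is wrong?"]}


def consultant(state: State) -> State:
    # Synthesizes the earlier outputs in a collegial tone.
    return {"consultation": f"Have you considered... ({len(state['challenges'])} challenge(s))"}


# Linear order: each node runs exactly once, in sequence.
PIPELINE: List[Callable[[State], State]] = [
    diagnostician,
    bias_detector,
    devil_advocate,
    consultant,
]


def run_pipeline(initial: State) -> State:
    state = dict(initial)
    for node in PIPELINE:
        state.update(node(state))  # merge the node's partial update into the shared state
    return state


result = run_pipeline(
    {
        "image": "chest_xray.png",
        "clinical_context": "young patient, chest pain after trauma",
        "doctor_diagnosis": "rib contusion",
    }
)
print(result["consultation"])
```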
### Key Design Choices
- **Blinded first agent** -- the Diagnostician never sees the doctor's diagnosis, preventing the AI from anchoring on the same conclusion
- **Dual-source analysis** -- every agent considers both the medical image and clinical context (vitals, labs, risk factors), because many dangerous conditions have subtle imaging but obvious clinical red flags
- **MedSigLIP verification** -- zero-shot image classification grounds the bias analysis in visual evidence, not just language reasoning
- **MedASR voice input** -- [MedASR](https://huggingface.co/google/medasr) enables hands-free clinical context entry via speech-to-text, designed for busy clinical workflows where typing is impractical
- **Prompt repetition** -- implements the [prompt repetition technique](https://arxiv.org/abs/2512.14982) from Google Research to improve output quality and consistency in non-reasoning LLMs
- **Collegial tone** -- the Consultant writes as a consulting colleague (*"Have you considered..."*), not a critic
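
As a rough illustration of the prompt repetition idea, the sketch below simply duplicates the query text before it is sent to the model. The helper name `repeat_prompt`, the separator, and the `enabled` switch (mirroring the `ENABLE_PROMPT_REPETITION` variable) are assumptions made for this example; the project's own prompt construction in `agents/prompts.py` may differ.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n", enabled: bool = True) -> str:
    """Return the prompt repeated `times` times, or unchanged when disabled.

    Hypothetical helper: the separator and signature are illustrative choices.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    text = prompt.strip()
    if not enabled:
        return text
    return separator.join([text] * times)


query = "List findings on this chest X-ray that contradict a rib contusion."
print(repeat_prompt(query))
```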
---
## Model Stack
| Model | Params | Role | VRAM |
|:------|:------:|:-----|:----:|
| [MedGemma 1.5 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) | 4B | Multimodal image + text analysis | ~4 GB (4-bit) |
| [MedGemma 27B Text-IT](https://huggingface.co/google/medgemma-27b-text-it) | 27B | Consultant deep reasoning (optional) | ~54 GB |
| [MedSigLIP-448](https://huggingface.co/google/medsiglip-448) | 0.9B | Zero-shot sign verification | ~3 GB |
| [MedASR](https://huggingface.co/google/medasr) | 105M | Medical speech-to-text | ~0.5 GB |
The default pipeline (without the optional 27B Consultant) requires **~8 GB VRAM** and runs on any 12 GB+ CUDA GPU. All models load locally via [Transformers](https://huggingface.co/docs/transformers), with [4-bit quantization](https://huggingface.co/docs/bitsandbytes) applied to the 4B model by default -- **zero API costs, fully offline-capable**.
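
The VRAM column can be sanity-checked with a little arithmetic. The sketch below estimates weight memory as parameters times bits per parameter; the 16-bit precision assumed for the 27B model is an inference from the ~54 GB figure, not something stated here, and the helper name is hypothetical. Weight-only estimates ignore activations, caches, and framework overhead, which is presumably why the table lists ~4 GB for the 4-bit 4B model rather than the ~2 GB its weights alone would need.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    # 1e9 parameters * (bits / 8) bytes, expressed in GB (1 GB = 1e9 bytes)
    return params_billions * bits_per_param / 8


# Weight-only estimates (the 16-bit assumption for the 27B model is ours)
consultant_27b = weight_memory_gb(27, 16)
medgemma_4b_weights = weight_memory_gb(4, 4)

# Default pipeline total, using the VRAM figures from the table above
default_total_gb = 4 + 3 + 0.5  # MedGemma 4B (4-bit) + MedSigLIP + MedASR

print(consultant_27b, medgemma_4b_weights, default_total_gb)
```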
---
## Getting Started
```bash
# Clone
git clone https://github.com/sypsyp97/diagnostic-devils-advocate
cd diagnostic-devils-advocate
# Install
pip install -r requirements.txt
# Login to Hugging Face (gated models)
huggingface-cli login
# Run
python app.py # 4B quantized (default)
USE_27B=true QUANTIZE_4B=false python app.py # with 27B Consultant
ENABLE_MEDASR=false python app.py # without voice input
```
The app launches at `http://localhost:7860`.
<details>
<summary><b>Environment Variables</b></summary>
| Variable | Default | Description |
|:---------|:--------|:------------|
| `USE_27B` | `false` | Enable 27B model for the Consultant agent |
| `QUANTIZE_4B` | `true` | 4-bit quantize the 4B model |
| `ENABLE_MEDASR` | `true` | Enable voice input via MedASR |
| `HF_TOKEN` | -- | Hugging Face token (or use `huggingface-cli login`) |
| `ENABLE_PROMPT_REPETITION` | `true` | [Prompt repetition](https://arxiv.org/abs/2512.14982) for improved output quality |
| `MODEL_LOCAL_DIR` | -- | Local directory for pre-downloaded models |
| `DEVICE` | `cuda` | Compute device |
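
One way to read these variables is sketched below. It is not the project's `config.py`: the `Settings` class, the `env_flag` helper, and the accepted truthy spellings are assumptions; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings; the project may accept others


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Unset variables fall back to the documented default.
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


@dataclass(frozen=True)
class Settings:
    use_27b: bool
    quantize_4b: bool
    enable_medasr: bool
    enable_prompt_repetition: bool
    hf_token: Optional[str]
    model_local_dir: Optional[str]
    device: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    return Settings(
        use_27b=env_flag(env, "USE_27B", False),
        quantize_4b=env_flag(env, "QUANTIZE_4B", True),
        enable_medasr=env_flag(env, "ENABLE_MEDASR", True),
        enable_prompt_repetition=env_flag(env, "ENABLE_PROMPT_REPETITION", True),
        hf_token=env.get("HF_TOKEN"),
        model_local_dir=env.get("MODEL_LOCAL_DIR"),
        device=env.get("DEVICE", "cuda"),
    )


# Equivalent to: USE_27B=true QUANTIZE_4B=false python app.py
print(load_settings({"USE_27B": "true", "QUANTIZE_4B": "false"}))
```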
</details>
<details>
<summary><b>Project Structure</b></summary>
```
diagnostic-devils-advocate/
├── app.py                     # Gradio entry point
├── config.py                  # Model & environment config
├── requirements.txt
├── agents/
│   ├── prompts.py             # All agent prompt templates
│   ├── graph.py               # LangGraph StateGraph pipeline
│   ├── output_parser.py       # JSON parsing (json_repair + llm-output-parser)
│   ├── diagnostician.py       # Agent 1: Blinded analysis
│   ├── bias_detector.py       # Agent 2: Bias detection + MedSigLIP
│   ├── devil_advocate.py      # Agent 3: Adversarial challenge
│   └── consultant.py          # Agent 4: Consultation synthesis
├── models/
│   ├── medgemma_client.py     # MedGemma 4B/27B inference
│   ├── medsiglip_client.py    # MedSigLIP zero-shot classification
│   ├── medasr_client.py       # MedASR speech-to-text
│   └── utils.py               # Image preprocessing, token stripping
├── ui/
│   ├── components.py          # Gradio layout
│   ├── callbacks.py           # UI event handlers
│   └── css.py                 # Custom styling
└── data/
    └── demo_cases/            # Composite clinical scenarios
```
</details>
---
## Disclaimer
> **This is a research prototype built for the [MedGemma Impact Challenge](https://www.kaggle.com/competitions/med-gemma-impact-challenge). It is NOT intended for clinical decision-making.** All demo cases are educational composites. Medical images are sourced from the University of Saskatchewan Teaching Collection (CC-BY-NC-SA 4.0).
---
## References
<details>
<summary><b>Diagnostic Error & Cognitive Bias</b></summary>
- Singh H, Meyer AND, Thomas EJ. "The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations." *BMJ Quality & Safety*, 2014;23(9):727--731. [doi:10.1136/bmjqs-2013-002627](https://pubmed.ncbi.nlm.nih.gov/24742777/)
- Croskerry P. "The importance of cognitive errors in diagnosis and strategies to minimize them." *Academic Medicine*, 2003;78(8):775--780. [doi:10.1097/00001888-200308000-00003](https://pubmed.ncbi.nlm.nih.gov/12915363/)
- Vally ZI, Khammissa RAG, Feller G, et al. "Errors in clinical diagnosis: a narrative review." *Journal of International Medical Research*, 2023;51(8):03000605231162798. [doi:10.1177/03000605231162798](https://pubmed.ncbi.nlm.nih.gov/37602466/)
- Staal J, Hooftman J, Gunput STG, et al. "Effect on diagnostic accuracy of cognitive reasoning tools for the workplace setting: systematic review and meta-analysis." *BMJ Quality & Safety*, 2022;31(12):899--910. [doi:10.1136/bmjqs-2022-014865](https://pubmed.ncbi.nlm.nih.gov/36396150/)
</details>
<details>
<summary><b>AI-Assisted Debiasing & Multi-Agent Systems</b></summary>
- Brown C, Nazeer R, Gibbs A, et al. "Breaking Bias: The Role of Artificial Intelligence in Improving Clinical Decision-Making." *Cureus*, 2023;15(3):e36415. [doi:10.7759/cureus.36415](https://pubmed.ncbi.nlm.nih.gov/37090406/)
- Tang X, Zou A, Zhang Z, et al. "MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning." *Findings of ACL*, 2024:599--621. [arXiv:2311.10537](https://arxiv.org/abs/2311.10537)
- Kim Y, Park C, Jeong H, et al. "MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making." *NeurIPS*, 2024. [arXiv:2404.15155](https://arxiv.org/abs/2404.15155)
- Chen X, Yi H, You M, et al. "Enhancing diagnostic capability with multi-agents conversational large language models." *npj Digital Medicine*, 2025;8:159. [doi:10.1038/s41746-025-01550-0](https://pubmed.ncbi.nlm.nih.gov/40082662/)
</details>
<details>
<summary><b>Medical Vision-Language Models & Prompt Engineering</b></summary>
- Jang J, Kyung D, Kim SH, et al. "Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders." *Scientific Reports*, 2024;14:23199. [doi:10.1038/s41598-024-73695-z](https://pubmed.ncbi.nlm.nih.gov/39369048/)
- Leviathan Y, Kalman M, Matias Y. "Prompt Repetition Improves Non-Reasoning LLMs." [arXiv:2512.14982](https://arxiv.org/abs/2512.14982), Google Research, 2025.
- Zaghir J, Naguib M, Bjelogrlic M, et al. "Prompt Engineering Paradigms for Medical Applications: Scoping Review." *Journal of Medical Internet Research*, 2024;26:e60501. [doi:10.2196/60501](https://pubmed.ncbi.nlm.nih.gov/39255030/)
- Sellergren A, Kazemzadeh S, Jaroensri T, et al. "MedGemma Technical Report." [arXiv:2507.05201](https://arxiv.org/abs/2507.05201), Google, 2025.
</details>
---
<div align="center">
Built with [Google Health AI Developer Foundations](https://developers.google.com/health-ai-developer-foundations) for the [MedGemma Impact Challenge](https://www.kaggle.com/competitions/med-gemma-impact-challenge)
</div>