<div align="center">

# Diagnostic Devil's Advocate

**AI-Powered Cognitive Debiasing for Medical Image Interpretation**

[](https://huggingface.co/google/medgemma-1.5-4b-it)
[](https://huggingface.co/google/medsiglip-448)
[](https://langchain-ai.github.io/langgraph/)
[](https://gradio.app)
[](LICENSE)

[Live Demo](#getting-started) • [Architecture](#architecture) • [Demo Cases](#demo-cases) • [Technical Details](#technical-details)

---

</div>

## Why This Exists

> *Diagnostic errors affect an estimated **12 million** adults annually in the U.S. alone, with cognitive biases -- [anchoring](https://en.wikipedia.org/wiki/Anchoring_(cognitive_bias)), [premature closure](https://en.wikipedia.org/wiki/Premature_closure), [confirmation bias](https://en.wikipedia.org/wiki/Confirmation_bias) -- implicated in up to **74%** of cases.* ([Singh et al., BMJ Quality & Safety, 2014](https://qualitysafety.bmj.com/content/23/9/727))

Doctors are not wrong because they lack knowledge. They are wrong because the human brain takes shortcuts -- and in medicine, shortcuts kill. A physician who sees "young patient + chest pain after trauma" anchors on **rib contusion** and stops looking. The pneumothorax on the X-ray goes unseen. The patient deteriorates.

**Diagnostic Devil's Advocate** acts as an adversarial second opinion. It does not replace the physician -- it challenges them: *"Have you considered what happens if you're wrong?"*

---

## Architecture

Four agents, each with a distinct adversarial role, orchestrated by [LangGraph](https://langchain-ai.github.io/langgraph/) as a linear `StateGraph`:

| Agent | Task | Model(s) | Notes |
|:------|:-----|:------|:------------------|
| **Diagnostician** | Independent image + clinical analysis | [MedGemma 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) (multimodal) | **Blinded** -- never sees the doctor's diagnosis. Tags each finding as `imaging`, `clinical`, or `both` to distinguish evidence sources. |
| **Bias Detector** | Compare doctor vs. AI findings | [MedGemma 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) + [MedSigLIP](https://huggingface.co/google/medsiglip-448) | Uses **zero-shot image classification** to verify radiological signs. Flags clinical red flags ignored by either assessment. |
| **Devil's Advocate** | Adversarial challenge | [MedGemma 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) | Deliberately contrarian -- uses both imaging and clinical evidence to argue for **[must-not-miss diagnoses](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775443/)** |
| **Consultant** | Synthesize final report | [MedGemma 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) or [27B Text-IT](https://huggingface.co/google/medgemma-27b-text-it) | Writes as a **collegial consultant**: *"Have you considered..."* not *"You are wrong."* Only this agent optionally upgrades to 27B for deeper reasoning. |

## Pipeline

**Gradio UI** (image upload, diagnosis input, clinical context, [MedASR](https://huggingface.co/google/medasr) voice input)
→ **Diagnostician** -- receives image + clinical context but **NOT** the doctor's diagnosis; tags findings by source (`imaging` / `clinical` / `both`)
→ **Bias Detector** -- receives the doctor's diagnosis, compares it against independent findings using image, clinical data, and [MedSigLIP](https://huggingface.co/google/medsiglip-448) sign verification
→ **Devil's Advocate** -- challenges the working diagnosis using both imaging and clinical evidence for must-not-miss alternatives
→ **Consultant** -- synthesizes a collegial consultation note
→ **Output** (consultation report, alternative diagnoses, recommended workup)
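The linear flow above is simple enough to sketch as plain state-passing. Below is a minimal, dependency-free illustration; the repo itself is assumed to wire real model-backed agents with LangGraph's `StateGraph`, and every function and field name here is a hypothetical stand-in, not the project's actual code.

```python
# Sketch of the four-stage linear pipeline. Each "agent" is a stub that reads
# from and writes to a shared state dict -- the same data flow a LangGraph
# StateGraph formalizes. All names and values are illustrative.

def diagnostician(state: dict) -> dict:
    # Blinded: may use image + clinical context, never state["doctor_diagnosis"].
    state["findings"] = [{"finding": "left pneumothorax", "source": "imaging"}]
    return state

def bias_detector(state: dict) -> dict:
    # Only now is the doctor's diagnosis revealed and compared.
    missed = [f for f in state["findings"]
              if f["finding"] not in state["doctor_diagnosis"].lower()]
    state["potential_biases"] = ["satisfaction of search"] if missed else []
    return state

def devils_advocate(state: dict) -> dict:
    # Argue for must-not-miss alternatives regardless of the working diagnosis.
    state["alternatives"] = ["pneumothorax", "hemothorax"]
    return state

def consultant(state: dict) -> dict:
    # Collegial tone: question, don't accuse.
    state["report"] = (
        f"Have you considered {', '.join(state['alternatives'])}? "
        f"Flagged biases: {', '.join(state['potential_biases']) or 'none'}."
    )
    return state

PIPELINE = [diagnostician, bias_detector, devils_advocate, consultant]

def run(doctor_diagnosis: str, clinical_context: str) -> dict:
    state = {"doctor_diagnosis": doctor_diagnosis,
             "clinical_context": clinical_context}
    for agent in PIPELINE:  # strictly linear: order enforces the blinding
        state = agent(state)
    return state

result = run("rib contusion", "32M, motorcycle collision, chest pain")
print(result["report"])
```

The ordering is the point: because the Diagnostician runs first, the doctor's diagnosis cannot leak into its analysis.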

### MedSigLIP Sign Verification

The Bias Detector doesn't just rely on text reasoning -- it uses [**MedSigLIP-448**](https://huggingface.co/google/medsiglip-448) for objective visual verification. For each radiological sign mentioned by the Diagnostician (e.g., "pleural effusion", "cardiomegaly", "pneumothorax"), MedSigLIP performs [zero-shot binary classification](https://huggingface.co/tasks/zero-shot-image-classification): it compares the logits of `"chest radiograph showing [sign]"` vs `"normal chest radiograph with no [sign]"`. A logit difference > 2 is classified as "likely present", grounding the bias analysis in **visual evidence** rather than pure language reasoning.
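The decision rule itself is small enough to show. The sketch below illustrates only the logit-difference threshold described above; `siglip_logit` is a hypothetical stand-in for a real MedSigLIP image-text similarity call, faked here with a lookup table.

```python
# Sketch of the zero-shot sign-verification decision rule.
# `siglip_logit` is a placeholder for a MedSigLIP image-text similarity
# score; the values below are made up for illustration only.

FAKE_LOGITS = {
    "chest radiograph showing pneumothorax": 14.2,
    "normal chest radiograph with no pneumothorax": 9.1,
    "chest radiograph showing cardiomegaly": 10.3,
    "normal chest radiograph with no cardiomegaly": 11.8,
}

def siglip_logit(image, prompt: str) -> float:
    return FAKE_LOGITS[prompt]  # stand-in for a real model call

def verify_sign(image, sign: str, threshold: float = 2.0) -> str:
    pos = siglip_logit(image, f"chest radiograph showing {sign}")
    neg = siglip_logit(image, f"normal chest radiograph with no {sign}")
    # A positive-vs-negative logit gap above the threshold counts as present.
    return "likely present" if (pos - neg) > threshold else "not confirmed"

print(verify_sign(None, "pneumothorax"))  # gap 5.1 > 2
print(verify_sign(None, "cardiomegaly"))  # gap -1.5 <= 2
```

Swap `siglip_logit` for an actual MedSigLIP similarity score to reproduce the documented behavior.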

## Demo Cases

Three composite clinical scenarios covering the most dangerous diagnostic error patterns:

<table>
<tr>
<td width="33%" valign="top">

### Case 1: Missed Pneumothorax
**🏷️ TRAUMA**

32M, motorcycle collision. Doctor diagnoses **rib contusion**, discharges patient. Supine CXR actually shows a **left pneumothorax** with rib fractures.

**Bias**: [Satisfaction of search](https://radiopaedia.org/articles/satisfaction-of-search) -- found the rib fractures, stopped looking.

</td>
<td width="33%" valign="top">

### Case 2: Aortic Dissection → "GERD"
**🏷️ VASCULAR**

58M, hypertensive, tearing chest pain. Doctor diagnoses **acid reflux**, prescribes antacids. Blood pressure asymmetry (178/102 R vs 146/88 L) and D-dimer 4,850 suggest **Stanford type B dissection**.

**Bias**: [Anchoring](https://en.wikipedia.org/wiki/Anchoring_(cognitive_bias)) + [availability heuristic](https://en.wikipedia.org/wiki/Availability_heuristic) -- common diagnosis assumed first.

</td>
<td width="33%" valign="top">

### Case 3: Postpartum PE → "Anxiety"
**🏷️ POSTPARTUM**

29F, day 5 post C-section, dyspnea and tachycardia. Doctor orders **psychiatric consult**. SpO2 91%, ABG shows respiratory alkalosis -- classic **pulmonary embolism**.

**Bias**: [Premature closure](https://en.wikipedia.org/wiki/Premature_closure) + [framing effect](https://en.wikipedia.org/wiki/Framing_effect_(psychology)) -- young woman = anxiety.

</td>
</tr>
</table>

> All cases are educational composites synthesized from published literature. See [`data/demo_cases/SOURCES.md`](data/demo_cases/SOURCES.md) for full citations.

## Technical Details

### Model Stack

| Model | Parameters | Role | Loading |
|:------|:----------|:-----|:--------|
| [MedGemma 1.5 4B-IT](https://huggingface.co/google/medgemma-1.5-4b-it) | 4B | Multimodal image + text analysis | 4-bit quantized (~4 GB VRAM) or BF16 (~8 GB) |
| [MedGemma 27B Text-IT](https://huggingface.co/google/medgemma-27b-text-it) | 27B | Consultant deep reasoning (optional) | BF16 (~54 GB VRAM) |
| [MedSigLIP-448](https://huggingface.co/google/medsiglip-448) | 0.9B | Zero-shot sign verification | FP32 (~3 GB VRAM) |
| [MedASR](https://huggingface.co/google/medasr) | 105M | Medical speech-to-text | FP32 (~0.5 GB VRAM) |

The full pipeline requires **~8 GB VRAM** and runs on any 12 GB+ CUDA GPU. All models load locally via [Transformers](https://huggingface.co/docs/transformers) with [4-bit quantization](https://huggingface.co/docs/bitsandbytes) -- **zero API costs, fully offline-capable**.
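The Loading column corresponds to standard Transformers quantization settings. Below is a configuration sketch of what a 4-bit load might look like -- it is not the repo's actual `models/medgemma_client.py`, and actually running it requires accepting the gated model license and a CUDA GPU.

```python
# Configuration sketch: loading MedGemma 4B in 4-bit to fit in ~4 GB of VRAM.
# Mirrors the table above; the exact settings the repo uses are assumed.
import torch
from transformers import (AutoModelForImageTextToText, AutoProcessor,
                          BitsAndBytesConfig)

MODEL_ID = "google/medgemma-1.5-4b-it"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # ~4 GB VRAM instead of ~8 GB BF16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in BF16 for quality
)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
)
```

With `QUANTIZE_4B=false`, the analogue would be dropping `quantization_config` and passing `torch_dtype=torch.bfloat16` instead.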
### Key Technical Decisions

- **Blinded Diagnostician**: The first agent never sees the doctor's diagnosis. This prevents the AI from anchoring on the same conclusion, enabling genuine independent analysis.

- **Dual-source analysis (imaging + clinical)**: All agents analyze both the medical image and the full clinical context (vitals, labs, risk factors). Each Diagnostician finding is tagged with its source (`imaging`, `clinical`, or `both`). This is critical because many must-not-miss diagnoses -- aortic dissection (BP asymmetry), pulmonary embolism (low SpO2, elevated D-dimer) -- may have subtle or absent imaging signs but glaring clinical red flags.

- **Structured JSON output**: All agents output structured JSON parsed by [`json_repair`](https://github.com/mangiucugna/json_repair), which handles LLM output quirks (missing commas, truncation, markdown wrapping).
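To see why such repair is needed: models often wrap JSON in markdown fences or emit trailing noise. The helper below is a dependency-free sketch of just the fence-stripping step; the repo's parser is assumed to delegate heavier repairs (missing commas, truncation) to `json_repair` itself.

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Strip a markdown code-fence wrapper before parsing -- one of the LLM
    output quirks mentioned above (heavier repairs are json_repair's job)."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, flags=re.DOTALL)
    text = match.group(1) if match else raw
    return json.loads(text)

# Typical LLM output: valid JSON wrapped in a ```json fence.
llm_output = '```json\n{"findings": [{"finding": "pleural effusion", "source": "imaging"}]}\n```'
print(extract_json(llm_output)["findings"][0]["source"])  # imaging
```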
## Getting Started

### Prerequisites

- Python 3.11+
- CUDA-capable GPU (12GB+ VRAM)
- [Hugging Face account](https://huggingface.co) with access to gated models (MedGemma, MedSigLIP, MedASR)

### Installation

```bash
# Clone
git clone https://github.com/sypsyp97/diagnostic-devils-advocate
cd diagnostic-devils-advocate

# Install
pip install -r requirements.txt

# Login to Hugging Face (gated models)
huggingface-cli login
```

### Running

```bash
# Standard launch (4B quantized, 12GB GPU)
python app.py

# With 27B reasoning model (A100 80GB required)
USE_27B=true QUANTIZE_4B=false python app.py

# Without voice input
ENABLE_MEDASR=false python app.py
```
The app launches at `http://localhost:7860`.

### Environment Variables

| Variable | Default | Description |
|:---------|:--------|:------------|
| `USE_27B` | `false` | Enable 27B model for the Consultant agent |
| `QUANTIZE_4B` | `true` | 4-bit quantize the 4B model |
| `ENABLE_MEDASR` | `true` | Enable voice input via MedASR |
| `HF_TOKEN` | -- | Hugging Face token (or use `huggingface-cli login`) |
| `ENABLE_PROMPT_REPETITION` | `true` | [Prompt repetition](https://arxiv.org/abs/2512.14982) for improved output quality |
| `MODEL_LOCAL_DIR` | -- | Local directory for pre-downloaded models |
| `DEVICE` | `cuda` | Compute device |
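Flags like these need nothing beyond `os.environ` to read. A hypothetical sketch of how a `config.py` might interpret them -- the names match the table, but the parsing logic is illustrative, not the repo's actual code:

```python
import os

def env_bool(name: str, default: bool) -> bool:
    """Parse a boolean flag like USE_27B=true from the environment."""
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

USE_27B = env_bool("USE_27B", False)
QUANTIZE_4B = env_bool("QUANTIZE_4B", True)
ENABLE_MEDASR = env_bool("ENABLE_MEDASR", True)
DEVICE = os.environ.get("DEVICE", "cuda")
MODEL_LOCAL_DIR = os.environ.get("MODEL_LOCAL_DIR")  # None -> use the HF cache
```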

### Project Structure

```
|
| 198 |
diagnostic-devils-advocate/
|
| 199 |
-
βββ app.py
|
| 200 |
-
βββ config.py
|
| 201 |
βββ requirements.txt
|
| 202 |
-
β
|
| 203 |
βββ agents/
|
| 204 |
-
β βββ
|
| 205 |
-
β βββ
|
| 206 |
-
β βββ
|
| 207 |
-
β βββ
|
| 208 |
-
β βββ
|
| 209 |
-
β βββ
|
| 210 |
-
β
|
| 211 |
-
β βββ consultant.py # Agent 4: Consultation note synthesis
|
| 212 |
-
β
|
| 213 |
βββ models/
|
| 214 |
-
β βββ medgemma_client.py
|
| 215 |
-
β βββ medsiglip_client.py
|
| 216 |
-
β βββ medasr_client.py
|
| 217 |
-
β βββ utils.py
|
| 218 |
-
β
|
| 219 |
βββ ui/
|
| 220 |
-
β βββ components.py
|
| 221 |
-
β βββ callbacks.py
|
| 222 |
-
β βββ css.py
|
| 223 |
-
β
|
| 224 |
βββ data/
|
| 225 |
-
βββ demo_cases/
|
| 226 |
-
βββ SOURCES.md # Full literature citations
|
| 227 |
```
## Disclaimer

> **This is a research prototype built for the [MedGemma Impact Challenge](https://www.kaggle.com/competitions/medgemma-impact-challenge). It is NOT intended for clinical decision-making.** All demo cases are educational composites. Medical images are sourced from the University of Saskatchewan Teaching Collection (CC-BY-NC-SA 4.0).
## References

**Diagnostic Error & Cognitive Bias**

- Singh H, Meyer AND, Thomas EJ. "The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations." *BMJ Quality & Safety*, 2014;23(9):727--731. [doi:10.1136/bmjqs-2013-002627](https://pubmed.ncbi.nlm.nih.gov/24742777/)
- Croskerry P. "The importance of cognitive errors in diagnosis and strategies to minimize them." *Academic Medicine*, 2003;78(8):775--780. [doi:10.1097/00001888-200308000-00003](https://pubmed.ncbi.nlm.nih.gov/12915363/)
- Vally ZI, Khammissa RAG, Feller G, et al. "Errors in clinical diagnosis: a narrative review." *Journal of International Medical Research*, 2023;51(8):03000605231162798. [doi:10.1177/03000605231162798](https://pubmed.ncbi.nlm.nih.gov/37602466/)
- Staal J, Hooftman J, Gunput STG, et al. "Effect on diagnostic accuracy of cognitive reasoning tools for the workplace setting: systematic review and meta-analysis." *BMJ Quality & Safety*, 2022;31(12):899--910. [doi:10.1136/bmjqs-2022-014865](https://pubmed.ncbi.nlm.nih.gov/36396150/)

**AI-Assisted Debiasing & Multi-Agent Systems**

- Brown C, Nazeer R, Gibbs A, et al. "Breaking Bias: The Role of Artificial Intelligence in Improving Clinical Decision-Making." *Cureus*, 2023;15(3):e36415. [doi:10.7759/cureus.36415](https://pubmed.ncbi.nlm.nih.gov/37090406/)
- Tang X, Zou A, Zhang Z, et al. "MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning." *Findings of ACL*, 2024:599--621. [arXiv:2311.10537](https://arxiv.org/abs/2311.10537)
- Kim Y, Park C, Jeong H, et al. "MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making." *NeurIPS*, 2024. [arXiv:2404.15155](https://arxiv.org/abs/2404.15155)
- Chen X, Yi H, You M, et al. "Enhancing diagnostic capability with multi-agents conversational large language models." *npj Digital Medicine*, 2025;8:159. [doi:10.1038/s41746-025-01550-0](https://pubmed.ncbi.nlm.nih.gov/40082662/)

**Medical Vision-Language Models & Prompt Engineering**

- Jang J, Kyung D, Kim SH, et al. "Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders." *Scientific Reports*, 2024;14:23199. [doi:10.1038/s41598-024-73695-z](https://pubmed.ncbi.nlm.nih.gov/39369048/)
- Leviathan Y, Kalman M, Matias Y. "Prompt Repetition Improves Non-Reasoning LLMs." [arXiv:2512.14982](https://arxiv.org/abs/2512.14982), Google Research, 2025.
- Zaghir J, Naguib M, Bjelogrlic M, et al. "Prompt Engineering Paradigms for Medical Applications: Scoping Review." *Journal of Medical Internet Research*, 2024;26:e60501. [doi:10.2196/60501](https://pubmed.ncbi.nlm.nih.gov/39255030/)
- Sellergren A, Kazemzadeh S, Jaroensri T, et al. "MedGemma Technical Report." [arXiv:2507.05201](https://arxiv.org/abs/2507.05201), Google, 2025.
---