title: CDS Agent
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
fullWidth: true
custom_domains:
  - demo.briansheppard.com

CDS Agent: Clinical Decision Support System

An agentic clinical decision support application that orchestrates medical AI with specialized tools to assist clinicians in real time.

Origin: MedGemma Impact Challenge (Kaggle / Google Research)
Focus: Building a genuinely impactful medical application, not just a competition entry.


What It Does

A clinician pastes a patient case. The system automatically:

  1. Parses the free text into structured patient data (demographics, vitals, labs, medications, history)
  2. Reasons about the case to generate a ranked differential diagnosis with chain-of-thought transparency
  3. Checks drug interactions against OpenFDA and RxNorm databases
  4. Retrieves clinical guidelines from a 62-guideline RAG corpus spanning 14 medical specialties
  5. Detects conflicts between guideline recommendations and the patient's actual data, surfacing omissions, contradictions, dosage concerns, and monitoring gaps
  6. Synthesizes everything into a structured CDS report with recommendations, warnings, conflicts, and citations

All six steps stream to the frontend in real time via WebSocket, so the clinician sees each step execute live.
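The step-by-step streaming described above can be sketched as an async generator that yields an event before and after each step. This is only an illustration under stated assumptions: `StepEvent`, `run_pipeline`, and the two stub steps are hypothetical names, the event fields are invented for the example, and the real orchestrator in app/agent/orchestrator.py may be structured differently.

```python
import asyncio
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, AsyncIterator, Awaitable, Callable


@dataclass
class StepEvent:
    """Hypothetical event shape; the real message schema is not shown in this README."""
    step: str
    status: str  # "running", "completed", or "failed"
    duration_s: float = 0.0


Step = Callable[[dict[str, Any]], Awaitable[None]]


async def run_pipeline(
    steps: list[tuple[str, Step]], state: dict[str, Any]
) -> AsyncIterator[StepEvent]:
    """Run steps in order, yielding one event when a step starts and one when it ends."""
    for name, fn in steps:
        yield StepEvent(name, "running")
        started = time.perf_counter()
        try:
            await fn(state)
        except Exception:
            yield StepEvent(name, "failed", time.perf_counter() - started)
            return  # stop the pipeline on the first failure
        yield StepEvent(name, "completed", time.perf_counter() - started)


# Stub steps standing in for the six real tools.
async def parse(state: dict[str, Any]) -> None:
    state["patient"] = {"age": 62}


async def reason(state: dict[str, Any]) -> None:
    state["differential"] = ["ACS"]


async def collect() -> list[str]:
    steps = [("parse", parse), ("reason", reason)]
    # A WebSocket handler would send each JSON string to the client here.
    return [json.dumps(asdict(event)) async for event in run_pipeline(steps, {})]


messages = asyncio.run(collect())
```

Yielding events from a generator keeps the pipeline independent of the transport: the same generator could feed a WebSocket, a log file, or a test.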


System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                    FRONTEND (Next.js 14 + React)                     │
│  Patient Case Input  │  Agent Activity Feed  │  CDS Report View      │
└──────────────────────────┬───────────────────────────────────────────┘
                           │ REST API + WebSocket
┌──────────────────────────▼───────────────────────────────────────────┐
│                    BACKEND (FastAPI + Python 3.10)                   │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                 ORCHESTRATOR (6-Step Pipeline)                 │  │
│  └────┬─────────┬──────────┬─────────┬───────────┬──────────┬─────┘  │
│    ┌──▼───┐ ┌───▼────┐ ┌───▼───┐ ┌───▼────┐ ┌────▼────┐ ┌───▼───┐    │
│    │Parse │ │Reason  │ │ Drug  │ │  RAG   │ │Conflict │ │Synth- │    │
│    │Pati- │ │(LLM)   │ │ Check │ │Guide-  │ │Detect-  │ │esize  │    │
│    │ent   │ │Differ- │ │OpenFDA│ │lines   │ │ion      │ │(LLM)  │    │
│    │Data  │ │ential  │ │RxNorm │ │ChromaDB│ │(LLM)    │ │Report │    │
│    └──────┘ └────────┘ └───────┘ └────────┘ └─────────┘ └───────┘    │
│                                                                      │
│  External: OpenFDA API │ RxNorm/NLM API │ ChromaDB (local)           │
└──────────────────────────────────────────────────────────────────────┘

See docs/architecture.md for the full design document.


Verified Test Results

Full Pipeline E2E Test (Chest Pain / ACS Case)

All 6 pipeline steps completed successfully:

| Step | Duration | Result |
| --- | --- | --- |
| Parse Patient Data | 7.8 s | Structured profile extracted |
| Clinical Reasoning | 21.2 s | ACS correctly identified as top differential |
| Drug Interaction Check | 11.3 s | Interactions queried against OpenFDA / RxNorm |
| Guideline Retrieval (RAG) | 9.6 s | Relevant cardiology guidelines retrieved |
| Conflict Detection | ~5 s | Guideline vs patient data comparison for omissions, contradictions, monitoring gaps |
| Synthesis | 25.3 s | Comprehensive CDS report generated |

RAG Retrieval Quality Test

30 / 30 queries passed (100%) across all 14 specialties:

| Metric | Value |
| --- | --- |
| Queries tested | 30 |
| Pass rate | 100% (30/30) |
| Avg relevance score | 0.639 |
| Min relevance score | 0.519 |
| Max relevance score | 0.765 |
| Top-1 accuracy | 100% (correct guideline ranked #1 for every query) |

Full results: docs/test_results.md
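As a rough illustration of how summary metrics like those above can be derived from per-query results, here is a minimal sketch. The `QueryResult` fields, the `summarize` helper, and the pass criterion (expected guideline appears in the top `k` hits) are assumptions made for this example; the actual criteria live in test_rag_quality.py.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryResult:
    """Hypothetical per-query record, not the project's actual result format."""
    expected_id: str        # guideline the query should retrieve
    ranked_ids: list[str]   # retrieved guideline ids, best first
    top_score: float        # relevance score of the best hit


def summarize(results: list[QueryResult], k: int = 3) -> dict[str, float]:
    """Aggregate pass rate, top-1 accuracy, and relevance score statistics."""
    n = len(results)
    passed = sum(r.expected_id in r.ranked_ids[:k] for r in results)
    top1 = sum(bool(r.ranked_ids) and r.ranked_ids[0] == r.expected_id for r in results)
    scores = [r.top_score for r in results]
    return {
        "pass_rate": passed / n,
        "top1_accuracy": top1 / n,
        "avg_score": mean(scores),
        "min_score": min(scores),
        "max_score": max(scores),
    }


# Toy data for illustration only, not the project's measured results.
demo = [
    QueryResult("acs", ["acs", "hf"], 0.75),
    QueryResult("sepsis", ["stroke", "sepsis"], 0.50),
]
summary = summarize(demo)
```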

Clinical Test Suite

22 comprehensive clinical scenarios covering: ACS, AFib, heart failure, stroke, sepsis, anaphylaxis, polytrauma, DKA, thyroid storm, adrenal crisis, massive PE, status asthmaticus, GI bleeding, pancreatitis, status epilepticus, meningitis, suicidal ideation, neonatal fever, pediatric dehydration, hyperkalemia, acetaminophen overdose, and elderly polypharmacy with falls.

External Dataset Validation

A validation framework tests the pipeline against real-world clinical datasets:

| Dataset | Source | Cases Available | What It Tests |
| --- | --- | --- | --- |
| MedQA (USMLE) | HuggingFace | 1,273 | Diagnostic accuracy: does the top differential match the correct answer? |
| MTSamples | GitHub | ~5,000 | Parse quality & field completeness on real transcription notes |
| PMC Case Reports | PubMed E-utilities | Dynamic | Diagnostic accuracy on published case reports with known diagnoses |

Initial smoke test (3 MedQA cases): 100% parse success, 66.7% top-1 diagnostic accuracy.

50-case MedQA validation (MedGemma 27B via HF Endpoint):

| Metric | Value |
| --- | --- |
| Cases run | 50 |
| Pipeline success | 94% (47/50) |
| Top-1 diagnostic accuracy | 36% |
| Top-3 diagnostic accuracy | 38% |
| Differential accuracy | 10% |
| Mentioned in report | 38% |
| Avg pipeline time | 204 s/case |

Of the 50 cases, 36 were diagnostic questions. On those, 39% mentioned the correct diagnosis and 14% placed it in the differential.
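For readers unfamiliar with top-k scoring, the sketch below shows one way such accuracies can be computed. The helper names and the matching rule (case-insensitive substring match against the first `k` diagnoses) are assumptions for illustration; the project's scorers in validation/base.py may match answers differently.

```python
def in_top_k(differential: list[str], answer: str, k: int) -> bool:
    """True if the answer matches one of the first k diagnoses (case-insensitive substring)."""
    needle = answer.strip().lower()
    return any(needle in dx.lower() for dx in differential[:k])


def top_k_accuracy(cases: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of cases whose correct answer appears in the top k of the differential."""
    return sum(in_top_k(dx, ans, k) for dx, ans in cases) / len(cases)


# Toy cases for illustration only, not MedQA data.
cases = [
    (["Acute coronary syndrome", "Pulmonary embolism"], "acute coronary syndrome"),
    (["Pneumonia", "Pulmonary embolism", "Heart failure"], "pulmonary embolism"),
    (["Migraine", "Tension headache"], "subarachnoid hemorrhage"),
]
top1 = top_k_accuracy(cases, 1)  # 1 of 3 cases
top3 = top_k_accuracy(cases, 3)  # 2 of 3 cases

# The reported pipeline success rate follows the same arithmetic: 47 of 50 runs.
pipeline_success = 47 / 50
```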

See docs/test_results.md for full details and reproduction steps.


RAG Clinical Guidelines Corpus

62 clinical guidelines across 14 medical specialties, stored in ChromaDB with sentence-transformer embeddings (all-MiniLM-L6-v2):

| Specialty | Count | Key Topics |
| --- | --- | --- |
| Cardiology | 8 | HTN, chest pain / ACS, HF, AFib, lipids, NSTEMI, PE, valvular disease |
| Emergency Medicine | 10 | Stroke, sepsis, trauma, anaphylaxis, burns, ACLS, seizures, toxicology, hyperkalemia, acute abdomen |
| Endocrinology | 7 | DM management, DKA, thyroid, adrenal insufficiency, osteoporosis, hypoglycemia, hypercalcemia |
| Pulmonology | 4 | COPD, asthma, CAP, pleural effusion |
| Neurology | 4 | Epilepsy, migraine, MS, meningitis |
| Gastroenterology | 5 | Upper GI bleed, pancreatitis, cirrhosis, IBD, CRC screening |
| Infectious Disease | 5 | STIs, UTI, HIV, SSTIs, COVID-19 |
| Psychiatry | 4 | MDD, suicide risk, GAD, substance use |
| Pediatrics | 4 | Fever without source, asthma, dehydration, neonatal jaundice |
| Nephrology | 2 | CKD, AKI |
| Hematology | 2 | VTE, sickle cell |
| Rheumatology | 2 | RA, gout |
| OB/GYN | 2 | Hypertensive disorders of pregnancy, postpartum hemorrhage |
| Other | 3+ | Preventive medicine (USPSTF), perioperative cardiac risk, dermatology (melanoma) |

Sources include ACC/AHA, ADA, GOLD, GINA, IDSA, ACOG, AAN, APA, AAP, ACR, ASH, KDIGO, WHO, and other major guideline organizations.
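Retrieval over such a corpus amounts to nearest-neighbour search over embeddings. The sketch below deliberately swaps in a toy bag-of-words vector for the all-MiniLM-L6-v2 embedding and a plain dict for ChromaDB, so that it runs with the standard library alone. The guideline snippets, ids, and function names are hypothetical and do not come from clinical_guidelines.json.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k (guideline id, similarity) pairs, best first."""
    q = embed(query)
    scored = [(gid, cosine(q, embed(text))) for gid, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical guideline snippets for illustration only.
corpus = {
    "cardiology/acs": "chest pain acute coronary syndrome troponin ecg aspirin",
    "emergency/sepsis": "sepsis lactate blood cultures antibiotics fluids",
    "endocrine/dka": "diabetic ketoacidosis insulin potassium fluids",
}
hits = retrieve("crushing chest pain with elevated troponin", corpus)
```

A real sentence-embedding model would also match paraphrases that share no words with the guideline text, which word counts cannot do.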


Project Structure

medgemma_impact_challenge/
├── README.md                           # This file
├── DEVELOPMENT_LOG.md                  # Chronological build history & decisions
├── SUBMISSION_GUIDE.md                 # Competition submission strategy
├── RULES_SUMMARY.md                    # Competition rules checklist
├── docs/
│   ├── architecture.md                 # System architecture & design decisions
│   ├── test_results.md                 # Detailed test results & benchmarks
│   ├── writeup_draft.md                # Project writeup / summary
│   └── deploy_medgemma_hf.md           # MedGemma HF Endpoint deployment guide
├── src/
│   ├── backend/                        # Python FastAPI backend
│   │   ├── .env.template               # Environment config template
│   │   ├── .env                        # Local config (not committed)
│   │   ├── requirements.txt            # Python dependencies (28 packages)
│   │   ├── test_e2e.py                 # End-to-end pipeline test
│   │   ├── test_clinical_cases.py      # 22 clinical scenario test suite
│   │   ├── test_rag_quality.py         # RAG retrieval quality tests (30 queries)
│   │   ├── test_poll.py                # Simple case poller utility
│   │   ├── validation/                 # External dataset validation framework
│   │   │   ├── base.py                 # Core framework (runners, scorers, utilities)
│   │   │   ├── harness_medqa.py        # MedQA (USMLE) diagnostic accuracy harness
│   │   │   ├── harness_mtsamples.py    # MTSamples parse quality harness
│   │   │   ├── harness_pmc.py          # PMC Case Reports diagnostic harness
│   │   │   ├── run_validation.py       # Unified CLI runner
│   │   │   ├── analyze_results.py      # Question-type categorization & analysis
│   │   │   └── check_progress.py       # Checkpoint progress monitor
│   │   └── app/
│   │       ├── main.py                 # FastAPI entry (CORS, routers, lifespan)
│   │       ├── config.py               # Pydantic Settings (ports, models, dirs)
│   │       ├── __init__.py
│   │       ├── models/
│   │       │   └── schemas.py          # All Pydantic models (~280 lines)
│   │       ├── agent/
│   │       │   └── orchestrator.py     # 6-step pipeline orchestrator (~300 lines)
│   │       ├── services/
│   │       │   └── medgemma.py         # LLM service (OpenAI-compatible API)
│   │       ├── tools/
│   │       │   ├── patient_parser.py      # Step 1: Free-text → structured data
│   │       │   ├── clinical_reasoning.py  # Step 2: Differential diagnosis
│   │       │   ├── drug_interactions.py   # Step 3: OpenFDA + RxNorm
│   │       │   ├── guideline_retrieval.py # Step 4: RAG over ChromaDB
│   │       │   ├── conflict_detection.py  # Step 5: Guideline vs patient conflicts
│   │       │   └── synthesis.py           # Step 6: CDS report generation
│   │       ├── data/
│   │       │   └── clinical_guidelines.json  # 62 guidelines, 14 specialties
│   │       └── api/
│   │           ├── health.py           # GET /api/health
│   │           ├── cases.py            # POST /api/cases/submit, GET /api/cases/{id}
│   │           └── ws.py               # WebSocket /ws/agent
│   └── frontend/                       # Next.js 14 + React 18 + TypeScript
│       ├── package.json
│       ├── next.config.js              # API proxy → backend
│       ├── tailwind.config.js
│       └── src/
│           ├── app/
│           │   ├── layout.tsx
│           │   ├── page.tsx            # Main CDS interface
│           │   └── globals.css
│           ├── components/
│           │   ├── PatientInput.tsx    # Patient case input + 3 sample cases
│           │   ├── AgentPipeline.tsx   # Real-time step visualization
│           │   └── CDSReport.tsx       # Final report renderer
│           └── hooks/
│               └── useAgentWebSocket.ts  # WebSocket state management
├── notebooks/                          # Experiment notebooks
├── models/                             # Fine-tuned models (future)
└── demo/                               # Video & demo assets

Quick Start

Prerequisites

  • Python 3.10+ (tested with Python 3.10)
  • Node.js 18+ (tested with Node.js 18)
  • API Key: HuggingFace API token (for MedGemma endpoint) or Google AI Studio API key

Backend Setup

cd src/backend

# Create and activate virtual environment
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Configure environment
copy .env.template .env        # Windows (or: cp .env.template .env)
# Edit .env: set MEDGEMMA_API_KEY and MEDGEMMA_BASE_URL
# For HF Endpoints: see docs/deploy_medgemma_hf.md
# For Google AI Studio: set MEDGEMMA_API_KEY to your Google AI Studio key

# Start the backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

cd src/frontend

npm install
npm run dev
# Open http://localhost:3000

Note: The frontend proxies API requests to the backend. If using a non-default port, update next.config.js and src/hooks/useAgentWebSocket.ts accordingly.

Running Tests

cd src/backend

# RAG retrieval quality test (no backend needed)
python test_rag_quality.py --rebuild --verbose

# Full pipeline E2E test (requires running backend)
python test_e2e.py

# Comprehensive clinical test suite (requires running backend)
python test_clinical_cases.py --list              # See all 22 cases
python test_clinical_cases.py --case em_sepsis    # Run one case
python test_clinical_cases.py --specialty Cardio   # Run by specialty
python test_clinical_cases.py                      # Run all cases
python test_clinical_cases.py --report results.json  # Save results

# External dataset validation (no backend needed; calls the orchestrator directly)
python -m validation.run_validation --fetch-only          # Download datasets only
python -m validation.run_validation --medqa --max-cases 5  # 5 MedQA cases
python -m validation.run_validation --mtsamples --max-cases 5
python -m validation.run_validation --pmc --max-cases 5
python -m validation.run_validation --all --max-cases 10   # All 3 datasets

Usage

  1. Open http://localhost:3000
  2. Paste a patient case description (or click a sample case)
  3. Click "Analyze Patient Case"
  4. Watch the 6-step agent pipeline execute in real time
  5. Review the CDS report: differential diagnosis, drug warnings, conflicts & gaps, guideline recommendations, next steps

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Next.js 14, React 18, TypeScript, Tailwind CSS | Patient input, pipeline visualization, report display |
| API | FastAPI, WebSocket, Pydantic v2 | REST endpoints + real-time streaming |
| LLM | MedGemma 27B Text IT (via HuggingFace Dedicated Endpoint) | Clinical reasoning + synthesis |
| RAG | ChromaDB, sentence-transformers (all-MiniLM-L6-v2) | Clinical guideline retrieval |
| Drug Data | OpenFDA API, RxNorm / NLM API | Drug interactions, medication normalization |
| Validation | Pydantic | Structured output validation across all pipeline steps |
| External Validation | MedQA, MTSamples, PMC Case Reports | Diagnostic accuracy & parse quality benchmarking |
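To make the Drug Data layer more concrete, the sketch below builds request URLs for the two public services: the RxNorm name-to-RxCUI lookup and the openFDA drug label search. The helper names are hypothetical, and which endpoints and fields the project's drug_interactions.py actually uses is an assumption. Only URL construction is shown, so no network access is needed.

```python
from urllib.parse import urlencode

RXNORM_BASE = "https://rxnav.nlm.nih.gov/REST"
OPENFDA_LABEL = "https://api.fda.gov/drug/label.json"


def rxnorm_lookup_url(drug_name: str) -> str:
    """URL that asks RxNorm for the concept id (RxCUI) matching a drug name."""
    return f"{RXNORM_BASE}/rxcui.json?{urlencode({'name': drug_name})}"


def openfda_label_url(generic_name: str, limit: int = 1) -> str:
    """URL that searches openFDA drug labels by generic name."""
    query = {"search": f'openfda.generic_name:"{generic_name}"', "limit": limit}
    return f"{OPENFDA_LABEL}?{urlencode(query)}"


urls = [rxnorm_lookup_url("warfarin"), openfda_label_url("warfarin")]
```

Normalizing a free-text medication name to an RxCUI first gives a stable identifier to key later lookups on, which is presumably why the stack pairs the two services.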

API Reference

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/health | GET | Health check |
| /api/cases/submit | POST | Submit a patient case for analysis |
| /api/cases/{case_id} | GET | Get case results (poll for completion) |
| /api/cases | GET | List all cases |
| /ws/agent | WebSocket | Real-time pipeline step streaming |

Submit a Case (REST)

curl -X POST http://localhost:8000/api/cases/submit \
  -H "Content-Type: application/json" \
  -d '{
    "patient_text": "62yo male with crushing chest pain radiating to left arm...",
    "include_drug_check": true,
    "include_guidelines": true
  }'
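The same request can be issued from Python with only the standard library. This is a minimal sketch: `build_submit_request` and `submit_case` are hypothetical helper names, and the shape of the JSON response is not documented in this README, so the code returns it as an untyped dict.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the Quick Start


def build_submit_request(patient_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {
        "patient_text": patient_text,
        "include_drug_check": True,
        "include_guidelines": True,
    }
    return urllib.request.Request(
        f"{base_url}/api/cases/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_case(patient_text: str) -> dict:
    """Send the request and return the decoded JSON body (requires a running backend)."""
    with urllib.request.urlopen(build_submit_request(patient_text), timeout=30) as resp:
        return json.load(resp)


req = build_submit_request("62yo male with crushing chest pain radiating to left arm...")
```

Inspect the dict returned by `submit_case` for the case identifier before polling GET /api/cases/{case_id}; the field name is not specified here.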

Documentation Index

| Document | Description |
| --- | --- |
| README.md | This file: overview, setup, results |
| docs/architecture.md | System architecture, pipeline design, design decisions |
| docs/test_results.md | Detailed test results, RAG benchmarks, pipeline timing |
| DEVELOPMENT_LOG.md | Chronological build history, problems solved, decisions made |
| docs/writeup_draft.md | Project writeup / summary |
| CONTRIBUTING.md | How to contribute to the project |
| SECURITY.md | Security policy and responsible disclosure |
| TODO.md | Next-session action items and project state |
| SUBMISSION_GUIDE.md | Competition submission strategy |
| docs/deploy_medgemma_hf.md | MedGemma HuggingFace Endpoint deployment guide |

License

Licensed under the Apache License 2.0.

This project uses MedGemma and other models from Google's Health AI Developer Foundations (HAI-DEF), subject to the HAI-DEF Terms of Use.

Disclaimer: This is a research / demonstration system. It is NOT a substitute for professional medical judgment. All clinical decisions must be made by qualified healthcare professionals.