# Clinical Decision Support Agent – Architecture
## The Problem
**Current workflow (painful, error-prone):**

A clinician sees a patient → manually reviews the chart, labs, and medications → searches
UpToDate or other reference materials → checks drug interactions → mentally synthesizes all
information → makes clinical decisions. This is slow, cognitively taxing, and mistakes
happen when clinicians are fatigued or overloaded.

**Agent-reimagined workflow:**

Patient data goes in → an orchestrated agent pipeline automatically gathers context,
reasons about the case, checks interactions, retrieves guidelines, and produces a
structured clinical decision support report → all in seconds.
---
## System Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                  FRONTEND (Next.js 14 + React)                   │
│    PatientInput.tsx  →  AgentPipeline.tsx  →  CDSReport.tsx      │
│    3 sample cases       Real-time step viz    Full report render │
└──────────────────────────┬───────────────────────────────────────┘
                           │ REST API (port 3000 → proxy)
                           │ WebSocket (direct to backend)
┌──────────────────────────┼───────────────────────────────────────┐
│                 BACKEND (FastAPI + Python 3.10)                  │
│                 Port 8000 (default) / 8002 (dev)                 │
│                                                                  │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │          ORCHESTRATOR (orchestrator.py, ~300 lines)          │ │
│ │   Sequential 6-step pipeline with structured state passing   │ │
│ └──┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┘ │
│    │         │         │         │         │         │           │
│ ┌──▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌──▼──────┐ ┌▼────────┐  │
│ │Step 1│ │Step 2  │ │Step 3│ │Step 4  │ │Step 5   │ │Step 6   │  │
│ │Pati- │ │Clini-  │ │Drug  │ │Guide-  │ │Conflict │ │Synthe-  │  │
│ │ent   │ │cal     │ │Inter-│ │line    │ │Detect-  │ │sis      │  │
│ │Parser│ │Reason- │ │action│ │Retriev-│ │ion      │ │Agent    │  │
│ │(LLM) │ │ing     │ │(APIs)│ │al (RAG)│ │(LLM)    │ │(LLM)    │  │
│ └──────┘ │(LLM)   │ └──┬───┘ └──┬─────┘ └─────────┘ └─────────┘  │
│          └────────┘    │        │                                │
│                    ┌───▼───┐ ┌──▼───────────┐                    │
│                    │OpenFDA│ │ChromaDB      │                    │
│                    │RxNorm │ │62 guidelines │                    │
│                    │NLM API│ │14 specialties│                    │
│                    └───────┘ │MiniLM-L6-v2  │                    │
│                              └──────────────┘                    │
└──────────────────────────────────────────────────────────────────┘
  LLM: google/medgemma-27b-text-it via HuggingFace Dedicated Endpoint
       (OpenAI-compatible TGI, 1× A100 80 GB, bfloat16)
```
---
## Agent Pipeline – Step-by-Step
### Step 1: Patient Data Parser (`patient_parser.py`)
- **Input:** Raw patient case free-text
- **Output:** `PatientProfile` (Pydantic model)
- **Method:** LLM extraction with structured JSON output
- **Fields extracted:** Demographics, chief complaint, HPI, vital signs, labs, medications, allergies, past medical history, social history
- **Timing:** ~7.8 s (observed)
### Step 2: Clinical Reasoning Agent (`clinical_reasoning.py`)
- **Input:** `PatientProfile` from Step 1
- **Output:** `ClinicalReasoningResult` with ranked differential diagnosis
- **Method:** Chain-of-thought prompting for transparent reasoning
- **Key outputs:** Ranked `DiagnosisCandidate` list (name, likelihood, key findings for/against), risk assessment, recommended workup
- **Timing:** ~21.2 s (observed)
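The chain-of-thought prompt for this step can be assembled along these lines (the prompt wording and function name are illustrative sketches, not the project's actual prompt):

```python
def build_reasoning_prompt(profile_json: str) -> str:
    """Assemble a chain-of-thought prompt that asks for an explicit,
    ranked differential with evidence for and against each candidate."""
    return (
        "You are a clinical reasoning assistant.\n"
        "Think step by step: summarize the key findings, then produce a "
        "ranked differential diagnosis. For each candidate give likelihood "
        "(high/moderate/low), findings FOR, and findings AGAINST. "
        "Finish with a risk assessment and a recommended workup.\n\n"
        f"Patient profile (JSON):\n{profile_json}"
    )

prompt = build_reasoning_prompt('{"age": 67, "chief_complaint": "chest pain"}')
```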
### Step 3: Drug Interaction Check (`drug_interactions.py`)
- **Input:** Medication list from Step 1 + any proposed medications from Step 2
- **Output:** `DrugInteractionResult` with interaction warnings
- **Method:** Two-API approach:
  1. **RxNorm / NLM API** – normalize medication names to RxCUI identifiers, then check pairwise interactions
  2. **OpenFDA API** – query drug adverse event reports for additional safety data
- **Bug fix applied:** The RxNorm API returns `rxnormId` as a list, not a scalar; the code handles both formats
- **Timing:** ~11.3 s (observed)
### Step 4: Guideline Retrieval – RAG (`guideline_retrieval.py`)
- **Input:** Primary diagnosis/conditions from Step 2
- **Output:** `GuidelineRetrievalResult` with relevant guideline excerpts and citations
- **Method:** Retrieval-Augmented Generation over a curated guideline corpus
- **RAG details:**
- **Vector store:** ChromaDB `PersistentClient` (persist dir: `./data/chroma`)
- **Embedding model:** `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
- **Corpus:** 62 clinical guidelines from `clinical_guidelines.json`
- **Specialties:** 14 (Cardiology, EM, Endocrinology, Pulmonology, Neurology, GI, ID, Psychiatry, Pediatrics, Nephrology, Hematology, Rheumatology, OB/GYN, Preventive/Other)
- **Metadata:** `specialty`, `guideline_id` stored per document in ChromaDB
- **Similarity:** Cosine similarity, top-k retrieval (k=5 default)
- **Sources:** ACC/AHA, ADA, GOLD, GINA, IDSA, ACOG, AAN, APA, AAP, ACR, ASH, KDIGO, WHO, USPSTF, and others
- **Fallback:** If `clinical_guidelines.json` is missing, falls back to 2 minimal embedded guidelines
- **Timing:** ~9.6 s (observed)
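Under the hood, the cosine top-k query that ChromaDB performs reduces to the following computation. This is a framework-free sketch over toy 2-D vectors (the real corpus holds 384-dim MiniLM embeddings and the real code goes through `PersistentClient`):

```python
from math import sqrt

def top_k_cosine(query: list[float], corpus: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k corpus vectors most similar to query (cosine)."""
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))
    sims = [cos(query, doc) for doc in corpus]
    # Sort by descending similarity, keep the k best document indices
    return sorted(range(len(corpus)), key=lambda i: -sims[i])[:k]

# Toy 2-D "embeddings": doc 0 is identical to the query, doc 2 is close
hits = top_k_cosine([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2)
```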
### Step 5: Conflict Detection (`conflict_detection.py`)
- **Input:** Patient profile, clinical reasoning, drug interactions, and retrieved guidelines from Steps 1–4
- **Output:** `ConflictDetectionResult` with specific `ClinicalConflict` items
- **Method:** LLM-based comparison of guideline recommendations against the patient's actual data
- **Conflict types detected:**
  - **Omission** – Guideline recommends something the patient is not receiving
  - **Contradiction** – Patient's current treatment conflicts with guideline advice
  - **Dosage** – Guideline specifies dose adjustments that apply to this patient (age, renal function, etc.)
  - **Monitoring** – Guideline requires monitoring that is not documented as ordered
  - **Allergy Risk** – Guideline-recommended treatment involves a medication the patient is allergic to
  - **Interaction Gap** – Known drug interaction is not addressed in the care plan
- **Each conflict includes:** severity (critical/high/moderate/low), guideline source, guideline text, patient data, description, and suggested resolution
- **Temperature:** 0.1 (low, for safety-critical analysis)
- **Graceful degradation:** Returns empty result if no guidelines were retrieved (Step 4 skipped/failed)
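The conflict record carries the fields listed above; a dependency-free sketch of the shape (the project's actual models are Pydantic v2 in `schemas.py`, and the example conflict is illustrative):

```python
from dataclasses import dataclass
from enum import Enum

class ConflictType(str, Enum):
    OMISSION = "omission"
    CONTRADICTION = "contradiction"
    DOSAGE = "dosage"
    MONITORING = "monitoring"
    ALLERGY_RISK = "allergy_risk"
    INTERACTION_GAP = "interaction_gap"

@dataclass
class ClinicalConflict:
    conflict_type: ConflictType
    severity: str              # critical / high / moderate / low
    guideline_source: str
    guideline_text: str
    patient_data: str
    description: str
    suggested_resolution: str

conflict = ClinicalConflict(
    conflict_type=ConflictType.OMISSION,
    severity="high",
    guideline_source="ADA (illustrative)",
    guideline_text="Moderate-intensity statin recommended for diabetics over 40.",
    patient_data="58M with T2DM; no statin on the medication list.",
    description="Guideline-recommended statin therapy is not prescribed.",
    suggested_resolution="Discuss starting a moderate-intensity statin.",
)
```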
### Step 6: Synthesis Agent (`synthesis.py`)
- **Input:** All outputs from Steps 1–4 plus conflict detection results
- **Output:** `CDSReport` (comprehensive structured report)
- **Report sections:**
- Patient summary
- Differential diagnosis with reasoning chains
- Drug interaction warnings with severity
  - **Conflicts & gaps** – prominently featured, with guideline vs patient data comparison
- Guideline-concordant recommendations with citations
- Suggested next steps (immediate, short-term, long-term)
- Caveats and limitations
- **Timing:** ~25.3 s (observed)
**Total pipeline time:** ~75–85 s for a complex case (6 steps, with Steps 3–4 run in parallel).
---
## LLM Integration – Implementation Details
### Model Configuration
- **Model:** `google/medgemma-27b-text-it` (MedGemma from HAI-DEF)
- **API:** HuggingFace Dedicated Endpoint (TGI), with Google AI Studio as fallback
- **Base URL:** `https://lisvpf8if1yhgxn2.us-east-1.aws.endpoints.huggingface.cloud/v1` (HF Endpoint)
- **Client:** OpenAI Python SDK (`openai==1.51.0`)
- **Service:** `medgemma.py` wraps all LLM calls
- **Endpoint config:** `MAX_INPUT_TOKENS=12288`, `MAX_TOTAL_TOKENS=16384`, `DTYPE=bfloat16`
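The `medgemma.py` wrapper essentially points the OpenAI SDK at the TGI endpoint. A sketch of the request shape (the sampling values here are illustrative; real values come from `config.py` / `.env`):

```python
def build_chat_request(system: str, user: str) -> dict:
    """Kwargs for client.chat.completions.create() against an HF TGI
    endpoint. TGI serves exactly one model, so the conventional model
    name is "tgi"; the client is constructed as
    OpenAI(base_url="https://<endpoint>/v1", api_key=<HF token>)."""
    return {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0.1,   # low temperature, as in the conflict detector
        "max_tokens": 2048,   # illustrative; must fit MAX_TOTAL_TOKENS
    }

req = build_chat_request("You are a careful clinical assistant.", "Summarize this case.")
```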
### Gemma System Prompt Handling
**MedGemma via TGI** natively supports `role: "system"` messages, so the service sends system and user messages as-is.
**Fallback for Google AI Studio:** If the backend happens to be plain Gemma on Google AI Studio (which rejects the system role), the code automatically catches the error and falls back to folding the system prompt into the first user message:
```python
# If system message exists, fold it into the first user message
if messages[0]["role"] == "system":
system_content = messages[0]["content"]
messages = messages[1:]
if messages and messages[0]["role"] == "user":
messages[0]["content"] = f"[System Instructions]\n{system_content}\n\n{messages[0]['content']}"
```
This preserves the intended behavior while staying compatible with Gemma's API constraints.
---
## Data Models (Pydantic v2)
All pipeline data is strongly typed via Pydantic models in `schemas.py` (~280 lines):
| Model | Purpose |
|-------|---------|
| `CaseSubmission` | Input: patient text + feature flags |
| `PatientProfile` | Step 1 output: demographics, vitals, labs, meds, history |
| `DiagnosisCandidate` | Individual diagnosis with likelihood + evidence |
| `ClinicalReasoningResult` | Step 2 output: ranked differentials + workup |
| `DrugInteraction` | Individual drug interaction warning |
| `DrugInteractionResult` | Step 3 output: all interaction data |
| `GuidelineExcerpt` | Individual guideline citation |
| `GuidelineRetrievalResult` | Step 4 output: relevant guidelines |
| `ConflictType` | Enum: omission, contradiction, dosage, monitoring, allergy_risk, interaction_gap |
| `ClinicalConflict` | Individual conflict: guideline_text vs patient_data + suggested resolution |
| `ConflictDetectionResult` | Step 5 output: all detected conflicts |
| `CDSReport` | Step 6 output: full synthesized report (now includes conflicts) |
| `AgentStep` | WebSocket message: step name, status, data, timing |
---
## Frontend Architecture
### Technology
- **Framework:** Next.js 14 (App Router)
- **UI:** React 18 + TypeScript + Tailwind CSS
- **State:** React hooks + WebSocket for real-time updates
### Components
| Component | Role |
|-----------|------|
| `PatientInput.tsx` | Text area for patient case + 3 pre-loaded sample cases (chest pain, DKA, pediatric fever) |
| `AgentPipeline.tsx` | Visualizes the 6-step pipeline in real time – shows status (pending / running / complete / error) for each step as WebSocket messages arrive |
| `CDSReport.tsx` | Renders the final CDS report: patient summary, differentials, drug warnings, **conflicts & gaps** (prominently styled), guidelines, next steps |
### Communication
- **REST API:** `POST /api/cases/submit` (submit case), `GET /api/cases/{id}` (poll results)
- **WebSocket:** `ws://localhost:8000/ws/agent` – receives `AgentStep` messages as each pipeline step starts/completes
- **Proxy:** `next.config.js` proxies `/api/` requests to the backend
---
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/health` | GET | Health check – returns backend status |
| `/api/cases/submit` | POST | Submit a `CaseSubmission` for analysis |
| `/api/cases/{case_id}` | GET | Poll for case results |
| `/api/cases` | GET | List all submitted cases |
| `/ws/agent` | WebSocket | Real-time pipeline step streaming |
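A case submission is a plain JSON POST; a standard-library sketch of the request (the JSON field name `case_text` is an assumption here; the real schema is the `CaseSubmission` model):

```python
import json
from urllib import request

BASE = "http://localhost:8000"  # direct; the frontend goes through the Next.js proxy

def build_submit(case_text: str) -> request.Request:
    """Build the POST request for /api/cases/submit."""
    payload = json.dumps({"case_text": case_text}).encode()
    return request.Request(
        f"{BASE}/api/cases/submit",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_submit("67M presenting with crushing substernal chest pain.")
# request.urlopen(req) would send it; afterwards poll GET /api/cases/{case_id}
```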
---
## External API Dependencies
| API | Purpose | Authentication | Rate Limits |
|-----|---------|---------------|-------------|
| HuggingFace Dedicated Endpoint | MedGemma 27B Text IT LLM inference | HF API token | Dedicated GPU (no shared limits) |
| Google AI Studio (fallback) | Gemma 3 27B IT LLM inference | API key | Per-key quota |
| OpenFDA | Drug adverse event data | None (public) | 240 req/min (with key), 40/min (without) |
| RxNorm / NLM | Drug normalization (name → RxCUI), pairwise interactions | None (public) | 20 req/sec |
---
## Why This Is Agentic (Not Just a Chatbot)
| Characteristic | Chatbot | This Agent System |
|----------------|---------|-------------------|
| Tool use | None | 5+ specialized tools (parser, drug API, RAG, conflict detection, synthesis) |
| Planning | None | Orchestrator executes a defined 6-step plan |
| State management | Stateless | Patient context flows through all steps |
| Error handling | Generic | Tool-specific fallbacks, graceful degradation |
| Output structure | Free text | Pydantic-validated, structured, cited |
| Transparency | Black box | Shows each reasoning step + tool outputs in real time |
| External data | None | Queries 3 external data sources (OpenFDA, RxNorm, ChromaDB) |
---
## Key Design Decisions
1. **Custom orchestrator over LangChain/LlamaIndex** – simpler, more transparent, easier to debug. We control the pipeline loop explicitly, with no framework overhead for a sequential 6-step pipeline.
2. **WebSocket for agent activity** – the frontend shows each step as it happens (parsing → reasoning → checking → retrieving → synthesizing). This real-time visibility is critical for clinician trust.
3. **Structured outputs everywhere** – every tool returns a Pydantic model. The synthesis agent receives structured data, not messy text, which ensures consistency and enables frontend rendering.
4. **MedGemma in four roles** – as the patient parser (Step 1), clinical reasoning engine (Step 2), conflict detector (Step 5), and synthesis engine (Step 6). The same model extracts structured data, reasons about the case, identifies guideline-vs-patient conflicts, and integrates all tool outputs into a coherent report.
5. **Graceful degradation** – if a tool fails (e.g., OpenFDA is down), the agent continues with the available information and notes the gap in the final report.
6. **Curated guideline corpus over general web search** – 62 hand-selected guidelines from authoritative sources (ACC/AHA, ADA, etc.) ensure quality and citability, unlike ad-hoc web scraping.
7. **ChromaDB for simplicity** – an embedded vector DB that persists locally, with no external database service to manage. It rebuilds in seconds from the JSON source.
---
## Configuration
All configuration lives in `config.py` (Pydantic Settings) and `.env`:
| Setting | Default | Description |
|---------|---------|-------------|
| `MEDGEMMA_API_KEY` | (required) | HuggingFace API token or Google AI Studio API key |
| `MEDGEMMA_BASE_URL` | `""` (empty) | LLM API endpoint (HF Endpoint URL with /v1, or Google AI Studio URL) |
| `MEDGEMMA_MODEL_ID` | `google/medgemma` | Model identifier (`tgi` for HF Endpoints, or full model name) |
| `HF_TOKEN` | `""` | HuggingFace token for dataset downloads |
| `CHROMA_PERSIST_DIR` | `./data/chroma` | ChromaDB storage directory |
| `EMBEDDING_MODEL` | `sentence-transformers/all-MiniLM-L6-v2` | Embedding model for RAG |
| `MAX_GUIDELINES` | `5` | Number of guidelines to retrieve per query |
| `AGENT_TIMEOUT` | `120` | Max seconds for full pipeline execution |
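A matching `.env` might look like this (every value below is a placeholder, not a real credential or endpoint):

```shell
# LLM endpoint (HF Dedicated Endpoint; keep the /v1 suffix)
MEDGEMMA_API_KEY=hf_xxxxxxxxxxxxxxxx
MEDGEMMA_BASE_URL=https://<your-endpoint>.endpoints.huggingface.cloud/v1
MEDGEMMA_MODEL_ID=tgi

# RAG
CHROMA_PERSIST_DIR=./data/chroma
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
MAX_GUIDELINES=5

# Pipeline
AGENT_TIMEOUT=120
```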
---
## Known Limitations
- **LLM latency:** Full pipeline takes ~75 s due to multiple sequential LLM calls. Could be improved with smaller models or parallel LLM calls.
- **No authentication:** No user auth β designed as a local demo / research tool.
- **Single-model:** Uses only MedGemma 27B Text IT. Could benefit from specialized models for different steps.
- **Guideline currency:** Guidelines are a static snapshot. A production system would need automated updates.
- **No EHR integration:** Input is manual text paste. A production system would integrate with EHR FHIR APIs.
---
## Validation Framework
The project includes an external dataset validation framework that tests the full pipeline against real-world clinical data, bypassing the HTTP server and calling the `Orchestrator` directly.
### Datasets
| Dataset | Source | Cases | What It Measures |
|---------|--------|-------|------------------|
| **MedQA (USMLE)** | HuggingFace (`GBaker/MedQA-USMLE-4-options`) | 1,273 | Diagnostic accuracy – top-1, top-3, mentioned |
| **MTSamples** | GitHub (`socd06/medical-nlp`) | ~5,000 | Parse quality, field completeness, specialty alignment |
| **PMC Case Reports** | PubMed E-utilities (esearch + efetch) | Dynamic | Diagnostic accuracy on published cases with known diagnoses |
### Architecture
```
validation/
├── base.py               # ValidationCase, ValidationResult, ValidationSummary
│                         # run_cds_pipeline() – direct Orchestrator invocation
│                         # fuzzy_match(), diagnosis_in_differential()
├── harness_medqa.py      # Fetch from HuggingFace, extract vignettes, score diagnostics
├── harness_mtsamples.py  # Fetch CSV, stratified sampling, score parse quality
├── harness_pmc.py        # PubMed E-utilities, title-based diagnosis extraction
└── run_validation.py     # Unified CLI: --medqa --mtsamples --pmc --all --max-cases N
```
All datasets are cached locally in `validation/data/` (gitignored). Results are saved to `validation/results/` (also gitignored).
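The scoring helpers named in `base.py` can be approximated with the standard library. A sketch of one plausible implementation (the project's actual matching logic and threshold may differ):

```python
from difflib import SequenceMatcher

def fuzzy_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Case-insensitive similarity test between two diagnosis strings."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold

def diagnosis_in_differential(gold: str, differential: list[str],
                              threshold: float = 0.8) -> bool:
    """True if the gold diagnosis fuzzily matches any ranked candidate,
    which is how top-1/top-3/mentioned accuracy can be scored."""
    return any(fuzzy_match(gold, d, threshold) for d in differential)
```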