bshepp commited on
Commit ·
c28dd56
1
Parent(s): e684a6c
Update all documentation for conflict detection feature
Browse files- README.md: 5→6 step pipeline, updated architecture diagram, project
structure (new conflict_detection.py), usage instructions
- docs/architecture.md: New Step 5 section, updated diagram, data models
table (3 new models), component descriptions, agentic comparison
- DEVELOPMENT_LOG.md: Added Phase 8 documenting the conflict detection
design decision (why confidence scores were dropped) and full
implementation details
- docs/writeup_draft.md: Updated pipeline description, architecture
diagram, performance table, practical usage section
- docs/test_results.md: Updated E2E test to reflect 6-step pipeline
- DEVELOPMENT_LOG.md +48 -0
- README.md +28 -26
- docs/architecture.md +44 -24
- docs/test_results.md +4 -3
- docs/writeup_draft.md +10 -7
DEVELOPMENT_LOG.md
CHANGED
|
@@ -173,6 +173,54 @@ Rewrote/created all documentation:
|
|
| 173 |
|
| 174 |
---
|
| 175 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 176 |
## Dependency Inventory
|
| 177 |
|
| 178 |
### Python Backend (`requirements.txt`)
|
|
|
|
| 173 |
|
| 174 |
---
|
| 175 |
|
| 176 |
+
## Phase 8: Conflict Detection Feature
|
| 177 |
+
|
| 178 |
+
### Design Decision: Drop Confidence Scores, Add Conflict Detection
|
| 179 |
+
|
| 180 |
+
During review, identified that the system's "confidence" was just the LLM picking a label (LOW/MODERATE/HIGH) — not a calibrated score. Composite numeric confidence scores were considered and **rejected** because:
|
| 181 |
+
- Uncalibrated confidence values are dangerous (clinician anchoring bias)
|
| 182 |
+
- No training data exists to calibrate outputs
|
| 183 |
+
- A single number hides more than it reveals
|
| 184 |
+
|
| 185 |
+
**Instead, added Conflict Detection** — a new pipeline step that compares guideline recommendations against the patient's actual data to identify specific, actionable gaps. This provides direct patient safety value without requiring calibration.
|
| 186 |
+
|
| 187 |
+
### Implementation
|
| 188 |
+
|
| 189 |
+
**New models added to `schemas.py`:**
|
| 190 |
+
- `ConflictType` enum — 6 categories: omission, contradiction, dosage, monitoring, allergy_risk, interaction_gap
|
| 191 |
+
- `ClinicalConflict` model — Each conflict has: type, severity, guideline_source, guideline_text, patient_data, description, suggested_resolution
|
| 192 |
+
- `ConflictDetectionResult` — List of conflicts + summary + guidelines_checked count
|
| 193 |
+
- `conflicts` field added to `CDSReport`
|
| 194 |
+
- `conflict_detection` field added to `AgentState`
|
| 195 |
+
|
| 196 |
+
**New tool: `conflict_detection.py`:**
|
| 197 |
+
- Takes patient profile, clinical reasoning, drug interactions, and guidelines
|
| 198 |
+
- Uses MedGemma at low temperature (0.1) for safety-critical analysis
|
| 199 |
+
- Returns structured `ConflictDetectionResult` with specific, actionable conflicts
|
| 200 |
+
- Graceful degradation: returns empty if no guidelines available
|
| 201 |
+
|
| 202 |
+
**Pipeline changes (`orchestrator.py`):**
|
| 203 |
+
- Pipeline expanded from 5 to 6 steps
|
| 204 |
+
- New Step 5: Conflict Detection (between guideline retrieval and synthesis)
|
| 205 |
+
- Synthesis (now Step 6) receives conflict data and prominently includes it in the report
|
| 206 |
+
|
| 207 |
+
**Synthesis changes (`synthesis.py`):**
|
| 208 |
+
- Accepts `conflict_detection` parameter
|
| 209 |
+
- New "Conflicts & Gaps" section in synthesis prompt
|
| 210 |
+
- Fallback: copies detected conflicts directly into report if LLM doesn't populate the structured field
|
| 211 |
+
|
| 212 |
+
**Frontend changes (`CDSReport.tsx`):**
|
| 213 |
+
- New "Conflicts & Gaps Detected" section with high visual prominence
|
| 214 |
+
- Red border container, severity-coded left-accent cards (critical=red, high=orange, moderate=yellow, low=blue)
|
| 215 |
+
- Side-by-side "Guideline says" vs "Patient data" comparison
|
| 216 |
+
- Green-highlighted suggested resolutions
|
| 217 |
+
- Positioned immediately after drug interactions for maximum visibility
|
| 218 |
+
|
| 219 |
+
**Files created:** `src/backend/app/tools/conflict_detection.py` (1 new file)
|
| 220 |
+
**Files modified:** `schemas.py`, `orchestrator.py`, `synthesis.py`, `CDSReport.tsx` (4 files)
|
| 221 |
+
|
| 222 |
+
---
|
| 223 |
+
|
| 224 |
## Dependency Inventory
|
| 225 |
|
| 226 |
### Python Backend (`requirements.txt`)
|
README.md
CHANGED
|
@@ -15,35 +15,36 @@ A clinician pastes a patient case. The system automatically:
|
|
| 15 |
2. **Reasons** about the case to generate a ranked differential diagnosis with chain-of-thought transparency
|
| 16 |
3. **Checks drug interactions** against OpenFDA and RxNorm databases
|
| 17 |
4. **Retrieves clinical guidelines** from a 62-guideline RAG corpus spanning 14 medical specialties
|
| 18 |
-
5. **
|
|
|
|
| 19 |
|
| 20 |
-
All
|
| 21 |
|
| 22 |
---
|
| 23 |
|
| 24 |
## System Architecture
|
| 25 |
|
| 26 |
```
|
| 27 |
-
┌─────────────────────────────────────────────────────────────────┐
|
| 28 |
-
│ FRONTEND (Next.js 14 + React)
|
| 29 |
-
│ Patient Case Input │ Agent Activity Feed │ CDS Report View
|
| 30 |
-
└──────────────────────────┬──────────────────────────────────────┘
|
| 31 |
│ REST API + WebSocket
|
| 32 |
-
┌──────────────────────────▼──────────────────────────────────────┐
|
| 33 |
-
│ BACKEND (FastAPI + Python 3.10)
|
| 34 |
-
│
|
| 35 |
-
│ ┌────────────────────────────────────────────────────────────┐ │
|
| 36 |
-
│ │
|
| 37 |
-
│ └──
|
| 38 |
-
│
|
| 39 |
-
│
|
| 40 |
-
│
|
| 41 |
-
│ │
|
| 42 |
-
│
|
| 43 |
-
│
|
| 44 |
-
│
|
| 45 |
-
│ External: OpenFDA API │ RxNorm/NLM API │ ChromaDB (local)
|
| 46 |
-
└──────────────────────────────────────────────────────────────────┘
|
| 47 |
```
|
| 48 |
|
| 49 |
See [docs/architecture.md](docs/architecture.md) for the full design document.
|
|
@@ -136,9 +137,9 @@ medgemma_impact_challenge/
|
|
| 136 |
│ │ ├── config.py # Pydantic Settings (ports, models, dirs)
|
| 137 |
│ │ ├── __init__.py
|
| 138 |
│ │ ├── models/
|
| 139 |
-
│ │ │ └── schemas.py # All Pydantic models (~
|
| 140 |
│ │ ├── agent/
|
| 141 |
-
│ │ │ └── orchestrator.py #
|
| 142 |
│ │ ├── services/
|
| 143 |
│ │ │ └── medgemma.py # LLM service (OpenAI-compatible API)
|
| 144 |
│ │ ├── tools/
|
|
@@ -146,7 +147,8 @@ medgemma_impact_challenge/
|
|
| 146 |
│ │ │ ├── clinical_reasoning.py # Step 2: Differential diagnosis
|
| 147 |
│ │ │ ├── drug_interactions.py # Step 3: OpenFDA + RxNorm
|
| 148 |
│ │ │ ├── guideline_retrieval.py # Step 4: RAG over ChromaDB
|
| 149 |
-
│ │ │
|
|
|
|
| 150 |
│ │ ├── data/
|
| 151 |
│ │ │ └── clinical_guidelines.json # 62 guidelines, 14 specialties
|
| 152 |
│ │ └── api/
|
|
@@ -240,8 +242,8 @@ python test_clinical_cases.py --report results.json # Save results
|
|
| 240 |
1. Open `http://localhost:3000`
|
| 241 |
2. Paste a patient case description (or click a sample case)
|
| 242 |
3. Click **"Analyze Patient Case"**
|
| 243 |
-
4. Watch the
|
| 244 |
-
5. Review the CDS report: differential diagnosis, drug warnings, guideline recommendations, next steps
|
| 245 |
|
| 246 |
---
|
| 247 |
|
|
|
|
| 15 |
2. **Reasons** about the case to generate a ranked differential diagnosis with chain-of-thought transparency
|
| 16 |
3. **Checks drug interactions** against OpenFDA and RxNorm databases
|
| 17 |
4. **Retrieves clinical guidelines** from a 62-guideline RAG corpus spanning 14 medical specialties
|
| 18 |
+
5. **Detects conflicts** between guideline recommendations and the patient's actual data — surfacing omissions, contradictions, dosage concerns, and monitoring gaps
|
| 19 |
+
6. **Synthesizes** everything into a structured CDS report with recommendations, warnings, conflicts, and citations
|
| 20 |
|
| 21 |
+
All six steps stream to the frontend in real time via WebSocket — the clinician sees each step execute live.
|
| 22 |
|
| 23 |
---
|
| 24 |
|
| 25 |
## System Architecture
|
| 26 |
|
| 27 |
```
|
| 28 |
+
┌─────────────────────────────────────────────────────────────────────┐
|
| 29 |
+
│ FRONTEND (Next.js 14 + React) │
|
| 30 |
+
│ Patient Case Input │ Agent Activity Feed │ CDS Report View │
|
| 31 |
+
└──────────────────────────┬──────────────────────────────────────────┘
|
| 32 |
│ REST API + WebSocket
|
| 33 |
+
┌──────────────────────────▼──────────────────────────────────────────┐
|
| 34 |
+
│ BACKEND (FastAPI + Python 3.10) │
|
| 35 |
+
│ │
|
| 36 |
+
│ ┌────────────────────────────────────────────────────────────────┐ │
|
| 37 |
+
│ │ ORCHESTRATOR (6-Step Pipeline) │ │
|
| 38 |
+
│ └──┬──────────┬──────────┬──────────┬──────────┬──────────┬─────┘ │
|
| 39 |
+
│ ┌──▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌───▼─────┐ ┌���─▼────┐ │
|
| 40 |
+
│ │Parse │ │Reason │ │ Drug │ │ RAG │ │Conflict │ │Synth- │ │
|
| 41 |
+
│ │Pati- │ │(LLM) │ │Check │ │Guide- │ │Detect- │ │esize │ │
|
| 42 |
+
│ │ent │ │Differ- │ │OpenFDA│ │lines │ │ion │ │(LLM) │ │
|
| 43 |
+
│ │Data │ │ential │ │RxNorm │ │ChromaDB│ │(LLM) │ │Report │ │
|
| 44 |
+
│ └──────┘ └────────┘ └──────┘ └────────┘ └─────────┘ └───────┘ │
|
| 45 |
+
│ │
|
| 46 |
+
│ External: OpenFDA API │ RxNorm/NLM API │ ChromaDB (local) │
|
| 47 |
+
└──────────────────────────────────────────────────────────────────────┘
|
| 48 |
```
|
| 49 |
|
| 50 |
See [docs/architecture.md](docs/architecture.md) for the full design document.
|
|
|
|
| 137 |
│ │ ├── config.py # Pydantic Settings (ports, models, dirs)
|
| 138 |
│ │ ├── __init__.py
|
| 139 |
│ │ ├── models/
|
| 140 |
+
│ │ │ └── schemas.py # All Pydantic models (~280 lines)
|
| 141 |
│ │ ├── agent/
|
| 142 |
+
│ │ │ └── orchestrator.py # 6-step pipeline orchestrator (~300 lines)
|
| 143 |
│ │ ├── services/
|
| 144 |
│ │ │ └── medgemma.py # LLM service (OpenAI-compatible API)
|
| 145 |
│ │ ├── tools/
|
|
|
|
| 147 |
│ │ │ ├── clinical_reasoning.py # Step 2: Differential diagnosis
|
| 148 |
│ │ │ ├── drug_interactions.py # Step 3: OpenFDA + RxNorm
|
| 149 |
│ │ │ ├── guideline_retrieval.py # Step 4: RAG over ChromaDB
|
| 150 |
+
│ │ │ ├── conflict_detection.py # Step 5: Guideline vs patient conflicts
|
| 151 |
+
│ │ │ └── synthesis.py # Step 6: CDS report generation
|
| 152 |
│ │ ├── data/
|
| 153 |
│ │ │ └── clinical_guidelines.json # 62 guidelines, 14 specialties
|
| 154 |
│ │ └── api/
|
|
|
|
| 242 |
1. Open `http://localhost:3000`
|
| 243 |
2. Paste a patient case description (or click a sample case)
|
| 244 |
3. Click **"Analyze Patient Case"**
|
| 245 |
+
4. Watch the 6-step agent pipeline execute in real time
|
| 246 |
+
5. Review the CDS report: differential diagnosis, drug warnings, **conflicts & gaps**, guideline recommendations, next steps
|
| 247 |
|
| 248 |
---
|
| 249 |
|
docs/architecture.md
CHANGED
|
@@ -29,19 +29,19 @@ structured clinical decision support report — all in seconds.
|
|
| 29 |
│ BACKEND (FastAPI + Python 3.10) │
|
| 30 |
│ Port 8000 (default) / 8002 (dev) │
|
| 31 |
│ │
|
| 32 |
-
│ ┌────────────────────────────────────────────────────────────┐ │
|
| 33 |
-
│ │ ORCHESTRATOR (orchestrator.py,
|
| 34 |
-
│ │ Sequential
|
| 35 |
-
│ └─────┬──────────┬──────────┬──────────┬──────────┬────────
|
| 36 |
-
│
|
| 37 |
-
│
|
| 38 |
-
│
|
| 39 |
-
│
|
| 40 |
-
│
|
| 41 |
-
│
|
| 42 |
-
│
|
| 43 |
-
│
|
| 44 |
-
│
|
| 45 |
│ ┌────▼────┐ ┌─▼──────────────┐ │
|
| 46 |
│ │OpenFDA │ │ChromaDB │ │
|
| 47 |
│ │RxNorm │ │62 guidelines │ │
|
|
@@ -100,20 +100,37 @@ LLM: gemma-3-27b-it via Google AI Studio
|
|
| 100 |
- **Fallback:** If `clinical_guidelines.json` is missing, falls back to 2 minimal embedded guidelines
|
| 101 |
- **Timing:** ~9.6 s (observed)
|
| 102 |
|
| 103 |
-
### Step 5:
|
| 104 |
-
|
| 105 |
-
- **Input:**
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 106 |
- **Output:** `CDSReport` (comprehensive structured report)
|
| 107 |
- **Report sections:**
|
| 108 |
- Patient summary
|
| 109 |
- Differential diagnosis with reasoning chains
|
| 110 |
- Drug interaction warnings with severity
|
|
|
|
| 111 |
- Guideline-concordant recommendations with citations
|
| 112 |
- Suggested next steps (immediate, short-term, long-term)
|
| 113 |
-
-
|
| 114 |
- **Timing:** ~25.3 s (observed)
|
| 115 |
|
| 116 |
-
**Total pipeline time:** ~75 s for a complex case (
|
| 117 |
|
| 118 |
---
|
| 119 |
|
|
@@ -148,7 +165,7 @@ This preserves the intended behavior while staying compatible with Gemma's API c
|
|
| 148 |
|
| 149 |
## Data Models (Pydantic v2)
|
| 150 |
|
| 151 |
-
All pipeline data is strongly typed via Pydantic models in `schemas.py` (~
|
| 152 |
|
| 153 |
| Model | Purpose |
|
| 154 |
|-------|---------|
|
|
@@ -160,7 +177,10 @@ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 li
|
|
| 160 |
| `DrugInteractionResult` | Step 3 output: all interaction data |
|
| 161 |
| `GuidelineExcerpt` | Individual guideline citation |
|
| 162 |
| `GuidelineRetrievalResult` | Step 4 output: relevant guidelines |
|
| 163 |
-
| `
|
|
|
|
|
|
|
|
|
|
| 164 |
| `AgentStep` | WebSocket message: step name, status, data, timing |
|
| 165 |
|
| 166 |
---
|
|
@@ -178,8 +198,8 @@ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 li
|
|
| 178 |
| Component | Role |
|
| 179 |
|-----------|------|
|
| 180 |
| `PatientInput.tsx` | Text area for patient case + 3 pre-loaded sample cases (chest pain, DKA, pediatric fever) |
|
| 181 |
-
| `AgentPipeline.tsx` | Visualizes the
|
| 182 |
-
| `CDSReport.tsx` | Renders the final CDS report: patient summary, differentials, drug warnings, guidelines, next steps |
|
| 183 |
|
| 184 |
### Communication
|
| 185 |
|
|
@@ -215,8 +235,8 @@ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 li
|
|
| 215 |
|
| 216 |
| Characteristic | Chatbot | This Agent System |
|
| 217 |
|----------------|---------|-------------------|
|
| 218 |
-
| Tool use | None |
|
| 219 |
-
| Planning | None | Orchestrator executes a defined
|
| 220 |
| State management | Stateless | Patient context flows through all steps |
|
| 221 |
| Error handling | Generic | Tool-specific fallbacks, graceful degradation |
|
| 222 |
| Output structure | Free text | Pydantic-validated, structured, cited |
|
|
|
|
| 29 |
│ BACKEND (FastAPI + Python 3.10) │
|
| 30 |
│ Port 8000 (default) / 8002 (dev) │
|
| 31 |
│ │
|
| 32 |
+
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
| 33 |
+
│ │ ORCHESTRATOR (orchestrator.py, ~300 lines) │ │
|
| 34 |
+
│ │ Sequential 6-step pipeline with structured state passing │ │
|
| 35 |
+
│ └──┬──────────┬──────────┬──────────┬──────────┬──────────┬────────┘ │
|
| 36 |
+
│ │ │ │ │ │ │ │
|
| 37 |
+
│ ┌──▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌───▼─────┐ ┌──▼──────┐ │
|
| 38 |
+
│ │Step 1│ │Step 2 │ │Step 3│ │Step 4 │ │Step 5 │ │Step 6 │ │
|
| 39 |
+
│ │Pati- │ │Clini- │ │Drug │ │Guide- │ │Conflict │ │Synthe- │ │
|
| 40 |
+
│ │ent │ │cal │ │Inter-│ │line │ │Detect- │ │sis │ │
|
| 41 |
+
│ │Parser│ │Reason- │ │action│ │Retriev-│ │ion │ │Agent │ │
|
| 42 |
+
│ │(LLM) │ │ing │ │(APIs)│ │al(RAG) │ │(LLM) │ │(LLM) │ │
|
| 43 |
+
│ └──────┘ │(LLM) │ └──┬──┘ └──┬─────┘ └─────────┘ └─────────┘ │
|
| 44 |
+
│ └────────┘ │ │ │
|
| 45 |
│ ┌────▼────┐ ┌─▼──────────────┐ │
|
| 46 |
│ │OpenFDA │ │ChromaDB │ │
|
| 47 |
│ │RxNorm │ │62 guidelines │ │
|
|
|
|
| 100 |
- **Fallback:** If `clinical_guidelines.json` is missing, falls back to 2 minimal embedded guidelines
|
| 101 |
- **Timing:** ~9.6 s (observed)
|
| 102 |
|
| 103 |
+
### Step 5: Conflict Detection (`conflict_detection.py`)
|
| 104 |
+
|
| 105 |
+
- **Input:** Patient profile, clinical reasoning, drug interactions, and retrieved guidelines from Steps 1–4
|
| 106 |
+
- **Output:** `ConflictDetectionResult` with specific `ClinicalConflict` items
|
| 107 |
+
- **Method:** LLM-based comparison of guideline recommendations against the patient's actual data
|
| 108 |
+
- **Conflict types detected:**
|
| 109 |
+
- **Omission** — Guideline recommends something the patient is not receiving
|
| 110 |
+
- **Contradiction** — Patient's current treatment conflicts with guideline advice
|
| 111 |
+
- **Dosage** — Guideline specifies dose adjustments that apply to this patient (age, renal function, etc.)
|
| 112 |
+
- **Monitoring** — Guideline requires monitoring that is not documented as ordered
|
| 113 |
+
- **Allergy Risk** — Guideline-recommended treatment involves a medication the patient is allergic to
|
| 114 |
+
- **Interaction Gap** — Known drug interaction is not addressed in the care plan
|
| 115 |
+
- **Each conflict includes:** severity (critical/high/moderate/low), guideline source, guideline text, patient data, description, and suggested resolution
|
| 116 |
+
- **Temperature:** 0.1 (low, for safety-critical analysis)
|
| 117 |
+
- **Graceful degradation:** Returns empty result if no guidelines were retrieved (Step 4 skipped/failed)
|
| 118 |
+
|
| 119 |
+
### Step 6: Synthesis Agent (`synthesis.py`)
|
| 120 |
+
|
| 121 |
+
- **Input:** All outputs from Steps 1–4 plus conflict detection results
|
| 122 |
- **Output:** `CDSReport` (comprehensive structured report)
|
| 123 |
- **Report sections:**
|
| 124 |
- Patient summary
|
| 125 |
- Differential diagnosis with reasoning chains
|
| 126 |
- Drug interaction warnings with severity
|
| 127 |
+
- **Conflicts & gaps** — prominently featured with guideline vs patient data comparison
|
| 128 |
- Guideline-concordant recommendations with citations
|
| 129 |
- Suggested next steps (immediate, short-term, long-term)
|
| 130 |
+
- Caveats and limitations
|
| 131 |
- **Timing:** ~25.3 s (observed)
|
| 132 |
|
| 133 |
+
**Total pipeline time:** ~75–85 s for a complex case (6 steps, with Steps 3–4 parallel).
|
| 134 |
|
| 135 |
---
|
| 136 |
|
|
|
|
| 165 |
|
| 166 |
## Data Models (Pydantic v2)
|
| 167 |
|
| 168 |
+
All pipeline data is strongly typed via Pydantic models in `schemas.py` (~280 lines):
|
| 169 |
|
| 170 |
| Model | Purpose |
|
| 171 |
|-------|---------|
|
|
|
|
| 177 |
| `DrugInteractionResult` | Step 3 output: all interaction data |
|
| 178 |
| `GuidelineExcerpt` | Individual guideline citation |
|
| 179 |
| `GuidelineRetrievalResult` | Step 4 output: relevant guidelines |
|
| 180 |
+
| `ConflictType` | Enum: omission, contradiction, dosage, monitoring, allergy_risk, interaction_gap |
|
| 181 |
+
| `ClinicalConflict` | Individual conflict: guideline_text vs patient_data + suggested resolution |
|
| 182 |
+
| `ConflictDetectionResult` | Step 5 output: all detected conflicts |
|
| 183 |
+
| `CDSReport` | Step 6 output: full synthesized report (now includes conflicts) |
|
| 184 |
| `AgentStep` | WebSocket message: step name, status, data, timing |
|
| 185 |
|
| 186 |
---
|
|
|
|
| 198 |
| Component | Role |
|
| 199 |
|-----------|------|
|
| 200 |
| `PatientInput.tsx` | Text area for patient case + 3 pre-loaded sample cases (chest pain, DKA, pediatric fever) |
|
| 201 |
+
| `AgentPipeline.tsx` | Visualizes the 6-step pipeline in real time — shows status (pending / running / complete / error) for each step as WebSocket messages arrive |
|
| 202 |
+
| `CDSReport.tsx` | Renders the final CDS report: patient summary, differentials, drug warnings, **conflicts & gaps** (prominently styled), guidelines, next steps |
|
| 203 |
|
| 204 |
### Communication
|
| 205 |
|
|
|
|
| 235 |
|
| 236 |
| Characteristic | Chatbot | This Agent System |
|
| 237 |
|----------------|---------|-------------------|
|
| 238 |
+
| Tool use | None | 5+ specialized tools (parser, drug API, RAG, conflict detection, synthesis) |
|
| 239 |
+
| Planning | None | Orchestrator executes a defined 6-step plan |
|
| 240 |
| State management | Stateless | Patient context flows through all steps |
|
| 241 |
| Error handling | Generic | Tool-specific fallbacks, graceful degradation |
|
| 242 |
| Output structure | Free text | Pydantic-validated, structured, cited |
|
docs/test_results.md
CHANGED
|
@@ -60,7 +60,7 @@ python test_rag_quality.py --rebuild --verbose
|
|
| 60 |
## 2. End-to-End Pipeline Test
|
| 61 |
|
| 62 |
**Test file:** `src/backend/test_e2e.py`
|
| 63 |
-
**What it tests:** Full
|
| 64 |
**Test case:** 62-year-old male with crushing substernal chest pain, diaphoresis, nausea, HTN history, on lisinopril + metformin + atorvastatin.
|
| 65 |
|
| 66 |
### Pipeline Step Results
|
|
@@ -71,7 +71,8 @@ python test_rag_quality.py --rebuild --verbose
|
|
| 71 |
| 2. Clinical Reasoning | PASSED | 21.2 s | Top differential: Acute Coronary Syndrome (ACS). Also considered: GERD, PE, aortic dissection |
|
| 72 |
| 3. Drug Interaction Check | PASSED | 11.3 s | Queried OpenFDA + RxNorm for lisinopril, metformin, atorvastatin interactions |
|
| 73 |
| 4. Guideline Retrieval | PASSED | 9.6 s | Retrieved ACC/AHA chest pain / ACS guidelines from RAG corpus |
|
| 74 |
-
| 5.
|
|
|
|
| 75 |
|
| 76 |
**Total pipeline time:** 75.2 s
|
| 77 |
|
|
@@ -185,7 +186,7 @@ python test_clinical_cases.py --quiet
|
|
| 185 |
|
| 186 |
| File | Lines | Purpose |
|
| 187 |
|------|-------|---------|
|
| 188 |
-
| `test_e2e.py` | 57 | Submit chest pain case, poll for completion, validate all
|
| 189 |
| `test_clinical_cases.py` | ~400 | 22 clinical cases with keyword validation, CLI flags for filtering |
|
| 190 |
| `test_rag_quality.py` | ~350 | 30 RAG retrieval queries with expected guideline IDs, relevance scoring |
|
| 191 |
| `test_poll.py` | ~30 | Utility: poll a case ID until completion |
|
|
|
|
| 60 |
## 2. End-to-End Pipeline Test
|
| 61 |
|
| 62 |
**Test file:** `src/backend/test_e2e.py`
|
| 63 |
+
**What it tests:** Full 6-step agent pipeline from free-text input to synthesized CDS report.
|
| 64 |
**Test case:** 62-year-old male with crushing substernal chest pain, diaphoresis, nausea, HTN history, on lisinopril + metformin + atorvastatin.
|
| 65 |
|
| 66 |
### Pipeline Step Results
|
|
|
|
| 71 |
| 2. Clinical Reasoning | PASSED | 21.2 s | Top differential: Acute Coronary Syndrome (ACS). Also considered: GERD, PE, aortic dissection |
|
| 72 |
| 3. Drug Interaction Check | PASSED | 11.3 s | Queried OpenFDA + RxNorm for lisinopril, metformin, atorvastatin interactions |
|
| 73 |
| 4. Guideline Retrieval | PASSED | 9.6 s | Retrieved ACC/AHA chest pain / ACS guidelines from RAG corpus |
|
| 74 |
+
| 5. Conflict Detection | PASSED | — | Compares guidelines against patient data for omissions, contradictions, dosage, monitoring gaps |
|
| 75 |
+
| 6. Synthesis | PASSED | 25.3 s | Generated comprehensive CDS report with differential, warnings, conflicts, guideline recommendations |
|
| 76 |
|
| 77 |
**Total pipeline time:** 75.2 s
|
| 78 |
|
|
|
|
| 186 |
|
| 187 |
| File | Lines | Purpose |
|
| 188 |
|------|-------|---------|
|
| 189 |
+
| `test_e2e.py` | 57 | Submit chest pain case, poll for completion, validate all 6 steps |
|
| 190 |
| `test_clinical_cases.py` | ~400 | 22 clinical cases with keyword validation, CLI flags for filtering |
|
| 191 |
| `test_rag_quality.py` | ~350 | 30 RAG retrieval queries with expected guideline IDs, relevance scoring |
|
| 192 |
| `test_poll.py` | ~30 | Utility: poll a case ID until completion |
|
docs/writeup_draft.md
CHANGED
|
@@ -65,15 +65,16 @@ Gemma 3 27B IT provides the right balance of capability and accessibility for a
|
|
| 65 |
|
| 66 |
**How the model is used:**
|
| 67 |
|
| 68 |
-
The model serves as the reasoning engine in a
|
| 69 |
|
| 70 |
1. **Patient Data Parsing** (LLM) — Extracts structured patient data from free-text clinical narratives
|
| 71 |
2. **Clinical Reasoning** (LLM) — Generates ranked differential diagnoses with chain-of-thought reasoning
|
| 72 |
3. **Drug Interaction Check** (External APIs) — Queries OpenFDA and RxNorm for medication safety
|
| 73 |
4. **Guideline Retrieval** (RAG) — Retrieves relevant clinical guidelines from a 62-guideline corpus using ChromaDB
|
| 74 |
-
5. **
|
|
|
|
| 75 |
|
| 76 |
-
The model is used in Steps 1, 2, and
|
| 77 |
|
| 78 |
### Technical details
|
| 79 |
|
|
@@ -82,12 +83,13 @@ The model is used in Steps 1, 2, and 5 — parsing, reasoning, and synthesis. Th
|
|
| 82 |
```
|
| 83 |
Frontend (Next.js 14) ←→ Backend (FastAPI + Python 3.10)
|
| 84 |
│
|
| 85 |
-
Orchestrator (
|
| 86 |
├── Step 1: Patient Parser (LLM)
|
| 87 |
├── Step 2: Clinical Reasoning (LLM)
|
| 88 |
├── Step 3: Drug Check (OpenFDA + RxNorm APIs)
|
| 89 |
├── Step 4: Guideline Retrieval (ChromaDB RAG)
|
| 90 |
-
|
|
|
|
| 91 |
```
|
| 92 |
|
| 93 |
All inter-step data is strongly typed with Pydantic v2 models. The pipeline streams each step's progress to the frontend via WebSocket for real-time visibility.
|
|
@@ -100,7 +102,7 @@ No fine-tuning was performed in the current version. The base `gemma-3-27b-it` m
|
|
| 100 |
|
| 101 |
| Test | Result |
|
| 102 |
|------|--------|
|
| 103 |
-
| E2E pipeline (chest pain / ACS) | All
|
| 104 |
| RAG retrieval quality | 30/30 queries passed (100%), avg relevance 0.639 |
|
| 105 |
| Clinical test suite | 22 scenarios across 14 specialties |
|
| 106 |
| Top-1 RAG accuracy | 100% — correct guideline ranked #1 for all queries |
|
|
@@ -127,10 +129,11 @@ No fine-tuning was performed in the current version. The base `gemma-3-27b-it` m
|
|
| 127 |
In a real clinical setting, the system would be used at the point of care:
|
| 128 |
1. Clinician opens the CDS Agent interface (embedded in the EHR or as a standalone app)
|
| 129 |
2. Patient data is automatically pulled from the EHR (or pasted manually)
|
| 130 |
-
3. The agent pipeline runs in ~60
|
| 131 |
4. The CDS report appears with:
|
| 132 |
- Ranked differential diagnoses with reasoning chains (transparent AI)
|
| 133 |
- Drug interaction warnings with severity levels
|
|
|
|
| 134 |
- Relevant clinical guideline excerpts with citations to authoritative sources
|
| 135 |
- Suggested next steps (immediate, short-term, long-term)
|
| 136 |
5. The clinician reviews the recommendations and incorporates them into their clinical judgment
|
|
|
|
| 65 |
|
| 66 |
**How the model is used:**
|
| 67 |
|
| 68 |
+
The model serves as the reasoning engine in a 6-step agentic pipeline:
|
| 69 |
|
| 70 |
1. **Patient Data Parsing** (LLM) — Extracts structured patient data from free-text clinical narratives
|
| 71 |
2. **Clinical Reasoning** (LLM) — Generates ranked differential diagnoses with chain-of-thought reasoning
|
| 72 |
3. **Drug Interaction Check** (External APIs) — Queries OpenFDA and RxNorm for medication safety
|
| 73 |
4. **Guideline Retrieval** (RAG) — Retrieves relevant clinical guidelines from a 62-guideline corpus using ChromaDB
|
| 74 |
+
5. **Conflict Detection** (LLM) — Compares guideline recommendations against patient data to identify omissions, contradictions, dosage concerns, monitoring gaps, allergy risks, and interaction gaps
|
| 75 |
+
6. **Synthesis** (LLM) — Integrates all outputs into a comprehensive CDS report with conflicts prominently featured
|
| 76 |
|
| 77 |
+
The model is used in Steps 1, 2, 5, and 6 — parsing, reasoning, conflict detection, and synthesis. This demonstrates the model used "to its fullest potential" across multiple distinct clinical tasks within a single workflow.
|
| 78 |
|
| 79 |
### Technical details
|
| 80 |
|
|
|
|
| 83 |
```
|
| 84 |
Frontend (Next.js 14) ←→ Backend (FastAPI + Python 3.10)
|
| 85 |
│
|
| 86 |
+
Orchestrator (6-step pipeline)
|
| 87 |
├── Step 1: Patient Parser (LLM)
|
| 88 |
├── Step 2: Clinical Reasoning (LLM)
|
| 89 |
├── Step 3: Drug Check (OpenFDA + RxNorm APIs)
|
| 90 |
├── Step 4: Guideline Retrieval (ChromaDB RAG)
|
| 91 |
+
├── Step 5: Conflict Detection (LLM)
|
| 92 |
+
└── Step 6: Synthesis (LLM)
|
| 93 |
```
|
| 94 |
|
| 95 |
All inter-step data is strongly typed with Pydantic v2 models. The pipeline streams each step's progress to the frontend via WebSocket for real-time visibility.
|
|
|
|
| 102 |
|
| 103 |
| Test | Result |
|
| 104 |
|------|--------|
|
| 105 |
+
| E2E pipeline (chest pain / ACS) | All 6 steps passed, ~75–85 s total |
|
| 106 |
| RAG retrieval quality | 30/30 queries passed (100%), avg relevance 0.639 |
|
| 107 |
| Clinical test suite | 22 scenarios across 14 specialties |
|
| 108 |
| Top-1 RAG accuracy | 100% — correct guideline ranked #1 for all queries |
|
|
|
|
| 129 |
In a real clinical setting, the system would be used at the point of care:
|
| 130 |
1. Clinician opens the CDS Agent interface (embedded in the EHR or as a standalone app)
|
| 131 |
2. Patient data is automatically pulled from the EHR (or pasted manually)
|
| 132 |
+
3. The agent pipeline runs in ~60–90 seconds, during which the clinician can continue other tasks
|
| 133 |
4. The CDS report appears with:
|
| 134 |
- Ranked differential diagnoses with reasoning chains (transparent AI)
|
| 135 |
- Drug interaction warnings with severity levels
|
| 136 |
+
- **Conflicts & gaps** between guideline recommendations and the patient's actual data — prominently displayed with specific guideline citations, patient data comparisons, and suggested resolutions
|
| 137 |
- Relevant clinical guideline excerpts with citations to authoritative sources
|
| 138 |
- Suggested next steps (immediate, short-term, long-term)
|
| 139 |
5. The clinician reviews the recommendations and incorporates them into their clinical judgment
|