bshepp commited on
Commit
c28dd56
·
1 Parent(s): e684a6c

Update all documentation for conflict detection feature

Browse files

- README.md: 5→6 step pipeline, updated architecture diagram, project
structure (new conflict_detection.py), usage instructions
- docs/architecture.md: New Step 5 section, updated diagram, data models
table (3 new models), component descriptions, agentic comparison
- DEVELOPMENT_LOG.md: Added Phase 8 documenting the conflict detection
design decision (why confidence scores were dropped) and full
implementation details
- docs/writeup_draft.md: Updated pipeline description, architecture
diagram, performance table, practical usage section
- docs/test_results.md: Updated E2E test to reflect 6-step pipeline

DEVELOPMENT_LOG.md CHANGED
@@ -173,6 +173,54 @@ Rewrote/created all documentation:
173
 
174
  ---
175
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
176
  ## Dependency Inventory
177
 
178
  ### Python Backend (`requirements.txt`)
 
173
 
174
  ---
175
 
176
+ ## Phase 8: Conflict Detection Feature
177
+
178
+ ### Design Decision: Drop Confidence Scores, Add Conflict Detection
179
+
180
+ During review, identified that the system's "confidence" was just the LLM picking a label (LOW/MODERATE/HIGH) — not a calibrated score. Composite numeric confidence scores were considered and **rejected** because:
181
+ - Uncalibrated confidence values are dangerous (clinician anchoring bias)
182
+ - No training data exists to calibrate outputs
183
+ - A single number hides more than it reveals
184
+
185
+ **Instead, added Conflict Detection** — a new pipeline step that compares guideline recommendations against the patient's actual data to identify specific, actionable gaps. This provides direct patient safety value without requiring calibration.
186
+
187
+ ### Implementation
188
+
189
+ **New models added to `schemas.py`:**
190
+ - `ConflictType` enum — 6 categories: omission, contradiction, dosage, monitoring, allergy_risk, interaction_gap
191
+ - `ClinicalConflict` model — Each conflict has: type, severity, guideline_source, guideline_text, patient_data, description, suggested_resolution
192
+ - `ConflictDetectionResult` — List of conflicts + summary + guidelines_checked count
193
+ - `conflicts` field added to `CDSReport`
194
+ - `conflict_detection` field added to `AgentState`
195
+
196
+ **New tool: `conflict_detection.py`:**
197
+ - Takes patient profile, clinical reasoning, drug interactions, and guidelines
198
+ - Uses MedGemma at low temperature (0.1) for safety-critical analysis
199
+ - Returns structured `ConflictDetectionResult` with specific, actionable conflicts
200
+ - Graceful degradation: returns empty if no guidelines available
201
+
202
+ **Pipeline changes (`orchestrator.py`):**
203
+ - Pipeline expanded from 5 to 6 steps
204
+ - New Step 5: Conflict Detection (between guideline retrieval and synthesis)
205
+ - Synthesis (now Step 6) receives conflict data and prominently includes it in the report
206
+
207
+ **Synthesis changes (`synthesis.py`):**
208
+ - Accepts `conflict_detection` parameter
209
+ - New "Conflicts & Gaps" section in synthesis prompt
210
+ - Fallback: copies detected conflicts directly into report if LLM doesn't populate the structured field
211
+
212
+ **Frontend changes (`CDSReport.tsx`):**
213
+ - New "Conflicts & Gaps Detected" section with high visual prominence
214
+ - Red border container, severity-coded left-accent cards (critical=red, high=orange, moderate=yellow, low=blue)
215
+ - Side-by-side "Guideline says" vs "Patient data" comparison
216
+ - Green-highlighted suggested resolutions
217
+ - Positioned immediately after drug interactions for maximum visibility
218
+
219
+ **Files created:** `src/backend/app/tools/conflict_detection.py` (1 new file)
220
+ **Files modified:** `schemas.py`, `orchestrator.py`, `synthesis.py`, `CDSReport.tsx` (4 files)
221
+
222
+ ---
223
+
224
  ## Dependency Inventory
225
 
226
  ### Python Backend (`requirements.txt`)
README.md CHANGED
@@ -15,35 +15,36 @@ A clinician pastes a patient case. The system automatically:
15
  2. **Reasons** about the case to generate a ranked differential diagnosis with chain-of-thought transparency
16
  3. **Checks drug interactions** against OpenFDA and RxNorm databases
17
  4. **Retrieves clinical guidelines** from a 62-guideline RAG corpus spanning 14 medical specialties
18
- 5. **Synthesizes** everything into a structured CDS report with recommendations, warnings, and citations
 
19
 
20
- All five steps stream to the frontend in real time via WebSocket — the clinician sees each step execute live.
21
 
22
  ---
23
 
24
  ## System Architecture
25
 
26
  ```
27
- ┌─────────────────────────────────────────────────────────────────┐
28
- │ FRONTEND (Next.js 14 + React)
29
- │ Patient Case Input │ Agent Activity Feed │ CDS Report View
30
- └──────────────────────────┬──────────────────────────────────────┘
31
  │ REST API + WebSocket
32
- ┌──────────────────────────▼──────────────────────────────────────┐
33
- │ BACKEND (FastAPI + Python 3.10)
34
-
35
- │ ┌────────────────────────────────────────────────────────────┐ │
36
- │ │ ORCHESTRATOR (5-Step Pipeline) │ │
37
- │ └─────┬──────────┬──────────┬──────────┬──────────┬─────────┘ │
38
- ┌────▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌───▼─────┐
39
- Parse │ │Reason │ │ Drug │ │ RAG │ │Synth-
40
- Patient │ │(LLM) │ │Check │ │Guide- │ │esize
41
- │ │Data │ │Differ- │ │OpenFDA│ │lines │ │(LLM)
42
- │ │ential │ │RxNorm │ │ChromaDB│ │Report
43
- └────────┘ └────────┘ └──────┘ └────────┘ └─────────┘
44
-
45
- │ External: OpenFDA API │ RxNorm/NLM API │ ChromaDB (local)
46
- └──────────────────────────────────────────────────────────────────┘
47
  ```
48
 
49
  See [docs/architecture.md](docs/architecture.md) for the full design document.
@@ -136,9 +137,9 @@ medgemma_impact_challenge/
136
  │ │ ├── config.py # Pydantic Settings (ports, models, dirs)
137
  │ │ ├── __init__.py
138
  │ │ ├── models/
139
- │ │ │ └── schemas.py # All Pydantic models (~238 lines)
140
  │ │ ├── agent/
141
- │ │ │ └── orchestrator.py # 5-step pipeline orchestrator (267 lines)
142
  │ │ ├── services/
143
  │ │ │ └── medgemma.py # LLM service (OpenAI-compatible API)
144
  │ │ ├── tools/
@@ -146,7 +147,8 @@ medgemma_impact_challenge/
146
  │ │ │ ├── clinical_reasoning.py # Step 2: Differential diagnosis
147
  │ │ │ ├── drug_interactions.py # Step 3: OpenFDA + RxNorm
148
  │ │ │ ├── guideline_retrieval.py # Step 4: RAG over ChromaDB
149
- │ │ │ ── synthesis.py # Step 5: CDS report generation
 
150
  │ │ ├── data/
151
  │ │ │ └── clinical_guidelines.json # 62 guidelines, 14 specialties
152
  │ │ └── api/
@@ -240,8 +242,8 @@ python test_clinical_cases.py --report results.json # Save results
240
  1. Open `http://localhost:3000`
241
  2. Paste a patient case description (or click a sample case)
242
  3. Click **"Analyze Patient Case"**
243
- 4. Watch the 5-step agent pipeline execute in real time
244
- 5. Review the CDS report: differential diagnosis, drug warnings, guideline recommendations, next steps
245
 
246
  ---
247
 
 
15
  2. **Reasons** about the case to generate a ranked differential diagnosis with chain-of-thought transparency
16
  3. **Checks drug interactions** against OpenFDA and RxNorm databases
17
  4. **Retrieves clinical guidelines** from a 62-guideline RAG corpus spanning 14 medical specialties
18
+ 5. **Detects conflicts** between guideline recommendations and the patient's actual data — surfacing omissions, contradictions, dosage concerns, and monitoring gaps
19
+ 6. **Synthesizes** everything into a structured CDS report with recommendations, warnings, conflicts, and citations
20
 
21
+ All six steps stream to the frontend in real time via WebSocket — the clinician sees each step execute live.
22
 
23
  ---
24
 
25
  ## System Architecture
26
 
27
  ```
28
+ ┌─────────────────────────────────────────────────────────────────────
29
+ │ FRONTEND (Next.js 14 + React)
30
+ │ Patient Case Input │ Agent Activity Feed │ CDS Report View
31
+ └──────────────────────────┬──────────────────────────────────────────
32
  │ REST API + WebSocket
33
+ ┌──────────────────────────▼──────────────────────────────────────────
34
+ │ BACKEND (FastAPI + Python 3.10)
35
+
36
+ │ ┌────────────────────────────────────────────────────────────────┐ │
37
+ │ │ ORCHESTRATOR (6-Step Pipeline) │ │
38
+ │ └──┬──────────┬──────────┬──────────┬──────────┬──────────┬─────┘ │
39
+ ┌──▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌───▼─────┐ ┌���─▼────┐
40
+ │Parse │ │Reason │ │ Drug │ │ RAG │ │Conflict │ │Synth-
41
+ Pati- │ │(LLM) │ │Check │ │Guide- │ │Detect- │ │esize
42
+ │ent │ │Differ- │ │OpenFDA│ │lines │ │ion │ │(LLM)
43
+ Data │ │ential │ │RxNorm │ │ChromaDB│ │(LLM) │ │Report
44
+ └──────┘ └────────┘ └──────┘ └────────┘ └─────────┘ └───────┘
45
+
46
+ │ External: OpenFDA API │ RxNorm/NLM API │ ChromaDB (local)
47
+ └──────────────────────────────────────────────────────────────────────
48
  ```
49
 
50
  See [docs/architecture.md](docs/architecture.md) for the full design document.
 
137
  │ │ ├── config.py # Pydantic Settings (ports, models, dirs)
138
  │ │ ├── __init__.py
139
  │ │ ├── models/
140
+ │ │ │ └── schemas.py # All Pydantic models (~280 lines)
141
  │ │ ├── agent/
142
+ │ │ │ └── orchestrator.py # 6-step pipeline orchestrator (~300 lines)
143
  │ │ ├── services/
144
  │ │ │ └── medgemma.py # LLM service (OpenAI-compatible API)
145
  │ │ ├── tools/
 
147
  │ │ │ ├── clinical_reasoning.py # Step 2: Differential diagnosis
148
  │ │ │ ├── drug_interactions.py # Step 3: OpenFDA + RxNorm
149
  │ │ │ ├── guideline_retrieval.py # Step 4: RAG over ChromaDB
150
+ │ │ │ ── conflict_detection.py # Step 5: Guideline vs patient conflicts
151
+ │ │ │ └── synthesis.py # Step 6: CDS report generation
152
  │ │ ├── data/
153
  │ │ │ └── clinical_guidelines.json # 62 guidelines, 14 specialties
154
  │ │ └── api/
 
242
  1. Open `http://localhost:3000`
243
  2. Paste a patient case description (or click a sample case)
244
  3. Click **"Analyze Patient Case"**
245
+ 4. Watch the 6-step agent pipeline execute in real time
246
+ 5. Review the CDS report: differential diagnosis, drug warnings, **conflicts & gaps**, guideline recommendations, next steps
247
 
248
  ---
249
 
docs/architecture.md CHANGED
@@ -29,19 +29,19 @@ structured clinical decision support report — all in seconds.
29
  │ BACKEND (FastAPI + Python 3.10) │
30
  │ Port 8000 (default) / 8002 (dev) │
31
  │ │
32
- │ ┌────────────────────────────────────────────────────────────┐ │
33
- │ │ ORCHESTRATOR (orchestrator.py, 267 lines) │ │
34
- │ │ Sequential 5-step pipeline with structured state passing │ │
35
- │ └─────┬──────────┬──────────┬──────────┬──────────┬────────┘ │
36
- │ │ │ │ │
37
- ┌────▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌───▼─────┐
38
- │Step 1 │ │Step 2 │ │Step 3 │ │Step 4 │ │Step 5 │
39
- Patient │ │Clinical│ │Drug │ │Guide- │ │Synthe- │
40
- │Parser Reason- │ │Inter- │ │line │ │sis │
41
- │ │ing │ │action │ │Retriev-│ │Agent │
42
- │(LLM) │ │(LLM) │ │(APIs) │ │al(RAG) │ │(LLM) │
43
- └────────┘ └────────┘ └──┬───┘ └───────┘ └─────────┘
44
-
45
  │ ┌────▼────┐ ┌─▼──────────────┐ │
46
  │ │OpenFDA │ │ChromaDB │ │
47
  │ │RxNorm │ │62 guidelines │ │
@@ -100,20 +100,37 @@ LLM: gemma-3-27b-it via Google AI Studio
100
  - **Fallback:** If `clinical_guidelines.json` is missing, falls back to 2 minimal embedded guidelines
101
  - **Timing:** ~9.6 s (observed)
102
 
103
- ### Step 5: Synthesis Agent (`synthesis.py`)
104
-
105
- - **Input:** All outputs from Steps 1–4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
106
  - **Output:** `CDSReport` (comprehensive structured report)
107
  - **Report sections:**
108
  - Patient summary
109
  - Differential diagnosis with reasoning chains
110
  - Drug interaction warnings with severity
 
111
  - Guideline-concordant recommendations with citations
112
  - Suggested next steps (immediate, short-term, long-term)
113
- - Confidence levels and caveats
114
  - **Timing:** ~25.3 s (observed)
115
 
116
- **Total pipeline time:** ~75 s for a complex case (all 5 steps sequential).
117
 
118
  ---
119
 
@@ -148,7 +165,7 @@ This preserves the intended behavior while staying compatible with Gemma's API c
148
 
149
  ## Data Models (Pydantic v2)
150
 
151
- All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 lines):
152
 
153
  | Model | Purpose |
154
  |-------|---------|
@@ -160,7 +177,10 @@ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 li
160
  | `DrugInteractionResult` | Step 3 output: all interaction data |
161
  | `GuidelineExcerpt` | Individual guideline citation |
162
  | `GuidelineRetrievalResult` | Step 4 output: relevant guidelines |
163
- | `CDSReport` | Step 5 output: full synthesized report |
 
 
 
164
  | `AgentStep` | WebSocket message: step name, status, data, timing |
165
 
166
  ---
@@ -178,8 +198,8 @@ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 li
178
  | Component | Role |
179
  |-----------|------|
180
  | `PatientInput.tsx` | Text area for patient case + 3 pre-loaded sample cases (chest pain, DKA, pediatric fever) |
181
- | `AgentPipeline.tsx` | Visualizes the 5-step pipeline in real time — shows status (pending / running / complete / error) for each step as WebSocket messages arrive |
182
- | `CDSReport.tsx` | Renders the final CDS report: patient summary, differentials, drug warnings, guidelines, next steps |
183
 
184
  ### Communication
185
 
@@ -215,8 +235,8 @@ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~238 li
215
 
216
  | Characteristic | Chatbot | This Agent System |
217
  |----------------|---------|-------------------|
218
- | Tool use | None | 4+ specialized tools (parser, drug API, RAG, synthesis) |
219
- | Planning | None | Orchestrator executes a defined 5-step plan |
220
  | State management | Stateless | Patient context flows through all steps |
221
  | Error handling | Generic | Tool-specific fallbacks, graceful degradation |
222
  | Output structure | Free text | Pydantic-validated, structured, cited |
 
29
  │ BACKEND (FastAPI + Python 3.10) │
30
  │ Port 8000 (default) / 8002 (dev) │
31
  │ │
32
+ │ ┌────────────────────────────────────────────────────────────────────┐ │
33
+ │ │ ORCHESTRATOR (orchestrator.py, ~300 lines) │ │
34
+ │ │ Sequential 6-step pipeline with structured state passing │ │
35
+ │ └────────────┬──────────┬──────────┬──────────┬──────────┬────────┘ │
36
+ │ │ │ │ │
37
+ ┌──▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼────┐ ┌───▼─────┐ ┌──▼──────┐
38
+ │Step 1│ │Step 2 │ │Step 3│ │Step 4 │ │Step 5 │ Step 6 │ │
39
+ Pati- │ │Clini- │ │Drug │ │Guide- │ │Conflict │ │Synthe- │
40
+ │ │ent │ │cal │ │Inter-│ │line │ │Detect- │ │sis │
41
+ Parser│ │Reason- │ │action│ │Retriev-│ │ion │ │Agent │
42
+ │(LLM) │ │ing │ │(APIs)│ │al(RAG) │ │(LLM) │ (LLM) │ │
43
+ └──────┘ │(LLM) │ └────┘ └──┬─────┘ └─────────┘ └─────────┘
44
+ └────────┘
45
  │ ┌────▼────┐ ┌─▼──────────────┐ │
46
  │ │OpenFDA │ │ChromaDB │ │
47
  │ │RxNorm │ │62 guidelines │ │
 
100
  - **Fallback:** If `clinical_guidelines.json` is missing, falls back to 2 minimal embedded guidelines
101
  - **Timing:** ~9.6 s (observed)
102
 
103
+ ### Step 5: Conflict Detection (`conflict_detection.py`)
104
+
105
+ - **Input:** Patient profile, clinical reasoning, drug interactions, and retrieved guidelines from Steps 1–4
106
+ - **Output:** `ConflictDetectionResult` with specific `ClinicalConflict` items
107
+ - **Method:** LLM-based comparison of guideline recommendations against the patient's actual data
108
+ - **Conflict types detected:**
109
+ - **Omission** — Guideline recommends something the patient is not receiving
110
+ - **Contradiction** — Patient's current treatment conflicts with guideline advice
111
+ - **Dosage** — Guideline specifies dose adjustments that apply to this patient (age, renal function, etc.)
112
+ - **Monitoring** — Guideline requires monitoring that is not documented as ordered
113
+ - **Allergy Risk** — Guideline-recommended treatment involves a medication the patient is allergic to
114
+ - **Interaction Gap** — Known drug interaction is not addressed in the care plan
115
+ - **Each conflict includes:** severity (critical/high/moderate/low), guideline source, guideline text, patient data, description, and suggested resolution
116
+ - **Temperature:** 0.1 (low, for safety-critical analysis)
117
+ - **Graceful degradation:** Returns empty result if no guidelines were retrieved (Step 4 skipped/failed)
118
+
119
+ ### Step 6: Synthesis Agent (`synthesis.py`)
120
+
121
+ - **Input:** All outputs from Steps 1–4 plus conflict detection results
122
  - **Output:** `CDSReport` (comprehensive structured report)
123
  - **Report sections:**
124
  - Patient summary
125
  - Differential diagnosis with reasoning chains
126
  - Drug interaction warnings with severity
127
+ - **Conflicts & gaps** — prominently featured with guideline vs patient data comparison
128
  - Guideline-concordant recommendations with citations
129
  - Suggested next steps (immediate, short-term, long-term)
130
+ - Caveats and limitations
131
  - **Timing:** ~25.3 s (observed)
132
 
133
+ **Total pipeline time:** ~75–85 s for a complex case (6 steps, with Steps 3–4 parallel).
134
 
135
  ---
136
 
 
165
 
166
  ## Data Models (Pydantic v2)
167
 
168
+ All pipeline data is strongly typed via Pydantic models in `schemas.py` (~280 lines):
169
 
170
  | Model | Purpose |
171
  |-------|---------|
 
177
  | `DrugInteractionResult` | Step 3 output: all interaction data |
178
  | `GuidelineExcerpt` | Individual guideline citation |
179
  | `GuidelineRetrievalResult` | Step 4 output: relevant guidelines |
180
+ | `ConflictType` | Enum: omission, contradiction, dosage, monitoring, allergy_risk, interaction_gap |
181
+ | `ClinicalConflict` | Individual conflict: guideline_text vs patient_data + suggested resolution |
182
+ | `ConflictDetectionResult` | Step 5 output: all detected conflicts |
183
+ | `CDSReport` | Step 6 output: full synthesized report (now includes conflicts) |
184
  | `AgentStep` | WebSocket message: step name, status, data, timing |
185
 
186
  ---
 
198
  | Component | Role |
199
  |-----------|------|
200
  | `PatientInput.tsx` | Text area for patient case + 3 pre-loaded sample cases (chest pain, DKA, pediatric fever) |
201
+ | `AgentPipeline.tsx` | Visualizes the 6-step pipeline in real time — shows status (pending / running / complete / error) for each step as WebSocket messages arrive |
202
+ | `CDSReport.tsx` | Renders the final CDS report: patient summary, differentials, drug warnings, **conflicts & gaps** (prominently styled), guidelines, next steps |
203
 
204
  ### Communication
205
 
 
235
 
236
  | Characteristic | Chatbot | This Agent System |
237
  |----------------|---------|-------------------|
238
+ | Tool use | None | 5+ specialized tools (parser, drug API, RAG, conflict detection, synthesis) |
239
+ | Planning | None | Orchestrator executes a defined 6-step plan |
240
  | State management | Stateless | Patient context flows through all steps |
241
  | Error handling | Generic | Tool-specific fallbacks, graceful degradation |
242
  | Output structure | Free text | Pydantic-validated, structured, cited |
docs/test_results.md CHANGED
@@ -60,7 +60,7 @@ python test_rag_quality.py --rebuild --verbose
60
  ## 2. End-to-End Pipeline Test
61
 
62
  **Test file:** `src/backend/test_e2e.py`
63
- **What it tests:** Full 5-step agent pipeline from free-text input to synthesized CDS report.
64
  **Test case:** 62-year-old male with crushing substernal chest pain, diaphoresis, nausea, HTN history, on lisinopril + metformin + atorvastatin.
65
 
66
  ### Pipeline Step Results
@@ -71,7 +71,8 @@ python test_rag_quality.py --rebuild --verbose
71
  | 2. Clinical Reasoning | PASSED | 21.2 s | Top differential: Acute Coronary Syndrome (ACS). Also considered: GERD, PE, aortic dissection |
72
  | 3. Drug Interaction Check | PASSED | 11.3 s | Queried OpenFDA + RxNorm for lisinopril, metformin, atorvastatin interactions |
73
  | 4. Guideline Retrieval | PASSED | 9.6 s | Retrieved ACC/AHA chest pain / ACS guidelines from RAG corpus |
74
- | 5. Synthesis | PASSED | 25.3 s | Generated comprehensive CDS report with differential, warnings, guideline recommendations |
 
75
 
76
  **Total pipeline time:** 75.2 s
77
 
@@ -185,7 +186,7 @@ python test_clinical_cases.py --quiet
185
 
186
  | File | Lines | Purpose |
187
  |------|-------|---------|
188
- | `test_e2e.py` | 57 | Submit chest pain case, poll for completion, validate all 5 steps |
189
  | `test_clinical_cases.py` | ~400 | 22 clinical cases with keyword validation, CLI flags for filtering |
190
  | `test_rag_quality.py` | ~350 | 30 RAG retrieval queries with expected guideline IDs, relevance scoring |
191
  | `test_poll.py` | ~30 | Utility: poll a case ID until completion |
 
60
  ## 2. End-to-End Pipeline Test
61
 
62
  **Test file:** `src/backend/test_e2e.py`
63
+ **What it tests:** Full 6-step agent pipeline from free-text input to synthesized CDS report.
64
  **Test case:** 62-year-old male with crushing substernal chest pain, diaphoresis, nausea, HTN history, on lisinopril + metformin + atorvastatin.
65
 
66
  ### Pipeline Step Results
 
71
  | 2. Clinical Reasoning | PASSED | 21.2 s | Top differential: Acute Coronary Syndrome (ACS). Also considered: GERD, PE, aortic dissection |
72
  | 3. Drug Interaction Check | PASSED | 11.3 s | Queried OpenFDA + RxNorm for lisinopril, metformin, atorvastatin interactions |
73
  | 4. Guideline Retrieval | PASSED | 9.6 s | Retrieved ACC/AHA chest pain / ACS guidelines from RAG corpus |
74
+ | 5. Conflict Detection | PASSED | | Compares guidelines against patient data for omissions, contradictions, dosage, monitoring gaps |
75
+ | 6. Synthesis | PASSED | 25.3 s | Generated comprehensive CDS report with differential, warnings, conflicts, guideline recommendations |
76
 
77
  **Total pipeline time:** 75.2 s
78
 
 
186
 
187
  | File | Lines | Purpose |
188
  |------|-------|---------|
189
+ | `test_e2e.py` | 57 | Submit chest pain case, poll for completion, validate all 6 steps |
190
  | `test_clinical_cases.py` | ~400 | 22 clinical cases with keyword validation, CLI flags for filtering |
191
  | `test_rag_quality.py` | ~350 | 30 RAG retrieval queries with expected guideline IDs, relevance scoring |
192
  | `test_poll.py` | ~30 | Utility: poll a case ID until completion |
docs/writeup_draft.md CHANGED
@@ -65,15 +65,16 @@ Gemma 3 27B IT provides the right balance of capability and accessibility for a
65
 
66
  **How the model is used:**
67
 
68
- The model serves as the reasoning engine in a 5-step agentic pipeline:
69
 
70
  1. **Patient Data Parsing** (LLM) — Extracts structured patient data from free-text clinical narratives
71
  2. **Clinical Reasoning** (LLM) — Generates ranked differential diagnoses with chain-of-thought reasoning
72
  3. **Drug Interaction Check** (External APIs) — Queries OpenFDA and RxNorm for medication safety
73
  4. **Guideline Retrieval** (RAG) — Retrieves relevant clinical guidelines from a 62-guideline corpus using ChromaDB
74
- 5. **Synthesis** (LLM) — Integrates all outputs into a comprehensive CDS report
 
75
 
76
- The model is used in Steps 1, 2, and 5 — parsing, reasoning, and synthesis. This demonstrates the model used "to its fullest potential" across multiple distinct clinical tasks within a single workflow.
77
 
78
  ### Technical details
79
 
@@ -82,12 +83,13 @@ The model is used in Steps 1, 2, and 5 — parsing, reasoning, and synthesis. Th
82
  ```
83
  Frontend (Next.js 14) ←→ Backend (FastAPI + Python 3.10)
84
 
85
- Orchestrator (5-step pipeline)
86
  ├── Step 1: Patient Parser (LLM)
87
  ├── Step 2: Clinical Reasoning (LLM)
88
  ├── Step 3: Drug Check (OpenFDA + RxNorm APIs)
89
  ├── Step 4: Guideline Retrieval (ChromaDB RAG)
90
- ── Step 5: Synthesis (LLM)
 
91
  ```
92
 
93
  All inter-step data is strongly typed with Pydantic v2 models. The pipeline streams each step's progress to the frontend via WebSocket for real-time visibility.
@@ -100,7 +102,7 @@ No fine-tuning was performed in the current version. The base `gemma-3-27b-it` m
100
 
101
  | Test | Result |
102
  |------|--------|
103
- | E2E pipeline (chest pain / ACS) | All 5 steps passed, 75 s total |
104
  | RAG retrieval quality | 30/30 queries passed (100%), avg relevance 0.639 |
105
  | Clinical test suite | 22 scenarios across 14 specialties |
106
  | Top-1 RAG accuracy | 100% — correct guideline ranked #1 for all queries |
@@ -127,10 +129,11 @@ No fine-tuning was performed in the current version. The base `gemma-3-27b-it` m
127
  In a real clinical setting, the system would be used at the point of care:
128
  1. Clinician opens the CDS Agent interface (embedded in the EHR or as a standalone app)
129
  2. Patient data is automatically pulled from the EHR (or pasted manually)
130
- 3. The agent pipeline runs in ~60-90 seconds, during which the clinician can continue other tasks
131
  4. The CDS report appears with:
132
  - Ranked differential diagnoses with reasoning chains (transparent AI)
133
  - Drug interaction warnings with severity levels
 
134
  - Relevant clinical guideline excerpts with citations to authoritative sources
135
  - Suggested next steps (immediate, short-term, long-term)
136
  5. The clinician reviews the recommendations and incorporates them into their clinical judgment
 
65
 
66
  **How the model is used:**
67
 
68
+ The model serves as the reasoning engine in a 6-step agentic pipeline:
69
 
70
  1. **Patient Data Parsing** (LLM) — Extracts structured patient data from free-text clinical narratives
71
  2. **Clinical Reasoning** (LLM) — Generates ranked differential diagnoses with chain-of-thought reasoning
72
  3. **Drug Interaction Check** (External APIs) — Queries OpenFDA and RxNorm for medication safety
73
  4. **Guideline Retrieval** (RAG) — Retrieves relevant clinical guidelines from a 62-guideline corpus using ChromaDB
74
+ 5. **Conflict Detection** (LLM) — Compares guideline recommendations against patient data to identify omissions, contradictions, dosage concerns, monitoring gaps, allergy risks, and interaction gaps
75
+ 6. **Synthesis** (LLM) — Integrates all outputs into a comprehensive CDS report with conflicts prominently featured
76
 
77
+ The model is used in Steps 1, 2, 5, and 6 — parsing, reasoning, conflict detection, and synthesis. This demonstrates the model used "to its fullest potential" across multiple distinct clinical tasks within a single workflow.
78
 
79
  ### Technical details
80
 
 
83
  ```
84
  Frontend (Next.js 14) ←→ Backend (FastAPI + Python 3.10)
85
 
86
+ Orchestrator (6-step pipeline)
87
  ├── Step 1: Patient Parser (LLM)
88
  ├── Step 2: Clinical Reasoning (LLM)
89
  ├── Step 3: Drug Check (OpenFDA + RxNorm APIs)
90
  ├── Step 4: Guideline Retrieval (ChromaDB RAG)
91
+ ── Step 5: Conflict Detection (LLM)
92
+ └── Step 6: Synthesis (LLM)
93
  ```
94
 
95
  All inter-step data is strongly typed with Pydantic v2 models. The pipeline streams each step's progress to the frontend via WebSocket for real-time visibility.
 
102
 
103
  | Test | Result |
104
  |------|--------|
105
+ | E2E pipeline (chest pain / ACS) | All 6 steps passed, ~75–85 s total |
106
  | RAG retrieval quality | 30/30 queries passed (100%), avg relevance 0.639 |
107
  | Clinical test suite | 22 scenarios across 14 specialties |
108
  | Top-1 RAG accuracy | 100% — correct guideline ranked #1 for all queries |
 
129
  In a real clinical setting, the system would be used at the point of care:
130
  1. Clinician opens the CDS Agent interface (embedded in the EHR or as a standalone app)
131
  2. Patient data is automatically pulled from the EHR (or pasted manually)
132
+ 3. The agent pipeline runs in ~6090 seconds, during which the clinician can continue other tasks
133
  4. The CDS report appears with:
134
  - Ranked differential diagnoses with reasoning chains (transparent AI)
135
  - Drug interaction warnings with severity levels
136
+ - **Conflicts & gaps** between guideline recommendations and the patient's actual data — prominently displayed with specific guideline citations, patient data comparisons, and suggested resolutions
137
  - Relevant clinical guideline excerpts with citations to authoritative sources
138
  - Suggested next steps (immediate, short-term, long-term)
139
  5. The clinician reviews the recommendations and incorporates them into their clinical judgment