cgoodmaker Claude Opus 4.6 committed on
Commit 5241b71 · 1 Parent(s): 82f82ac

Add RAG Phase 4 management guidance, rebuild guidelines index (286 chunks), post-analysis hint UI

- Auto-trigger guidelines RAG after analysis: search clinical guidelines for diagnosed condition, MedGemma synthesises management recommendations with inline citations
- Rebuild FAISS index with 7 PDFs (BCC, SCC, melanoma, actinic keratosis, contact dermatitis, lichen sclerosus, warts)
- Add search_guidelines label to ToolCallCard
- Add post-analysis hint text prompting follow-up questions or comparison uploads
- Add Kaggle challenge technical report

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

KAGGLE_REPORT.md ADDED
@@ -0,0 +1,104 @@
+ # SkinProAI: Explainable Multi-Model Dermatology Decision Support
+
+ ## 1. System Architecture & AI Pipeline
+
+ SkinProAI is a clinician-first dermatology decision-support system that orchestrates multiple specialised AI models through a four-phase analysis pipeline. The system is designed to run entirely on-device using Google's MedGemma 4B, a 4-billion-parameter vision-language model purpose-built for edge deployment in clinical settings, executing on consumer hardware (Apple M4 Pro / MPS) in bfloat16 precision without requiring cloud GPU infrastructure.
+
+ ### 1.1 Multi-Model Orchestration via MCP
+
+ Rather than relying on a single monolithic model, SkinProAI employs a Model Context Protocol (MCP) architecture that isolates each AI capability as an independent tool callable via JSON-RPC over subprocess stdio. This design provides fault isolation, independent model loading, and a clear audit trail of which model produced which output.
+
+ | Tool | Model | Purpose |
+ |------|-------|---------|
+ | `monet_analyze` | MONET (contrastive vision-language) | Extract 7 dermoscopic concept scores: ulceration, hair structures, vascular patterns, erythema, pigmented structures, gel artefacts, skin markings |
+ | `classify_lesion` | ConvNeXt Base + metadata MLP | 11-class lesion classification using dual-encoder architecture combining image features with MONET concept scores |
+ | `generate_gradcam` | Grad-CAM on ConvNeXt | Generate visual attention heatmap overlays highlighting regions driving the classification decision |
+ | `search_guidelines` | FAISS + all-MiniLM-L6-v2 | Retrieve relevant clinical guideline passages from a RAG index of 286 chunks across 7 dermatology PDFs |
+ | `compare_images` | MONET + overlay | Temporal change detection comparing dermoscopic feature vectors between sequential images |
+
+ All tools receive absolute file paths as input (never raw image data), enabling secure subprocess isolation between the main FastAPI process (Python 3.9) and the MCP server (Python 3.11).
+
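As an illustration of the tool-call framing described in section 1.1 (a sketch, not code from this repository), the request sent to the MCP subprocess might be built as below. The `tools/call` method name, the `image_path` argument name, and the newline-delimited framing are assumptions; `build_tool_call` is a hypothetical helper.

```python
import json
import os
from itertools import count

_request_ids = count(1)  # monotonically increasing JSON-RPC ids


def build_tool_call(tool_name: str, image_path: str) -> str:
    """Serialise one JSON-RPC 2.0 tool call as a single line for the server's stdin.

    The "tools/call" method and the "image_path" argument name are
    assumptions for illustration; the repository's actual schema may differ.
    """
    if not os.path.isabs(image_path):
        # Mirrors the rule in the report: tools receive absolute paths, never image bytes.
        raise ValueError(f"expected an absolute path, got {image_path!r}")
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": {"image_path": image_path}},
    }
    return json.dumps(request) + "\n"  # one request per line over stdio
```

Passing only a path keeps each message small and lets the server process open the file itself, which fits the subprocess isolation the report describes.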
+ ### 1.2 Four-Phase Analysis Pipeline
+
+ Every image analysis follows a structured, transparent pipeline:
+
+ **Phase 1: Independent Visual Examination.** MedGemma 4B performs a systematic dermoscopic assessment *before* seeing any AI tool output. It evaluates pattern architecture, colour distribution, border characteristics, and structural features, producing differential diagnoses ranked by clinical probability. This deliberate sequencing prevents anchoring bias: the language model forms its own clinical impression independently.
+
+ **Phase 2: AI Classification Tools.** Three MCP tools execute in sequence: MONET extracts quantitative concept scores (0–1 range for each dermoscopic feature), ConvNeXt classifies the lesion into one of 11 diagnostic categories with calibrated probabilities, and Grad-CAM generates an attention overlay showing which image regions drove the classification. Each tool's output is streamed to the clinician in real time with visual bar charts.
+
+ **Phase 3: Reconciliation.** MedGemma receives both its own Phase 1 assessment and the Phase 2 tool outputs, then performs explicit agreement/disagreement analysis. It identifies where its visual findings align with or diverge from the quantitative classifiers, produces an integrated assessment with a stated confidence level, and explains its reasoning. This adversarial cross-check between independent assessments is central to the system's reliability.
+
+ **Phase 4: Management Guidance with RAG.** The system automatically queries a FAISS-indexed knowledge base of clinical dermatology guidelines (BAD, NICE, and specialist PDFs covering BCC, SCC, melanoma, actinic keratosis, contact dermatitis, lichen sclerosus, and cutaneous warts). Using the diagnosed condition as a search query, the RAG system retrieves the top-5 most relevant guideline passages via cosine similarity over sentence-transformer embeddings (all-MiniLM-L6-v2, 384 dimensions). MedGemma then synthesises lesion-specific management recommendations (biopsy, excision, monitoring, or discharge) grounded in the retrieved evidence, with inline superscript citations linking back to source documents and page numbers.
+
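The Phase 4 retrieval step can be sketched in plain Python. This is an illustrative brute-force version under stated assumptions: the real system uses a FAISS index over 384-dimensional all-MiniLM-L6-v2 embeddings, whereas the toy vectors and the `cosine` and `top_k` helpers here are hypothetical stand-ins that only show the ranking logic.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank (text, embedding) chunks by similarity to the query, best first."""
    scored = [(text, cosine(query, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" standing in for real sentence embeddings.
chunks = [
    ("melanoma excision margins", [1.0, 0.0, 0.1]),
    ("wart cryotherapy", [0.0, 1.0, 0.0]),
    ("bcc surgery", [0.6, 0.4, 0.0]),
]
hits = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

With 286 chunks a linear scan like this would already be fast; an index such as FAISS matters more as the guideline corpus grows.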
+ ---
+
+ ## 2. AI Explainability
+
+ Explainability is not an afterthought in SkinProAI; it is embedded at every layer of the architecture. The system provides three complementary forms of explanation: visual, quantitative, and narrative.
+
+ ### 2.1 Visual Explainability: Grad-CAM Attention Maps
+
+ Grad-CAM (Gradient-weighted Class Activation Mapping) hooks into the final convolutional layer of the ConvNeXt classifier to produce a spatial heatmap overlay on the original dermoscopic image. This directly answers the question "where is the model looking?" by showing clinicians which morphological features (border irregularity, colour variegation, structural asymmetry) are driving the classification. For temporal comparisons, the system generates side-by-side Grad-CAM pairs (previous vs. current) so clinicians can visually assess whether attention regions have shifted, expanded, or resolved.
+
+ ### 2.2 Quantitative Explainability: MONET Concept Scores
+
+ MONET provides human-interpretable concept decomposition by scoring seven clinically meaningful dermoscopic features on a continuous 0–1 scale. Unlike black-box classifiers that output only a label and probability, MONET's concept scores reveal *why* a lesion received its classification: a high vascular score combined with low pigmentation explains a vascular lesion diagnosis; high pigmented-structure scores with border irregularity support melanocytic concern. These scores are rendered as visual bar charts in the streaming UI, giving clinicians immediate quantitative insight into the feature profile driving the AI's assessment.
+
+ When comparing sequential images over time, the system computes MONET feature deltas (the signed difference in each concept score between timepoints), enabling objective quantification of lesion evolution. A change from 0.3 to 0.7 in the vascular score, for example, signals new vessel formation that warrants clinical attention, independent of any subjective visual comparison.
+
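The feature-delta computation from section 2.2 can be illustrated as follows. This is a minimal sketch, not the repository's code: `feature_deltas` and `flag_changes` are hypothetical helpers, and the 0.2 flagging threshold is an assumed value chosen only for the example.

```python
from typing import Dict, List


def feature_deltas(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Signed change (current - previous) for each concept scored at both timepoints."""
    return {
        name: round(current[name] - previous[name], 4)
        for name in previous
        if name in current
    }


def flag_changes(deltas: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Concepts whose absolute change meets the (assumed) threshold, sorted by name."""
    return sorted(name for name, delta in deltas.items() if abs(delta) >= threshold)


# The vascular example from the report: 0.3 at the earlier visit, 0.7 now.
deltas = feature_deltas(
    {"vascular": 0.3, "erythema": 0.50},
    {"vascular": 0.7, "erythema": 0.45},
)
flagged = flag_changes(deltas)
```

Keeping the sign of each delta distinguishes a feature that is emerging from one that is resolving, which a magnitude-only comparison would hide.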
+ ### 2.3 Narrative Explainability: MedGemma Reasoning Transparency
+
+ The streaming interface exposes MedGemma's reasoning process through structured markup segments. `[THINKING]` blocks display the model's intermediate reasoning (differential construction, feature weighting, agreement analysis) in real time, with animated spinners that resolve to completion indicators as each reasoning phase finishes. `[RESPONSE]` blocks contain the synthesised clinical narrative. This staged transparency allows clinicians to follow the AI's analytical process rather than receiving only a final pronouncement.
+
+ The reconciliation phase (Phase 3) is particularly significant for explainability: MedGemma explicitly states where its independent visual assessment agrees or disagrees with the quantitative classifiers, and explains the basis for its integrated conclusion. This adversarial structure makes disagreements visible and auditable.
+
+ ### 2.4 Evidence Grounding: RAG Citations
+
+ Management recommendations in Phase 4 include inline superscript references (e.g., "Wide local excision with 2 cm margins is recommended for tumours >2 mm Breslow thickness¹") linked to specific guideline documents and page numbers. This evidence chain, from diagnosis through to management recommendation, is fully traceable to published clinical guidelines, supporting regulatory compliance and clinical governance requirements.
+
+ ---
+
+ ## 3. Clinician-First Design & User Interface
+
+ ### 3.1 Design Philosophy
+
+ SkinProAI is built around the principle that AI should augment clinical decision-making, not replace it. The interface is designed for the workflow of a clinician reviewing dermoscopic images: patient selection, lesion documentation, temporal tracking, and structured analysis with clear next-step guidance.
+
+ The system deliberately avoids presenting AI conclusions as definitive diagnoses. Instead, results are framed as ranked differentials with calibrated confidence scores, supported by quantitative feature evidence and visual attention maps. The clinician retains full agency over the diagnostic and management decision.
+
+ ### 3.2 Streaming Real-Time Interface
+
+ The UI employs Server-Sent Events (SSE) to stream analysis output in real time, maintaining clinician engagement during model inference. Rather than a loading spinner followed by a wall of text, clinicians observe the analysis unfolding phase by phase:
+
+ - **Tool status lines** appear as compact single-line indicators with animated spinners that resolve to green completion dots, showing tool name and summary result (e.g., "ConvNeXt classification: Melanoma (89%)")
+ - **Thinking indicators** display MedGemma's reasoning steps with spinners that transition to done states as each phase completes
+ - **Streamed text** appears word-by-word, allowing clinicians to begin reading findings before analysis completes
+
+ This streaming pattern reduces perceived latency and provides transparency into the multi-model pipeline's progress.
+
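To make the streaming pattern in section 3.2 concrete, here is a small sketch of how SSE frames could be produced on the server side. It follows the standard `event:` / `data:` wire format, but the event names (`text`, `tool`, `done`), the payload keys, and both helper functions are assumptions rather than the project's actual protocol.

```python
import json
from typing import Any, Dict, Iterator


def sse_event(event: str, payload: Dict[str, Any]) -> str:
    """Format one Server-Sent Event frame: event name, one data line, blank line."""
    # json.dumps escapes embedded newlines, so the payload stays on one data line.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_words(text: str) -> Iterator[str]:
    """Yield one frame per word, then a final frame marking completion."""
    for word in text.split():
        yield sse_event("text", {"delta": word + " "})
    yield sse_event("done", {})


tool_frame = sse_event("tool", {"name": "classify", "status": "done"})
frames = list(stream_words("Melanoma suspected"))
```

A generator like `stream_words` could be handed to a streaming HTTP response so that each frame is flushed as soon as it is produced.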
+ ### 3.3 Temporal Lesion Tracking
+
+ The data model supports longitudinal monitoring through a Patient → Lesion → LesionImage hierarchy. Each lesion maintains a timeline of images with timestamps, and the system automatically triggers temporal comparison when a new image is uploaded for a previously analysed lesion. Comparison results include MONET feature deltas, side-by-side Grad-CAM overlays, and a status classification (Stable / Minor Change / Significant Change / Improved) rendered with colour-coded indicators.
+
+ This temporal architecture supports the clinical workflow for monitoring suspicious lesions over time, a common scenario in dermatology where serial dermoscopy is preferred over immediate biopsy for lesions with intermediate risk profiles.
+
+ ### 3.4 Conversational Follow-Up
+
+ After analysis completes, the system transitions to a conversational interface where clinicians can ask follow-up questions grounded in the analysis context. The chat maintains full awareness of the diagnosed condition, MONET features, classification results, and guideline context, enabling queries like "what are the excision margin recommendations for this depth?" or "how does the asymmetry score compare to the previous image?" The text-only chat pathway routes through MedGemma with the full analysis state, ensuring responses remain clinically contextualised.
+
+ ### 3.5 Edge Deployment Architecture
+
+ SkinProAI runs entirely on-device using MedGemma 4B in bfloat16 precision on consumer hardware (demonstrated on Apple M4 Pro with 24 GB RAM using Metal Performance Shaders). No patient data leaves the device: all inference, image storage, and guideline retrieval execute locally. This edge-first architecture addresses data sovereignty requirements in clinical settings where patient images cannot be transmitted to cloud services. The system also supports containerised deployment via Docker for institutional environments, with optional GPU acceleration on CUDA-equipped hardware or HuggingFace Spaces.
+
+ ### 3.6 Interface Components
+
+ The frontend is built in React 18 with a glassmorphism-inspired design language. Key interface elements include:
+
+ - **Patient grid**: Card-based patient selection with creation workflow
+ - **Chat interface**: Unified conversation view combining image analysis, tool outputs, and text chat in a single scrollable thread
+ - **Tool call cards**: Inline status indicators for each AI tool invocation (analyse, classify, Grad-CAM, guidelines search, compare) with expandable result summaries
+ - **Image upload**: Drag-and-drop or click-to-upload with preview, supporting both initial analysis and temporal comparison workflows
+ - **Post-analysis prompt**: Contextual hint guiding clinicians to ask follow-up questions, provide additional context, or upload comparison images
+ - **Markdown rendering**: Clinical narratives rendered with proper formatting, headers, lists, and inline citations for readability
+
+ The interface prioritises information density without clutter: tool outputs collapse to single-line summaries by default, reasoning phases are visually distinguished from conclusions, and temporal comparisons use colour-coded status indicators (green/amber/red/blue) that map to clinical urgency levels.
guidelines/index/chunks.json CHANGED
The diff for this file is too large to render. See raw diff
 
guidelines/index/faiss.index CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:faf9cd914f52b84a55d486a156b4756f28dbc1a92abeafc121077402e1fa53f4
- size 145965
+ oid sha256:c20c3401169901e7d146640c02c0051c62d86668b7cc3d7278055df037add67a
+ size 439341
models/medgemma_agent.py CHANGED
@@ -517,10 +517,10 @@ Be concise and specific."""
          ):
              yield chunk
  
-         # Yield confirmation request
-         if convnext_result:
-             top = convnext_result['predictions'][0]
-             yield f"\n[CONFIRM:diagnosis]Do you agree with the integrated assessment?[/CONFIRM]\n"
+         # ===== PHASE 4: Auto Management Guidance (with RAG) =====
+         if convnext_result and self.mcp_client:
+             for chunk in self.generate_management_guidance():
+                 yield chunk
  
      def generate_management_guidance(
          self,
web/src/components/ToolCallCard.tsx CHANGED
@@ -13,6 +13,7 @@ const TOOL_LABELS: Record<string, string> = {
    classify: 'Running classifier',
    gradcam: 'Generating attention map',
    guidelines: 'Searching clinical guidelines',
+   search_guidelines: 'Searching clinical guidelines',
  };
  
  const STATUS_CONFIG: Record<string, { label: string; dot: string }> = {
web/src/pages/ChatPage.css CHANGED
@@ -219,6 +219,15 @@
    30% { transform: translateY(-6px); }
  }
  
+ /* ─── Post-analysis hint ─────────────────────────────────────────────────── */
+ .post-analysis-hint {
+   font-size: 0.8125rem;
+   color: #999999;
+   font-style: italic;
+   padding: 6px 0;
+   margin: 0;
+ }
+
  /* ─── Input bar ──────────────────────────────────────────────────────────── */
  .chat-input-bar {
    background: rgba(255, 255, 255, 0.8);
web/src/pages/ChatPage.tsx CHANGED
@@ -220,6 +220,19 @@ export function ChatPage() {
            </div>
          ))}
  
+         {/* Hint after analysis completes */}
+         {!isStreaming && messages.length > 0 && (() => {
+           const last = messages[messages.length - 1];
+           const hasAnalysis = last.role === 'assistant' &&
+             (last.tool_calls ?? []).some(tc => tc.tool === 'analyze_image');
+           if (!hasAnalysis) return null;
+           return (
+             <p className="post-analysis-hint">
+               Ask a follow-up question, add context, or upload an image to compare to previous.
+             </p>
+           );
+         })()}
+
          <div ref={messagesEndRef} />
        </main>