yashvshetty committed on
Commit
9636a02
·
1 Parent(s): c39d1ca

Clarke: NHS clinical documentation system

This view is limited to 50 files because it contains too many changes. See the raw diff for the full changeset.

Files changed (50)
  1. .env.template +35 -0
  2. .gitattributes +1 -0
  3. .gitignore +10 -0
  4. Additional Competition Context.md +339 -0
  5. Clarke_Product_Specification_V2.md +610 -0
  6. Dockerfile +19 -0
  7. LICENSE +201 -0
  8. MedGemma High-Level Context.md +174 -0
  9. README.md +257 -2
  10. app.py +33 -0
  11. backend/.DS_Store +0 -0
  12. backend/__init__.py +1 -0
  13. backend/api.py +490 -0
  14. backend/audio.py +85 -0
  15. backend/config.py +66 -0
  16. backend/errors.py +94 -0
  17. backend/fhir/__init__.py +1 -0
  18. backend/fhir/client.py +194 -0
  19. backend/fhir/mock_api.py +412 -0
  20. backend/fhir/queries.py +62 -0
  21. backend/fhir/tools.py +120 -0
  22. backend/models/__init__.py +1 -0
  23. backend/models/doc_generator.py +342 -0
  24. backend/models/ehr_agent.py +459 -0
  25. backend/models/medasr.py +180 -0
  26. backend/models/model_manager.py +91 -0
  27. backend/orchestrator.py +496 -0
  28. backend/prompts/context_synthesis.j2 +38 -0
  29. backend/prompts/document_generation.j2 +60 -0
  30. backend/prompts/ehr_agent_system.txt +25 -0
  31. backend/schemas.py +149 -0
  32. backend/utils.py +78 -0
  33. clarke/.env.template +1 -0
  34. clarke/Dockerfile +1 -0
  35. clarke/LICENSE +1 -0
  36. clarke/README.md +1 -0
  37. clarke/app.py +1 -0
  38. clarke/backend/__init__.py +1 -0
  39. clarke/backend/api.py +1 -0
  40. clarke/backend/audio.py +1 -0
  41. clarke/backend/config.py +1 -0
  42. clarke/backend/errors.py +1 -0
  43. clarke/backend/fhir/__init__.py +1 -0
  44. clarke/backend/fhir/client.py +1 -0
  45. clarke/backend/fhir/mock_api.py +1 -0
  46. clarke/backend/fhir/queries.py +1 -0
  47. clarke/backend/fhir/tools.py +1 -0
  48. clarke/backend/models/__init__.py +1 -0
  49. clarke/backend/models/doc_generator.py +1 -0
  50. clarke/backend/models/ehr_agent.py +1 -0
.env.template ADDED
@@ -0,0 +1,35 @@
+ # .env.template — copy to .env and fill values
+
+ # === Model Configuration ===
+ MEDASR_MODEL_ID=google/medasr
+ MEDGEMMA_4B_MODEL_ID=google/medgemma-1.5-4b-it
+ MEDGEMMA_27B_MODEL_ID=google/medgemma-27b-text-it
+ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
+ QUANTIZE_4BIT=true
+ USE_FLASH_ATTENTION=true
+
+ # === FHIR Configuration ===
+ FHIR_SERVER_URL=http://localhost:8080/fhir
+ USE_MOCK_FHIR=true
+ FHIR_TIMEOUT_S=10
+
+ # === Application Configuration ===
+ APP_HOST=0.0.0.0
+ APP_PORT=7860
+ LOG_LEVEL=INFO
+ MAX_AUDIO_DURATION_S=1800
+ PIPELINE_TIMEOUT_S=120
+ DOC_GEN_MAX_TOKENS=2048
+ DOC_GEN_TEMPERATURE=0.3
+
+ # === Fine-tuning (optional, Phase 4) ===
+ WANDB_API_KEY=
+ WANDB_PROJECT=clarke-finetuning
+ LORA_RANK=16
+ LORA_ALPHA=32
+ LORA_DROPOUT=0.05
+ TRAINING_EPOCHS=3
+ LEARNING_RATE=2e-4
+ BATCH_SIZE=2
+ GRAD_ACCUM_STEPS=8
+ MAX_SEQ_LENGTH=4096
.gitattributes CHANGED
@@ -1 +1,2 @@
  *.wav filter=lfs diff=lfs merge=lfs -text
+ *.aiff filter=lfs diff=lfs merge=lfs -text
.gitignore CHANGED
@@ -205,3 +205,13 @@ cython_debug/
  marimo/_static/
  marimo/_lsp/
  __marimo__/
+
+ # Clarke runtime artifacts
+ logs/
+ models/
+ *.pt
+ *.bin
+ *.safetensors
+ *.ckpt
+ *.wav
+ !data/demo/*.wav
Additional Competition Context.md ADDED
@@ -0,0 +1,339 @@
+ # **Additional Competition Context — MedGemma Impact Challenge**
+
+ *Compiled 2026-02-12 from a systematic review of Kaggle competition pages, discussions, rules, and HAI-DEF developer documentation.*
+
+ ### **Key NEW findings not in the Clarke spec:**
+
+ 1. **Submission is a Kaggle Writeup (not a file upload)** — created via the Writeups tab, with a Submit button. One submission per team, which can be re-submitted.
+ 2. **3 pages = ~1,500 words** (host clarification). With images, aim for 1,000–1,200 words.
+ 3. **12 named judges** — all Google Research/Health AI staff. Profiles documented.
+ 4. **Judged as a DEMO, not production** — Yun Liu explicitly said the regulatory pathway is NOT the focus. The focus is technical demonstration.
+ 5. **Non-commercial training data is OK** — Daniel Golden confirmed. Releasing model weights is a BONUS, not required.
+ 6. **MedGemma 4B has known instruction-following bugs** — it leaks system prompts and generates meta-commentary. A risk for Clarke's EHR pipeline.
+ 7. **MedGemma 27B deployment is very hard** — it needs ~54GB VRAM, and Vertex AI A100 quota requests are being rejected. Unsloth GGUF quantizations are available via Ollama (Q8_0 = 31.8GB).
+ 8. **MedGemma 1.5 4B adds EHR understanding** — directly relevant to Clarke. Use 1.5, not 1.0.
+ 9. **MedASR runs in-browser via ONNX/WebGPU** — someone has built it; this could strengthen an Edge AI track claim.
+ 10. **Only 129 submissions from 5,855 entrants** — the competitive field may be smaller than expected.
+ 11. **Google explicitly suggests agentic orchestration** in its MedGemma docs — validating Clarke's architecture.
+
+ ---
+
+ ## **1. Submission Requirements**
+
+ ### **Submission Format: Kaggle Writeup (NOT a file upload)**
+
+ * Your submission is a **Kaggle Writeup** attached to the competition's Writeups page — not a PDF or file upload. Create it via the "New Writeup" button at: [https://www.kaggle.com/competitions/med-gemma-impact-challenge/writeups](https://www.kaggle.com/competitions/med-gemma-impact-challenge/writeups)
+ * After saving your Writeup, click the **"Submit"** button in the top right corner.
+ * Each team gets **one (1) Writeup submission only**, but it can be un-submitted, edited, and re-submitted an unlimited number of times before the deadline.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Submission Instructions section)
+
+ ### **3-Page Limit Clarification**
+
+ * The "3 pages" translates to **~1,500 words** single-spaced. If using charts/images/code blocks, aim for **1,000–1,200 words** of text.
+ * **Source:** Fereshteh Mahvar (Competition Host) at [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/671156](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/671156)
+
+ ### **Required Links in Writeup**
+
+ * **Required:** Video (3 minutes or less)
+ * **Required:** Public code repository
+ * **Bonus:** Public interactive live demo app
+ * **Bonus:** Open-weight Hugging Face model tracing to a HAI-DEF model
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Submission Instructions)
+
+ ### **Track Selection**
+
+ * All submissions automatically compete in the **Main Track**.
+ * You may select **one** special award prize (Agentic Workflow, Novel Task, or Edge AI).
+ * If you select multiple special awards, only one will be considered (randomly selected).
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Choosing a Track)
+
+ ### **Writeup Template (exact structure required)**
+
+     ### Project name
+     ### Your team [Name members, speciality, role]
+     ### Problem statement [Problem domain + Impact potential criteria]
+     ### Overall solution [Effective use of HAI-DEF models criterion]
+     ### Technical details [Product feasibility criterion]
+
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Proposed Writeup template)
+
+ ### **Private Resources Warning**
+
+ * If you attach a **private** Kaggle Resource to your public Writeup, it will be **automatically made public** after the deadline.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview)
+
+ ### **Submissions Must Be in English**
+
+ * Confirmed by María Cruz (Kaggle Staff).
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/667660](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/667660)
+
+ ---
+
+ ## **2. Judging Details**
+
+ ### **Full Judge Panel (12 judges)**
+
+ | Judge | Role |
+ | ----- | ----- |
+ | Fereshteh Mahvar | Staff Medical Software Engineer & Solutions Architect, Google Health AI |
+ | Omar Sanseviero | Developer Experience Lead, Google DeepMind |
+ | Glenn Cameron | Sr. PMM, Google |
+ | Can "John" Kirmizi | Software Engineer, Google Research |
+ | Andrew Sellergren | Software Engineer, Google Research |
+ | Dave Steiner | Clinical Research Scientist, Google |
+ | Sunny Virmani | Group Product Manager, Google Research |
+ | Liron Yatziv | Research Engineer, Google Research |
+ | Daniel Golden | Engineering Manager, Google Research |
+ | Yun Liu | Research Scientist, Google Research |
+ | Rebecca Hemenway | Health AI Strategic Partnerships, Google Research |
+ | Fayaz Jamil | Technical Program Manager, Google Research |
+
+ **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Judges section)
+
+ ### **Evaluation is Through a Demonstration-Application Lens**
+
+ * Judges evaluate through the lens of a **demonstration application, NOT a finished product**.
+ * Regulatory pathway, HIPAA/GDPR compliance, etc. are **not the focus** of the evaluation criteria — though you may include them.
+ * Quote from Yun Liu (Competition Host): *"The focus of the evaluation criteria is the technical aspects of the demonstration application. Each evaluation criteria will be judged through the lens of a demonstration application and not a finished product."*
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/668280](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/668280)
+
+ ### **Execution & Communication is the Highest-Weighted Category (30%)**
+
+ * Judges look for a **"cohesive and compelling narrative across all submitted materials"** that articulates how you meet the rest of the criteria.
+ * Assessed: clarity/polish/effectiveness of the video demo, completeness/readability of the writeup, and quality of the source code (organization, comments, reusability).
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Evaluation Criteria table)
+
+ ---
+
+ ## **3. Rules & Restrictions**
+
+ ### **Team Size**
+
+ * Maximum **5 members** per team.
+ * Team mergers allowed before the Team Merger Deadline.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Section 2.1)
+
+ ### **One Submission Per Team**
+
+ * For Hackathons, each team is allowed **one (1) Submission**.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Section 2.2)
+
+ ### **Winner License: CC BY 4.0**
+
+ * Winning submissions must be licensed under **CC BY 4.0** (code and demos).
+ * For generally commercially available software you used but don't own, you don't need to grant that license.
+ * For input data or pretrained models with incompatible licenses used to generate your winning solution, you **don't need to grant** an open source license for that data/model.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Section 2.5)
+
+ ### **Non-Commercial Data is Allowed for Training**
+
+ * Using public, research-only / non-commercial external datasets during development is **permitted**.
+ * Participation in a Kaggle challenge is **not considered commercial use**.
+ * Releasing final model weights is a **bonus, not a requirement**.
+ * Daniel Golden (Competition Host) quote: *"You are permitted to use data and other code sources during development that are governed under other, potentially more restrictive licenses."*
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/671596](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/671596)
+
+ ### **External Data & Tools**
+
+ * External data allowed if **publicly available and equally accessible** to all participants at no cost, or if it meets the "Reasonableness Standard".
+ * Use of HAI-DEF and MedGemma is subject to the **HAI-DEF Terms of Use**: [https://developers.google.com/health-ai-developer-foundations/terms](https://developers.google.com/health-ai-developer-foundations/terms)
+ * Automated ML tools (AutoML, H2O, etc.) are permitted with appropriate licensing.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Section 2.6)
+
+ ### **HAI-DEF Terms Key Restrictions**
+
+ * **"Clinical Use"** (diagnosis/treatment of patients) requires Health Regulatory Authorization — this doesn't apply to a demo/competition context.
+ * HAI-DEF source code is licensed under **Apache 2.0**.
+ * Models are free for research and commercial use.
+ * **Source:** [https://developers.google.com/health-ai-developer-foundations/terms](https://developers.google.com/health-ai-developer-foundations/terms)
+
+ ### **Mandatory HAI-DEF Model Usage**
+
+ * Use of **at least one HAI-DEF model** (such as MedGemma) is **mandatory**.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Effective use of HAI-DEF models criterion)
+
+ ### **Eligibility**
+
+ * Must be 18+, registered on Kaggle, and not a resident of a sanctioned country.
+ * Competition Entities (Google, Kaggle employees) can participate but **cannot win prizes**.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Sections 2.7, 3.1)
+
+ ### **Winner's Obligations**
+
+ * Deliver the final model's software code + documentation.
+ * Code must be capable of generating the winning submission.
+ * Must describe the resources required to build/run it.
+ * For hackathons, deliverables are as described on the competition website (may not be software code).
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Section 2.8)
+
+ ### **No Cloud Credits Provided**
+
+ * Multiple competitors asked about GCP credits / Colab compute — no response from organizers confirming any credits.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/667660](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/667660)
+
+ ---
+
+ ## **4. Winning Patterns & Insights from Discussions**
+
+ ### **Focus on Demonstration, Not Production**
+
+ * The organizers repeatedly emphasize that this is about **demonstration applications** with impact potential, not production-ready systems. Keep the writeup high-level; use the video to convey concepts.
+ * *"Less is more! You should take advantage of the video to convey most of the concepts and keep the write-up as high level as possible."*
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Submission Instructions)
+
+ ### **Storytelling is Explicitly Scored**
+
+ * The Problem Domain criterion explicitly mentions **"storytelling"** alongside clarity of problem definition.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Evaluation Criteria)
+
+ ### **Use HAI-DEF Models "to Their Fullest Potential"**
+
+ * The criterion asks whether your application uses HAI-DEF models *"to their fullest potential, where other solutions would likely be less effective"*.
+ * Clarke's multi-model approach (MedASR + MedGemma 4B + MedGemma 27B) is well aligned with this.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Evaluation Criteria)
+
+ ### **Agentic Orchestration is Officially Suggested**
+
+ * Google's own MedGemma documentation explicitly suggests agentic orchestration: using MedGemma as a tool within an agentic system, coupled with FHIR generators/interpreters, Gemini Live for bidirectional audio, or Gemini 2.5 Pro for function calling.
+ * MedGemma can **"parse private health data locally before sending anonymized requests to centralized models"** — directly supporting Clarke's privacy-preserving architecture.
+ * **Source:** [https://developers.google.com/health-ai-developer-foundations/medgemma](https://developers.google.com/health-ai-developer-foundations/medgemma)
+
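As a concrete illustration of that agentic pattern — a model-emitted JSON tool call dispatched to local FHIR query functions — here is a sketch. The tool names, call schema, and stubbed return values are hypothetical, not the actual API of Clarke's backend/fhir/tools.py:

```python
import json

# Hypothetical local FHIR query tools. In a real system these would query a
# FHIR server (or mock) inside the hospital network; here they return stubs.
def get_medications(patient_id: str) -> list[str]:
    return ["Ramipril 5mg OD", "Atorvastatin 20mg ON"]  # stubbed data

def get_recent_labs(patient_id: str) -> list[str]:
    return ["HbA1c 48 mmol/mol (2026-01-20)"]  # stubbed data

TOOLS = {"get_medications": get_medications, "get_recent_labs": get_recent_labs}

def dispatch(model_output: str, patient_id: str) -> list[str]:
    """Parse a JSON tool call emitted by the EHR agent and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    return fn(patient_id)

# Example: the agent asks for the medication list mid-consultation.
result = dispatch('{"tool": "get_medications"}', patient_id="nhs-123")
```

Keeping the tool registry as a plain dict makes it easy to validate the model's requested tool name before executing anything — a useful guard given the instruction-following issues noted in Section 6.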
+ ### **Competition Scale: Low Submissions So Far**
+
+ * 5,855 entrants but only **129 submissions** (134 participants, 129 teams) as of Feb 12, 2026.
+ * This suggests many entrants haven't submitted yet — the field may be smaller than expected.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview](https://www.kaggle.com/competitions/med-gemma-impact-challenge/overview) (Participation stats)
+
+ ### **MedGemma 1.5 is the Latest Version**
+
+ * MedGemma 1.5 4B (google/medgemma-1.5-4b-it) was released in Jan 2026 with improved capabilities:
+   * High-dimensional medical imaging (CT, MRI, histopathology)
+   * Longitudinal medical imaging (chest X-ray time series)
+   * Medical document understanding (structured extraction from lab reports)
+   * EHR understanding (interpretation of text-based EHR data)
+   * Improved medical text reasoning accuracy
+ * **Source:** [https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/](https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/)
+
+ ---
+
+ ## **5. Technical Resources**
+
+ ### **Official Notebooks & Code**
+
+ | Resource | URL |
+ | ----- | ----- |
+ | Quick start (Hugging Face) | [https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb](https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb) |
+ | Quick start (Model Garden) | [https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_model_garden.ipynb](https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_model_garden.ipynb) |
+ | Fine-tuning with LoRA | [https://github.com/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb](https://github.com/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb) |
+ | Reinforcement Learning | [https://github.com/Google-Health/medgemma/blob/main/notebooks/reinforcement_learning_with_hugging_face.ipynb](https://github.com/Google-Health/medgemma/blob/main/notebooks/reinforcement_learning_with_hugging_face.ipynb) |
+ | MedGemma GitHub repo | [https://github.com/google-health/medgemma](https://github.com/google-health/medgemma) |
+ | HAI-DEF developer forum | [https://discuss.ai.google.dev/c/hai-def/](https://discuss.ai.google.dev/c/hai-def/) |
+
+ ### **MedASR Resources**
+
+ | Resource | URL |
+ | ----- | ----- |
+ | MedASR model (HuggingFace) | [https://huggingface.co/google/medasr](https://huggingface.co/google/medasr) |
+ | MedASR developer docs | [https://developers.google.com/health-ai-developer-foundations/medasr/](https://developers.google.com/health-ai-developer-foundations/medasr/) |
+ | MedASR in-browser (ONNX/WebGPU) | [https://medasr.ainergiz.com/](https://medasr.ainergiz.com/) |
+ | MedASR in-browser source | [https://github.com/ainergiz/medasr-web](https://github.com/ainergiz/medasr-web) |
+ | MedASR MLX (Apple Silicon) | Discussion: [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/672879](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/672879) |
+ | MedASR ONNX export script | In the ainergiz/medasr-web repo at /scripts/export_onnx.py |
+
+ ### **MedGemma 27B GGUF Quantizations (Unsloth)**
+
+ * Available at: [https://huggingface.co/unsloth/medgemma-27b-it-GGUF/tree/main](https://huggingface.co/unsloth/medgemma-27b-it-GGUF/tree/main)
+ * Can be run locally via Ollama: `ollama run hf.co/unsloth/medgemma-27b-it-GGUF:Q8_0` (31.8 GB)
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/673091](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/673091)
+
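Once the quantized model is served locally, it can be queried over Ollama's standard REST endpoint. A minimal stdlib-only sketch (illustrative, not part of the current backend; it assumes Ollama's default endpoint at localhost:11434 and the model tag above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/unsloth/medgemma-27b-it-GGUF:Q8_0"


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the locally served quantized 27B."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# To actually call it (requires a running Ollama daemon with the model pulled):
# with urllib.request.urlopen(build_request("Summarise the consultation: ...")) as resp:
#     print(json.loads(resp.read())["response"])
```

Separating request construction from the network call keeps the fallback path testable without a GPU or a running daemon.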
+ ### **HuggingFace Collections**
+
+ | Collection | URL |
+ | ----- | ----- |
+ | MedGemma release | [https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4](https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4) |
+ | HAI-DEF collection | [https://huggingface.co/collections/google/health-ai-developer-foundations-hai-def](https://huggingface.co/collections/google/health-ai-developer-foundations-hai-def) |
+ | Community fine-tunes | [https://huggingface.co/models?other=or:base_model:finetune:google/medgemma-4b-it,base_model:finetune:google/medgemma-4b-pt,base_model:finetune:google/medgemma-27b-it,base_model:finetune:google/medgemma-27b-text-it](https://huggingface.co/models?other=or:base_model:finetune:google/medgemma-4b-it,base_model:finetune:google/medgemma-4b-pt,base_model:finetune:google/medgemma-27b-it,base_model:finetune:google/medgemma-27b-text-it) |
+
+ ### **Key Blog Posts**
+
+ | Title | URL |
+ | ----- | ----- |
+ | MedGemma 1.5 + MedASR announcement | [https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/](https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/) |
+ | Original MedGemma blog | [https://research.google/blog/medgemma-our-most-capable-open-models-for-health-ai-development/](https://research.google/blog/medgemma-our-most-capable-open-models-for-health-ai-development/) |
+ | HAI-DEF launch blog | [https://research.google/blog/helping-everyone-build-ai-for-healthcare-applications-with-open-foundation-models/](https://research.google/blog/helping-everyone-build-ai-for-healthcare-applications-with-open-foundation-models/) |
+ | AskCPG concept integration | [https://discuss.ai.google.dev/t/sharing-our-product-integration-with-medgemma-askcpg/94556](https://discuss.ai.google.dev/t/sharing-our-product-integration-with-medgemma-askcpg/94556) |
+
+ ### **Notable Competition Notebooks (for inspiration)**
+
+ | Notebook | Theme |
+ | ----- | ----- |
+ | MedFlow AI (24 upvotes) | Top-voted notebook |
+ | MedGemma Navigator: DICOMweb ↔ FHIR (16 upvotes) | FHIR/DICOM integration — similar to Clarke's EHR focus |
+ | MedGemma Medical AI Chatbot + 5 Test Scenarios (14 upvotes) | Medical chatbot with test scenarios |
+ | MedAssist Edge Offline Medical AI (7 upvotes) | Offline/edge deployment |
+ | Spasht AI — Bridging India's "Last Mile" Health Gap (6 upvotes) | Community health focus |
+ | RadAssist-MedGemma: AI Radiology Triage Assistant (4 upvotes) | Radiology triage |
+ | CRSA — Clinical Reasoning Stability Auditor | Reasoning evaluation |
+
+ **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/code](https://www.kaggle.com/competitions/med-gemma-impact-challenge/code)
+
+ ---
+
+ ## **6. Gaps & Risks**
+
+ ### **MedGemma 4B Instruction-Following Issues**
+
+ * Multiple competitors report MedGemma 4B (medgemma-1.5-4b-it) **leaking system prompts, generating meta-commentary, and outputting chain-of-thought training artifacts** (critique responses, constraint checklists, special tokens).
+ * One user reports that running it locally with Ollama works well; the issues may be prompt/framework dependent.
+ * **Risk for Clarke:** If using MedGemma 4B for EHR extraction, we need thorough prompt engineering and output parsing to handle instruction-following failures.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/673091](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/673091)
+
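One mitigation for the output-parsing risk is to sanitize raw model text before any structured parsing. A sketch (illustrative only — the artifact patterns below are invented examples, not the exact strings reported in the discussion):

```python
import re

# Example artifact patterns. Real failures reported in the thread include
# leaked system prompts, critique responses, and special tokens; this list is
# illustrative, not exhaustive, and would need tuning against real outputs.
SPECIAL_TOKEN_RE = re.compile(r"<\|?[a-z_]+\|?>|</?s>", re.IGNORECASE)
META_PREFIX_RE = re.compile(r"^\s*(Okay|Sure|Here is|As an AI)[^\n]*\n+", re.IGNORECASE)


def sanitize(raw: str) -> str:
    """Strip special tokens and a leading meta-commentary line from model output."""
    text = SPECIAL_TOKEN_RE.sub("", raw)
    text = META_PREFIX_RE.sub("", text)
    return text.strip()
```

Running a pass like this before JSON parsing or template filling turns many instruction-following failures into recoverable noise rather than pipeline errors.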
+ ### **MedGemma 27B Deployment is Extremely Challenging**
+
+ * Requires ~54GB VRAM — it won't run on an Apple M3 Max with 64GB (MPS buffer limit) and won't fit on 2x L4 GPUs (48GB).
+ * Vertex AI A100 quota requests are being **rejected by Google**.
+ * Quantized GGUF versions (Unsloth Q8_0 = 31.8GB) can run via Ollama on local hardware.
+ * **Risk for Clarke:** The 24-hour build plan assumes MedGemma 27B for letter generation. If deploying on Kaggle/cloud is blocked by GPU quota, we need a fallback (quantized 27B via Ollama, or an enhanced 4B with heavy prompt engineering).
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/673091](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/673091)
+
+ ### **MedGemma 1.5 is Only Available as 4B**
+
+ * MedGemma 1.5 is only released as 4B multimodal. The 27B model is still MedGemma 1 (text-only and multimodal).
+ * **Risk for Clarke:** The Clarke spec references "MedGemma 27B" — we should clarify that we're using MedGemma 1 27B, not 1.5.
+ * **Source:** [https://developers.google.com/health-ai-developer-foundations/medgemma](https://developers.google.com/health-ai-developer-foundations/medgemma)
+
+ ### **No Provided Dataset = You Must Source Your Own**
+
+ * The competition provides **zero data**. All data must be sourced externally.
+ * For Clarke: synthetic NHS consultation data or publicly available clinical note datasets are needed. Non-commercial academic datasets are acceptable per organizer clarification.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules](https://www.kaggle.com/competitions/med-gemma-impact-challenge/rules) (Section 2.4)
+
+ ### **Model Weights Release is Bonus, Not Required**
+
+ * Releasing fine-tuned model weights on HuggingFace is listed as a **bonus** submission element, not mandatory.
+ * However, it could significantly strengthen the "Effective use of HAI-DEF models" score.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/671596](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/671596) (Daniel Golden's response)
+
+ ### **EHR/FHIR Understanding is a New MedGemma 1.5 Capability**
+
+ * MedGemma 1.5 4B specifically adds **"EHR understanding for the interpretation of text-based EHR data"** — this directly supports Clarke's EHR navigation component.
+ * Consider using MedGemma 1.5 4B (instead of 1.0 4B) for the EHR extraction pipeline.
+ * **Source:** [https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/](https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/)
+
+ ### **Official Discord Exists But Is Not Monitored by Staff**
+
+ * The Kaggle Discord at [http://discord.gg/kaggle](http://discord.gg/kaggle) is an additional discussion channel, but the organizers don't monitor it.
+ * **Source:** [https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/667660](https://www.kaggle.com/competitions/med-gemma-impact-challenge/discussion/667660)
+
+ ---
+
+ ## **Pages That Could Not Be Accessed**
+
+ | URL | Issue |
+ | ----- | ----- |
+ | Kaggle discussion threads via web_fetch | JS-rendered; required browser automation |
+ | Kaggle Code notebooks (content) | Would require login/browser to read notebook code cells |
+ | AskCPG concept app details | [https://discuss.ai.google.dev/t/sharing-our-product-integration-with-medgemma-askcpg/94556](https://discuss.ai.google.dev/t/sharing-our-product-integration-with-medgemma-askcpg/94556) — not fetched |
+ | MedGemma model card | [https://developers.google.com/health-ai-developer-foundations/medgemma/model-card](https://developers.google.com/health-ai-developer-foundations/medgemma/model-card) — not fetched |
+ | MedASR developer docs | [https://developers.google.com/health-ai-developer-foundations/medasr/](https://developers.google.com/health-ai-developer-foundations/medasr/) — not fetched |
+ | HAI-DEF Terms of Use (full) | [https://developers.google.com/health-ai-developer-foundations/terms](https://developers.google.com/health-ai-developer-foundations/terms) — truncated at Section 3 |
Clarke_Product_Specification_V2.md ADDED
@@ -0,0 +1,610 @@
1
+ # Clarke — Product Specification Document
2
+
3
+ **Version:** 2.0
4
+ **Date:** 12 February 2026
5
+ **Purpose:** Comprehensive specification for PRD generation and AI-agent execution within 24 working hours
6
+ **Competition:** MedGemma Impact Challenge (Kaggle)
7
+ **Targets:** Main Track 1st Place ($30,000) + Agentic Workflow Prize ($5,000)
8
+
9
+ ---
10
+
11
+ ## SECTION 1: PRODUCT VISION
12
+
13
+ ### 1.1 Clarke in One Sentence
14
+
15
+ Clarke is an open-source, privacy-preserving AI system that listens to a clinician's patient consultation, automatically retrieves the relevant patient record context, and generates a complete, structured clinical document — ready to review and sign off — using a pipeline of three purpose-built HAI-DEF medical models (MedASR → MedGemma 4B → MedGemma 27B) running entirely within the hospital network.
16
+
17
+ ### 1.2 The 30-Second Pitch
18
+
19
+ You talk to your patient. Clarke listens. In the background, it simultaneously pulls the patient's history, medications, recent bloods, and imaging results from the electronic health record. By the time you say goodbye, a draft clinic letter — structured to NHS standards, populated with the right investigation results, and chronologically ordered — is waiting on your screen. You review it, make any edits, and sign off. No dictaphone. No secretary. No 15-minute write-up from memory. No logging into five separate systems. Your letter is done before your next patient walks in. And no patient data ever leaves the building.
20
+
21
+ ### 1.3 The Core Synergy: Why Combining Ambient Documentation with EHR Navigation Creates a Product Greater Than the Sum of Its Parts
22
+
23
+ The two capabilities are not merely additive — they solve each other's critical weakness.
24
+
25
+ **Ambient documentation alone** (CliniScribe) captures what was *said* in the consultation but is blind to what already *exists* in the patient record. A clinician discusses a rising HbA1c, but the scribe has no access to the actual lab values, the trend over the past 6 months, or the current medication list. The clinician must still manually look up and insert this information, or the generated letter is incomplete. This is exactly why commercial ambient scribes like Heidi Health and DAX Copilot still require significant post-generation editing — they produce notes from conversation alone, without record context.
26
+
27
+ **EHR navigation alone** (WardBrief) retrieves and summarises existing patient data but cannot capture what happens *during* the consultation — the new symptoms reported, the examination findings, the shared decision-making, the agreed plan. It produces a pre-consultation briefing, not a post-consultation document.
28
+
29
+ **Clarke unifies both pipelines through a single integration point:** the conversation transcript (produced by MedASR) is passed to MedGemma 27B *alongside* the structured patient context (retrieved by MedGemma 4B acting as a FHIR agent). This means the document-generation model has access to both what the clinician discussed *and* what the record contains. The result is a clinical letter that:
30
+
31
+ - References actual lab values discussed ("Your HbA1c has risen from 48 to 55 mmol/mol since June 2025") rather than vague mentions ("blood sugar levels were discussed")
32
+ - Includes the current medication list without the clinician having to dictate it
33
+ - Automatically attaches relevant investigation results referenced during the conversation
34
+ - Cross-checks for consistency — if the clinician mentions "no allergies" but the record shows a penicillin allergy, Clarke flags the discrepancy
35
+
36
+ This integration transforms Clarke from a transcription tool into a **clinical reasoning assistant** that produces documents no standalone ambient scribe or EHR navigator could match.
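The allergy cross-check described above can be sketched in a few lines. This is a minimal illustration, not Clarke's implementation: the function name and phrase list are assumptions, and a production version would need proper clinical NLP rather than substring matching.

```python
# Sketch of the transcript-vs-record discrepancy flag (illustrative only;
# the phrase list and matching strategy are assumptions, not Clarke's code).
NO_ALLERGY_PHRASES = ("no allergies", "no known allergies", "nkda")


def check_allergy_consistency(transcript, recorded_allergies):
    """Flag the case where the conversation asserts 'no allergies' while the
    FHIR AllergyIntolerance list for the patient is non-empty."""
    text = transcript.lower()
    if recorded_allergies and any(p in text for p in NO_ALLERGY_PHRASES):
        return [f"Transcript states no allergies, but the record lists: "
                f"{', '.join(recorded_allergies)}"]
    return []
```

Because the check runs over both inputs at document-generation time, it costs nothing extra at the point of care: the flag simply appears alongside the draft letter for the clinician to resolve.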
37
+
38
+ **For the Agentic Workflow Prize specifically:** this pipeline — audio capture → medical speech recognition → agentic EHR data retrieval → context-enriched document generation → clinician review — constitutes a five-stage agentic workflow where three HAI-DEF models operate as intelligent agents, each making autonomous decisions about what to process and retrieve. This is precisely the "significant overhaul of a challenging process" the prize description calls for.
39
+
40
+ ---
41
+
42
+ ## SECTION 2: THE PROBLEM
43
+
44
+ ### 2.1 Problem Definition
45
+
46
+ NHS doctors spend four hours on administrative tasks for every one hour with patients, with clinical documentation — writing letters, discharge summaries, referral notes, and ward round entries — consuming the largest share of that administrative time. This documentation requires clinicians to assimilate information scattered across multiple disconnected IT systems, compose structured clinical narratives, and ensure accuracy, all under severe time pressure. The result is a workforce crisis: doctors burning out, leaving the profession, and — critically — spending less time with the 7.3 million patients currently waiting for NHS treatment.
47
+
48
+ ### 2.2 Quantified Magnitude
49
+
50
+ **(a) Affected clinicians.** The NHS employs 151,980 FTE hospital and community health service doctors and 38,220 FTE GPs in England alone — approximately 190,200 FTE doctors total [NHS Digital Workforce Statistics, August 2025; General Practice Workforce, December 2025]. Adding Scotland, Wales, and Northern Ireland brings the UK total to approximately 210,000 practising doctors. All of them document clinical encounters.
51
+
52
+ **(b) Time lost per clinician per day.** The TACT study (Time Allocation in Clinical Training), a national multicentre observational study of 137 NHS resident doctors observed across secondary care centres over 7 months (January–July 2024), found that doctors spent only 17.9% of their time on patient-facing activities while 73.0% was consumed by non-patient-facing tasks [Arab et al., QJM: An International Journal of Medicine, June 2025, doi:10.1093/qjmed/hcaf141]. Our own clinician survey (n=47, January–February 2026) confirmed that documentation specifically accounts for approximately 15 minutes per patient encounter, with clinicians seeing 10–12 patients per half-day clinic — yielding 2.5–3 hours of documentation time per clinic session.
53
+
54
+ **(c) National aggregate impact.**
55
+
56
+ - **Time:** If 190,200 FTE NHS doctors in England each save 30 minutes per day (the figure demonstrated by Olson et al., JAMA Network Open 2025), that yields **95,100 clinician-hours freed per day** — equivalent to approximately **11,888 additional full-time doctors** (at 8 hours/day). The NHS currently has 7,248 medical vacancies in secondary care alone [NHS Digital, September 2025]. Clarke's time savings could effectively fill more vacancies than currently exist.
57
+ - **Money:** The mean annual basic pay per FTE doctor is £90,290 [NHS Digital, August 2025]. The 30-minute daily saving per doctor equates to £5,642 per doctor per year in reclaimed clinical time, or **£1.07 billion annually** across 190,200 NHS doctors in England.
58
+ - **Career:** The GMC's 2025 National Training Survey found 61% of trainees at moderate or high risk of burnout [GMC NTS 2025; NHS Employers]. The 2024 NHS Staff Survey reported 42.19% of medical and dental staff experience work-related stress and 30.24% feel burnt out [BMA, Medical Staffing Data Analysis]. The JAMA Network Open study demonstrated that ambient AI scribes reduced burnout from 51.9% to 38.8% (OR 0.26, 95% CI 0.13–0.54, p<0.001) after just 30 days [Olson et al., JAMA Netw Open 2025;8(10):e2534976].
59
+ - **Waiting list:** 7.31 million cases on the NHS waiting list as of November 2025, with only 62% of patients treated within 18 weeks against a 92% constitutional standard [BMA Backlog Data Analysis; King's Fund]. Every hour freed from documentation is an hour available for patient care.
60
+
61
+ **(d) Patient safety consequences.** The TACT study found that junior trainees (FY1–ST5) spent only 17.8% of time on patient-facing tasks versus 38.4% for senior trainees (ST6–8) — meaning the least experienced doctors have the least patient contact and the most administrative burden [Arab et al., 2025]. The GMC 2025 NTS found 21% of trainees hesitated to escalate patient care concerns [GMC NTS 2025]. Fragmented records directly contribute to safety incidents: our survey respondents described emergency situations where patients were "unconscious, confused, or unable to speak" and "missing one piece of information like an allergy or heart condition could cause serious harm."
62
+
63
+ ### 2.3 Unmet Need — Why Existing Solutions Fall Short
64
+
65
+ | Competitor | What It Does | Why It Falls Short for NHS Clinicians |
66
+ |---|---|---|
67
+ | **Microsoft DAX Copilot (Nuance)** | Cloud-based ambient scribe integrated with Epic/Oracle | Sends patient audio to US-hosted cloud servers — violates NHS data sovereignty requirements. Requires enterprise EHR integration (Epic). Costs >$200/clinician/month at scale. Generates American-style documentation, not NHS clinical letters. |
68
+ | **Heidi Health** | Cloud-based ambient scribe with NHS GP traction (claims 60% of NHS GPs) | Closed-source, proprietary — NHS trusts cannot audit the model. Cloud-dependent ($99/month per clinician). No EHR record integration — generates notes from conversation only, without access to patient history, lab results, or imaging. Produces documentation that still requires manual enrichment with record data. |
69
+ | **Abridge** | Ambient AI scribe deployed at Yale, UCSF, and other US health systems | US-focused, no NHS deployment. Cloud-only architecture. No FHIR-based record navigation. |
70
+ | **Nabla** | European ambient scribe | Cloud-dependent. Limited NHS integration. No record context retrieval. |
71
+ | **Tortus AI / Accurx Scribe** | UK-based ambient scribes with NHS deployments | Accurx claims 97% clinical accuracy but is a closed proprietary solution with no EHR data retrieval. Tortus operates across 3,500+ GP practices but is cloud-dependent and subscription-based. Neither provides intelligent record navigation. |
72
+ | **Existing EHR systems (Cerner, EMIS, SystmOne)** | Store and display patient records | Present data in silos across separate modules. No synthesis, no summarisation, no document generation from conversation. Clinicians must manually navigate and compile information. |
73
+
74
+ **The specific unmet need Clarke addresses:** An open-source, fully local, NHS-tailored AI system that combines ambient clinical documentation with intelligent EHR record retrieval — producing context-enriched clinical documents from conversation, without patient data leaving the hospital, at zero per-clinician licensing cost.
75
+
76
+ No existing product does both. Every commercial ambient scribe generates documents from conversation alone, without EHR context. Every EHR system presents records without document generation. Clarke is the first system to close this loop using purpose-built medical AI models that can be audited, customised, and deployed entirely on-premise.
77
+
78
+ ---
79
+
80
+ ## SECTION 3: HOW CLARKE WORKS — END-TO-END
81
+
82
+ ### 3.1 Complete User Journey
83
+
84
+ **Pre-consultation (30 seconds):**
85
+
86
+ 1. The clinician opens Clarke in their web browser (accessible on any device connected to the hospital network). They see a clean dashboard showing their clinic list for the day — patient names, appointment times, and brief one-line summaries.
87
+ 2. They click on the next patient's name. Clarke's EHR Agent (MedGemma 4B) begins retrieving patient context in the background via FHIR API calls: demographics, problem list, current medications, recent blood results, recent imaging reports, and the last clinic letter.
88
+ 3. A "Patient Context" panel populates on the left side of the screen, showing a structured summary: key diagnoses, current medications, recent investigations with trends, and any flags (e.g., allergies, pending results). This takes 5–10 seconds.
89
+
90
+ **During consultation (10–30 minutes):**
91
+
92
+ 4. The clinician clicks "Start Consultation." Clarke activates the microphone. A small recording indicator appears at the top of the screen.
93
+ 5. The clinician has their normal conversation with the patient. They can glance at the Patient Context panel at any time for reference.
94
+ 6. MedASR processes the audio stream in real-time, producing a running transcript visible in a "Live Transcript" tab (minimised by default, expandable if the clinician wants to check).
95
+ 7. If the clinician mentions specific results ("Your kidney function has gotten a bit worse"), Clarke's EHR Agent can dynamically fetch the relevant lab values and queue them for inclusion in the document.
96
+
97
+ **Post-consultation (15–60 seconds):**
98
+
99
+ 8. The clinician clicks "End Consultation." The recording stops.
100
+ 9. Clarke sends the complete transcript (from MedASR) and the patient context (from the EHR Agent) to MedGemma 27B, which generates a structured clinical document.
101
+ 10. Within 10–30 seconds, a draft document appears in the main panel. It is structured as an NHS clinic letter with: date, patient demographics, addressee (GP), reason for consultation, history of presenting complaint, relevant past medical history, examination findings, investigation results (with actual values from the EHR), assessment, and plan.
102
+ 11. The clinician reviews the document. They can:
103
+ - Edit any text directly (inline editing)
104
+ - Click on any cited investigation result to see the source record
105
+ - Accept or reject individual sections
106
+ - Use a "Regenerate" button on any section to get an alternative phrasing
107
+ 12. Once satisfied, they click "Sign Off." The document is marked as approved and ready for export.
108
+ 13. The document can be exported as: a PDF, a FHIR DocumentReference resource for EHR integration, or copied to clipboard for pasting into the existing EHR system.
109
+ 14. Clarke resets. The clinician clicks the next patient. The cycle begins again.
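The FHIR export path in step 13 could look roughly like this. The field layout follows the FHIR R4 `DocumentReference` resource; the helper name and the `status`/`contentType` choices are illustrative assumptions rather than Clarke's actual implementation.

```python
# Sketch of wrapping a signed-off letter as a FHIR R4 DocumentReference,
# with the document body base64-encoded inline, ready to POST to the server.
import base64
from datetime import datetime, timezone


def to_document_reference(patient_id, letter_text, doc_type_text="Clinic letter"):
    """Build a minimal DocumentReference resource for a signed-off document."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {"text": doc_type_text},
        "subject": {"reference": f"Patient/{patient_id}"},
        "date": datetime.now(timezone.utc).isoformat(),
        "content": [{
            "attachment": {
                "contentType": "text/plain",
                "data": base64.b64encode(letter_text.encode("utf-8")).decode("ascii"),
            }
        }],
    }
```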
110
+
111
+ **Ward Round Mode (alternative flow):**
112
+
113
+ For inpatient settings, Clarke offers a "Ward Round" mode. Instead of recording a single extended consultation, it records the ward round conversation for each patient sequentially. The clinician taps "Next Patient" as they move between beds. For each patient, Clarke generates a ward round entry (not a letter) containing: overnight events, current status, examination findings, investigation results, and today's plan. This mode produces a daily progress note rather than a clinic letter, but uses the same underlying pipeline.
114
+
115
+ ### 3.2 System Architecture — Data Flow
116
+
117
+ ```
118
+ ┌─────────────────────────────────────────────────────────────────┐
119
+ │ CLARKE SYSTEM │
120
+ │ │
121
+ │ ┌──────────┐ ┌───────────┐ ┌──────────────────────────┐ │
122
+ │ │ Browser │◄──►│ Gradio 5.x│◄──►│ Orchestrator │ │
123
+ │ │ (User) │ │ Frontend │ │ (Python / FastAPI) │ │
124
+ │ └──────────┘ └───────────┘ └────────┬─────────┬────────┘ │
125
+ │ │ │ │
126
+ │ ┌───────────────────┐ │ │ │
127
+ │ │ │ │ │ │
128
+ │ ┌─────▼─────┐ ┌──────▼────▼──┐ │ │
129
+ │ │ MedASR │ │ MedGemma │ │ │
130
+ │ │ (105M) │ │ 1.5 4B │ │ │
131
+ │ │ │ │ EHR Agent │ │ │
132
+ │ │ Audio→Text │ │ FHIR→Context │ │ │
133
+ │ └─────┬──────┘ └──────┬────────┘ │ │
134
+ │ │ │ │ │
135
+ │ │ TRANSCRIPT │ PATIENT │ │
136
+ │ │ │ CONTEXT │ │
137
+ │ │ │ │ │
138
+ │ └────────┬──────────┘ │ │
139
+ │ │ │ │
140
+ │ ┌────────▼────────┐ │ │
141
+ │ │ MedGemma 27B │ │ │
142
+ │ │ (text-only) │ │ │
143
+ │ │ │ │ │
144
+ │ │ Transcript + │ │ │
145
+ │ │ Context → │ │ │
146
+ │ │ Clinical Letter │ │ │
147
+ │ └────────┬────────┘ │ │
148
+ │ │ │ │
149
+ │ ┌────────▼────────┐ │ │
150
+ │ │ Draft Document │ │ │
151
+ │ │ (Structured) │◄──────────────┘ │
152
+ │ └─────────────────┘ │
153
+ │ │
154
+ │ ┌──────────────┐ │
155
+ │ │ FHIR Server │ (Synthea synthetic data for demo; │
156
+ │ │ (HAPI FHIR) │ real hospital FHIR server in production) │
157
+ │ └──────────────┘ │
158
+ │ │
159
+ │ ALL PROCESSING WITHIN HOSPITAL NETWORK │
160
+ └─────────────────────────────────────────────────────────────────┘
161
+ ```
162
+
163
+ **Detailed Data Flow:**
164
+
165
+ **Stage 1 — Audio Capture → Medical Transcript**
166
+ - **Model:** MedASR (`google/medasr` on Hugging Face). 105M parameters. Conformer-based ASR.
167
+ - **Input:** Mono-channel audio stream, 16kHz, int16 waveform. Captured via browser MediaRecorder API, chunked and sent to backend via WebSocket.
168
+ - **Processing:** Audio chunks processed using `transformers` pipeline with `chunk_length_s=20` and `stride_length_s=2` for streaming transcription.
169
+ - **Output:** Full-text transcript of the clinician-patient conversation.
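A minimal sketch of this stage, assuming the `transformers` ASR pipeline interface (the model id and the chunk/stride settings come from the spec; the helper names are hypothetical):

```python
# Sketch of Stage 1 streaming transcription (helper names are illustrative;
# only the model id and chunking parameters come from the spec).
SAMPLE_RATE = 16_000  # Hz — mono int16 audio per the spec


def chunk_waveform(samples, chunk_s=20.0, stride_s=2.0, rate=SAMPLE_RATE):
    """Split a waveform into overlapping windows, mirroring the
    chunk_length_s=20 / stride_length_s=2 streaming settings: each window
    is chunk_s long and advances by chunk_s - stride_s, so consecutive
    windows share stride_s seconds of context."""
    if not samples:
        return []
    size = int(chunk_s * rate)
    step = int((chunk_s - stride_s) * rate)
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + size])
        if start + size >= len(samples):
            break
    return chunks


def transcribe_file(audio_path):
    """Run MedASR over a whole recording (requires transformers + torch)."""
    from transformers import pipeline  # lazy import: heavy dependency
    asr = pipeline("automatic-speech-recognition", model="google/medasr")
    return asr(audio_path, chunk_length_s=20, stride_length_s=2)["text"]
```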
170
+
171
+ **Stage 2 — Agentic EHR Context Retrieval**
172
+ - **Model:** MedGemma 1.5 4B multimodal, instruction-tuned (`google/medgemma-1.5-4b-it` on Hugging Face). 4B parameters.
173
+ - **Architecture:** Operates as a LangGraph-based agent following Google's EHR Navigator pattern. The agent receives a clinical query (e.g., "Retrieve patient context for consultation"), discovers available FHIR resources, plans which to retrieve, fetches them via FHIR REST API calls, extracts relevant facts, and synthesises a structured context summary.
174
+ - **Input:** Patient identifier + clinical query (initially "Prepare pre-consultation summary"; updated dynamically during consultation if specific results are mentioned in the transcript).
175
+ - **FHIR Resources Queried:** Patient, Condition, MedicationRequest, Observation (labs), DiagnosticReport, AllergyIntolerance, Encounter (recent), DocumentReference (last clinic letter).
176
+ - **Output:** Structured JSON containing: demographics, problem list, current medications, allergies, recent lab results (with values, units, reference ranges, and dates), recent imaging summaries, and key excerpts from the most recent clinic letter.
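The retrieval plan can be sketched as a set of patient-scoped FHIR search URLs. The resource types are a subset of those listed above; the base URL, search parameters, and function names are illustrative assumptions (in practice the agent plans these queries itself, and `Patient` is a direct read rather than a search).

```python
# Sketch of the FHIR searches behind Stage 2 (parameter choices illustrative).
from urllib.parse import urlencode


def fhir_search_url(base, resource, patient_id, extra=None):
    """Build a FHIR REST search URL scoped to one patient."""
    params = {"patient": patient_id, **(extra or {})}
    return f"{base.rstrip('/')}/{resource}?{urlencode(params)}"


def context_queries(base, patient_id):
    """An illustrative pre-consultation fetch plan, most-recent-first
    where a date sort applies."""
    plan = [
        ("Condition", {}),
        ("MedicationRequest", {"status": "active"}),
        ("AllergyIntolerance", {}),
        ("Observation", {"category": "laboratory", "_sort": "-date", "_count": "20"}),
        ("DiagnosticReport", {"_sort": "-date", "_count": "5"}),
        ("Encounter", {"_sort": "-date", "_count": "3"}),
    ]
    return [fhir_search_url(base, res, patient_id, extra) for res, extra in plan]
```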
177
+
178
+ **Stage 3 — Context-Enriched Document Generation**
179
+ - **Model:** MedGemma 27B text-only, instruction-tuned (`google/medgemma-27b-text-it` on Hugging Face). 27B parameters.
180
+ - **Input:** A structured prompt containing:
181
+ 1. The full conversation transcript (from Stage 1)
182
+ 2. The structured patient context JSON (from Stage 2)
183
+ 3. A document template specifying the target format (NHS clinic letter, discharge summary, ward round note, or referral letter — selected by the clinician)
184
+ 4. Instructions for clinical document conventions (chronological ordering, inclusion of positive and negative findings, reference to specific investigation values)
185
+ - **Output:** A structured clinical document in the specified NHS format, with inline references to the source data (e.g., lab values traced to specific FHIR Observation resources).
186
+
187
+ **Integration Point:** The orchestrator (Python/FastAPI) coordinates all three stages. The transcript and patient context are combined into a single prompt for MedGemma 27B. This is the critical fusion that no existing product achieves — the document generator sees both the conversation and the record simultaneously.
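A minimal sketch of that fusion prompt, assuming a plain-text layout (the actual templates live in `backend/prompts/`; the structure and wording below are illustrative):

```python
# Sketch of the Stage 3 fusion prompt for MedGemma 27B (structure illustrative).
import json


def build_generation_prompt(transcript, patient_context, doc_type="NHS clinic letter"):
    """Combine the MedASR transcript and the EHR Agent's context JSON into
    one prompt — the fusion step that grounds the letter in both sources."""
    return "\n\n".join([
        f"You are drafting a {doc_type} for the clinician to review and sign off.",
        "Use UK clinical letter conventions: chronological ordering, positive "
        "and negative findings, and exact investigation values with units and dates.",
        "## Patient record context (from FHIR)\n" + json.dumps(patient_context, indent=2),
        "## Consultation transcript\n" + transcript,
        "Only state facts present in the context or transcript; flag any "
        "discrepancy between them rather than resolving it silently.",
    ])
```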
188
+
189
+ ### 3.3 HAI-DEF Model Justification
190
+
191
+ **MedASR (105M, `google/medasr`):**
192
+
193
+ (a) *Why not a general-purpose ASR like Whisper?* MedASR achieves 5.2% WER on chest X-ray dictation compared to 12.5% for Whisper large-v3 — 58% fewer transcription errors. On broad medical dictation, MedASR reaches 5.2% WER versus 28.2% for Whisper — 82% fewer errors [Google Research Blog, January 2026]. Medical terminology errors in transcription directly corrupt downstream document generation.
194
+
195
+ (b) *Exploited capability:* MedASR was specifically trained on ~5,000 hours of de-identified physician dictations across radiology, internal medicine, and family medicine [Google/MedASR Hugging Face Model Card]. It has a strong vocabulary for drug names, anatomical terms, and clinical phrases that general ASR models consistently misrecognise.
196
+
197
+ (c) *Beyond default use case:* MedASR was designed for single-speaker medical dictation. Clarke uses it for two-party ambient conversation (clinician + patient), which is significantly harder due to speaker overlap, non-medical language, and background noise. We fine-tune MedASR with LoRA on synthetic multi-speaker clinical conversations to improve ambient performance. [**Assumption:** ambient performance will degrade relative to single-speaker dictation; fine-tuning mitigates but may not fully close the gap.]
198
+
199
+ **MedGemma 1.5 4B Multimodal (`google/medgemma-1.5-4b-it`):**
200
+
201
+ (a) *Why not a general-purpose LLM for EHR navigation?* MedGemma 1.5 4B was specifically trained on FHIR-based EHR data and medical documents. Google's own EHR Navigator Agent notebook demonstrates this capability. A general-purpose model would need extensive prompting to understand FHIR resource structures; MedGemma 4B understands them natively. Furthermore, its 4B parameter count enables it to run on modest GPU hardware, making the agentic loop fast enough for real-time context retrieval (critical for the pre-consultation loading step).
202
+
203
+ (b) *Exploited capability:* FHIR-native EHR understanding, medical document comprehension, and structured data extraction from lab reports — all capabilities added in the 1.5 release [Google Research Blog, January 2026].
204
+
205
+ (c) *Beyond default use case:* Google's EHR Navigator demonstrates single-query Q&A over FHIR records. Clarke extends this to multi-step agentic retrieval with dynamic re-querying based on the live conversation transcript — if a clinician mentions a result that wasn't in the initial context fetch, the agent retrieves it on-the-fly.
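The on-the-fly re-query trigger could be sketched as a keyword-to-query map. The mapping below is an illustrative assumption (2160-0 and 4548-4 are the LOINC codes for serum creatinine and HbA1c); in Clarke the agent itself, not a fixed table, decides what to fetch.

```python
# Sketch of the dynamic re-query trigger (term-to-code map is illustrative).
TERM_TO_QUERY = {
    "kidney function": ("Observation", "2160-0"),  # serum creatinine (LOINC)
    "hba1c": ("Observation", "4548-4"),            # haemoglobin A1c (LOINC)
    "chest x-ray": ("DiagnosticReport", None),
}


def detect_followup_queries(transcript_delta, term_map=TERM_TO_QUERY):
    """Scan newly transcribed text for terms that imply a record lookup the
    initial context fetch may have missed; return the FHIR queries to run."""
    text = transcript_delta.lower()
    return [query for term, query in term_map.items() if term in text]
```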
206
+
207
+ **MedGemma 27B Text-Only (`google/medgemma-27b-text-it`):**
208
+
209
+ (a) *Why not a general-purpose LLM for document generation?* MedGemma 27B was trained on medical text, medical Q&A pairs, and EHR data. It is optimised for inference-time computation on medical reasoning tasks [MedGemma Technical Report, arXiv:2507.05201]. A general-purpose model of equivalent size would produce plausible-sounding but medically imprecise documents. MedGemma 27B's medical training means it understands which findings are clinically significant, how to order information in a clinical narrative, and when to include negative findings — all essential for a high-quality clinical letter.
210
+
211
+ (b) *Exploited capability:* Medical text generation with clinical reasoning. The model can infer that if a patient is on metformin and their HbA1c has risen, this is clinically significant and should be prominently documented.
212
+
213
+ (c) *Beyond default use case:* MedGemma 27B is designed for medical question-answering and reasoning. Clarke uses it for structured document generation to NHS clinical letter standards, which requires not just medical knowledge but understanding of UK healthcare documentation conventions. We fine-tune it on exemplar NHS clinical letters using LoRA to instil these conventions.
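Why LoRA rather than full fine-tuning for the letter-style adaptation? The trainable-parameter budget is tiny relative to the 27B base weights. A back-of-envelope helper (illustrative assumptions stated in the comments — real target modules would be chosen from MedGemma's attention/MLP projections):

```python
# Back-of-envelope LoRA budget. Assumes rank-r adapters on square
# d_model x d_model projection matrices — an illustrative simplification.
def lora_extra_params(d_model, rank, n_layers, matrices_per_layer=4):
    """Each adapted d x d matrix gains two low-rank factors (d x r and r x d),
    i.e. 2 * d * r extra trainable parameters."""
    return n_layers * matrices_per_layer * 2 * d_model * rank
```

At rank 8 over 32 layers with a hidden size of 4096, that is roughly 8.4M trainable parameters — well under 0.1% of a 27B-parameter model, which is what makes on-premise adaptation to NHS letter conventions practical.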
214
+
215
+ ---
216
+
217
+ ## SECTION 4: WHAT MAKES CLARKE SUPERIOR
218
+
219
+ ### 4.1 Why HAI-DEF Models Are Superior to General-Purpose Alternatives
220
+
221
+ **(a) Versus GPT-4 / Gemini:** Three concrete advantages:
222
+
223
+ 1. **Privacy.** GPT-4 and Gemini require cloud API calls, sending patient audio and clinical data to third-party servers. Our clinician survey recorded a respondent stating: "Complete and unbreakable privacy would be non-negotiable. Any data sent to US companies (even if their servers are in the UK) would not be private." HAI-DEF models are open-weight and run locally. Zero patient data leaves the hospital.
224
+ 2. **Medical accuracy.** MedASR outperforms Whisper (a general-purpose ASR comparable to what GPT-4's speech mode uses) by 58–82% on medical dictation WER [Google Research Blog, January 2026]. MedGemma 4B's FHIR understanding is native, not prompt-engineered.
225
+ 3. **Cost at scale.** A one-time GPU infrastructure investment versus perpetual per-clinician API fees. At 190,200 NHS doctors, even $0.10 per consultation at 20 consultations/day = $380,400/day in API costs. Local HAI-DEF deployment: $0 marginal cost per consultation.
226
+
227
+ **(b) Versus using each HAI-DEF model in isolation:** The pipeline creates emergent value. MedASR alone produces a transcript. MedGemma 4B alone produces a patient summary. MedGemma 27B alone could generate text from a prompt. But without the pipeline, the transcript has no record context, the summary has no consultation content, and the document generator has neither. The pipeline produces a document that is qualitatively different from what any individual model could generate — it is simultaneously grounded in the conversation and in the patient's medical record.
228
+
229
+ ### 4.2 Why Clarke Could Be Superior to Heidi Health and Other Commercial Ambient Scribes
230
+
231
+ Heidi Health is the strongest current competitor in the NHS (claims 60% GP adoption, $65M Series B, 340,000 consultations/week) [TechFundingNews, October 2025]. Clarke's advantages are structural, not merely ideological:
232
+
233
+ 1. **Record context integration.** Heidi generates notes from conversation only. Clarke generates documents enriched with actual EHR data — lab values, medication lists, allergy checks. This is not a feature Heidi can easily add because their architecture sends audio to their cloud for processing and has no connection to the hospital's FHIR server. Clarke's local deployment enables direct FHIR server access.
234
+
235
+ 2. **Clinical safety through cross-referencing.** Because Clarke sees both the conversation and the record, it can flag discrepancies (e.g., clinician says "no known allergies" but record shows penicillin allergy). No conversation-only scribe can do this.
236
+
237
+ 3. **Zero marginal cost.** Heidi charges $99/month per clinician. For 190,200 NHS doctors, that is $226 million/year. Clarke is open-source with zero licensing cost. The only cost is GPU hardware, which NHS trusts already possess or can acquire as a one-time capital expenditure.
238
+
239
+ 4. **Auditability.** NHS Information Governance requires that AI systems processing patient data be auditable. Heidi is a closed-source black box. Clarke's code and model weights are fully inspectable.
240
+
241
+ 5. **NHS document format compliance.** Clarke is fine-tuned specifically on NHS clinical letter templates. Heidi generates generic notes that clinicians must reformat.
242
+
243
+ ### 4.3 Special Technology Prize: Agentic Workflow Prize
244
+
245
+ Clarke targets the **Agentic Workflow Prize** ("the project that most effectively reimagines a complex workflow by deploying HAI-DEF models as intelligent agents or callable tools").
246
+
247
+ The combined product gives a stronger case than either idea alone because it demonstrates a **multi-model agentic pipeline** where:
248
+
249
+ - MedASR operates as an autonomous audio processing agent
250
+ - MedGemma 4B operates as an autonomous EHR navigation agent (planning FHIR queries, deciding which resources to retrieve, extracting relevant facts)
251
+ - MedGemma 27B operates as an autonomous document generation agent (deciding structure, selecting relevant information, generating prose)
252
+ - The orchestrator coordinates these three agents into a cohesive workflow
253
+
254
+ A documentation-only tool (CliniScribe) would demonstrate a two-stage pipeline. A navigation-only tool (WardBrief) would demonstrate a single-agent system. Clarke demonstrates a **three-model, five-stage agentic workflow** that fundamentally reimagines the entire clinical documentation process from audio capture to signed-off letter.
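The five-stage claim can be made concrete with a toy orchestrator, sketched here with stand-in callables (the real orchestrator is the FastAPI service described in Section 3.2; the stage names below mirror the workflow, the code is illustrative):

```python
# Toy orchestrator: run named stages in order, threading shared state through.
def run_clarke_pipeline(audio, patient_id, stages):
    """capture -> transcription -> EHR retrieval -> generation -> review queue."""
    state = {"audio": audio, "patient_id": patient_id, "log": []}
    for name, fn in stages:
        state = fn(state)       # each agent reads and enriches the state
        state["log"].append(name)
    return state
```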
255
+
256
+ ---
257
+
258
+
259
+ ## SECTION 5: IMPACT QUANTIFICATION
260
+
261
+ ### 5.1 Impact Estimation Table
262
+
263
+ | # | Metric | Estimate | Calculation Method | Source |
264
+ |---|---|---|---|---|
265
+ | 1 | **Documentation time saved per clinician per day** | 30 minutes | Direct measurement: 263 physicians across 6 US health systems, pre-post ambient AI scribe deployment over 90 days. Measured via EHR time-stamp logs. | Olson et al., JAMA Netw Open 2025;8(10):e2534976 |
266
+ | 2 | **Documentation time saved per patient encounter** | ~15 minutes | Clinician self-report from 47 NHS clinicians. Direct quote: "save me 15 mins for every patient." Consistent in scale with the 30 min/day figure in row 1. | MedGemma Clinician Survey, Jan–Feb 2026 (n=47) |
267
+ | 3 | **After-hours documentation ("pajama time") reduction** | 42% decrease in after-hours EHR work; 66% decline in documentation delays | Retrospective study of 181 primary care physicians and APPs across 14 practices using hybrid ambient AI + virtual scribe. | Moura et al., J Gen Intern Med 2025. DOI:10.1007/s11606-025-09979-5 |
268
+ | 4 | **Total EHR time reduction per note** | 15% less total EHR time; >15% less note-composing time specifically | Matched cohort comparison of 704 primary care clinicians (239 ambient AI users vs. 465 controls) at UChicago Medicine over 90 days. | Pearlman et al., JAMA Netw Open 2025;8(10):e2537000 |
269
+ | 5 | **NHS doctors who benefit (England)** | 190,200 FTE | 151,980 FTE HCHS doctors + 38,220 FTE GPs. | NHS Digital Workforce Statistics Aug 2025 + GP Workforce Dec 2025 |
270
+ | 6 | **Clinician-hours freed daily (England)** | 95,100 hours/day | 190,200 doctors × 0.5 hours saved/day. | Calculated from rows 1 + 5 |
271
+ | 7 | **Equivalent full-time doctors freed** | ~11,888 FTE | 95,100 hours ÷ 8 hours/day. For context: NHS had 7,248 medical vacancies in secondary care alone (Sep 2025). | Calculated; vacancy figure from NHS Digital |
272
+ | 8 | **Annual financial value of reclaimed time** | £1.07 billion | 190,200 × (30 min/day × 250 working days/year) × (£90,290 mean annual salary ÷ 2,000 working hours/year) = 190,200 × £5,642 = £1.07B. | NHS Digital mean doctor salary Aug 2025 + calculation |
273
+ | 9 | **Burnout prevalence reduction (documentation-specific)** | 21.2 percentage-point absolute reduction at 84 days | Pre-post survey of 873 physicians and APPs piloting ambient AI at Mass General Brigham. Survey response rates: 30% at 42 days, 22% at 84 days. | You et al., JAMA Netw Open 2025. DOI:10.1001/jamanetworkopen.2025.28056 |
274
+ | 10 | **Overall burnout reduction (comprehensive measure)** | 13.1 percentage points (51.9% → 38.8%; OR 0.26, 95% CI 0.13–0.54, p<0.001) at 30 days | Pre-post measurement across 263 physicians at 6 health systems. | Olson et al., JAMA Netw Open 2025;8(10):e2534976 |
275
+ | 11 | **Documentation well-being improvement** | 30.7 percentage-point absolute increase at 60 days | Pre-post survey of 557 clinicians (attending physicians, APPs, residents, fellows) at Emory Healthcare. 11% response rate. | You et al., JAMA Netw Open 2025 (Emory arm) |
276
+ | 12 | **Ward round preparation time saved (EHR navigation)** | 25–50 minutes per ward round | Currently 30–60 min to prepare a 10–12 patient ward round (multi-system login, data compilation). Reduced to 5–10 min with automated FHIR context retrieval. | Clinician survey descriptions + clinical workflow estimate |
277
+ | 13 | **Patient safety: missing clinical information** | 15% of outpatient consultations have missing key information; of those, 32% experience care delays, 20% have documented risk of harm | Prospective study across 3 UK teaching hospitals. Extrapolation: ~10M outpatients/year seen without key info nationally, ~2M at risk of harm. | Burnett et al., BMC Health Serv Res 2011;11:114 |
278
+ | 14 | **Patient safety: clinical negligence cost** | £4.6 billion annual cost of harm (2024/25); £60.3 billion total provision for future liabilities | NHS Resolution annual report. 14,428 new clinical negligence claims received in 2024/25. Documentation failures are a contributory factor in many claims. | NHS Resolution Annual Report 2024/25 |
279
+ | 15 | **Commercial licensing cost avoided** | £179 million/year | 190,200 doctors × $99/month (Heidi Health pricing) × 12 months ≈ $226M/year, × 0.79 GBP/USD ≈ £179M. Clarke's open-source model: £0 licensing. Hardware cost: £5K–£15K per trust (one-time). | Heidi Health pricing + calculation |
280
+
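The headline figures in rows 7 and 8 are simple enough to reproduce directly. A minimal sketch, using only the input figures quoted in the table above:

```python
# Reproduce the headline calculations in rows 7 and 8 of the table above.
# All input figures come from the table itself; nothing else is assumed.

DOCTORS = 190_200                 # NHS doctors in scope
MINUTES_SAVED_PER_DAY = 30
WORKING_DAYS_PER_YEAR = 250
MEAN_SALARY_GBP = 90_290          # NHS Digital mean doctor salary, Aug 2025
WORKING_HOURS_PER_YEAR = 2_000

def fte_freed() -> float:
    """Row 7: daily hours saved across all doctors / 8-hour working day."""
    daily_hours_saved = DOCTORS * MINUTES_SAVED_PER_DAY / 60  # 95,100 h/day
    return daily_hours_saved / 8

def annual_value_gbp() -> float:
    """Row 8: hours saved per doctor per year x mean hourly rate x headcount."""
    hours_per_doctor = MINUTES_SAVED_PER_DAY / 60 * WORKING_DAYS_PER_YEAR  # 125 h
    hourly_rate = MEAN_SALARY_GBP / WORKING_HOURS_PER_YEAR                 # ~£45.15
    return DOCTORS * hours_per_doctor * hourly_rate

print(round(fte_freed()))                   # ~11,888 FTE
print(round(annual_value_gbp() / 1e9, 2))   # ~£1.07 billion
```

The same pattern applied to row 15 (190,200 × $99 × 12 × 0.79) gives ≈£179M of avoided licensing per year.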
281
+ ### 5.2 Honest Caveats
282
+
283
+ **1. The 30-minute daily saving comes from US ambulatory care, not NHS settings.** The Olson et al. and Pearlman et al. studies measured US clinicians generating SOAP notes in 15–20 minute primary care visits. NHS workflows differ structurally: GP consultations are typically 10 minutes, hospital clinics vary by specialty, and the documentation output is an NHS-format clinical letter (not a SOAP note). The actual NHS saving could be higher (because NHS clinicians still use dictaphones and manual secretary workflows, which are slower than the US EHR-based systems that ambient AI was compared against) or lower (because shorter consultations produce less source material for the AI). **Validation required:** A prospective time-motion study within a UK NHS pilot comparing Clarke to current documentation methods.
284
+
285
+ **2. MedASR has not been validated on NHS-accented English at scale.** The NHS workforce includes doctors from over 100 countries. MedASR was trained on ~5,000 hours of US-accented medical dictation. While its Conformer architecture handles accent variation better than RNN-based models, and its fine-tuning notebook enables LoRA adaptation to new accents, we have not empirically tested WER on South Asian, Nigerian, Middle Eastern, or Scottish-accented English — the dominant NHS demographic groups. **Assumption:** LoRA fine-tuning on 100–500 hours of NHS-accented audio will bring WER to within 2 percentage points of the US baseline. This assumption has not been validated.
286
+
287
+ **3. FHIR API availability varies across NHS trusts.** NHS England's Interoperability Standards mandate FHIR exposure, but adoption is uneven. Some trusts (especially those on EMIS or SystmOne) have robust FHIR endpoints; others (especially legacy Cerner/Oracle Health sites) have limited FHIR R4 support. Clarke degrades gracefully — it operates in "documentation-only" mode without FHIR — but the full value proposition requires FHIR access. **Validation required:** Survey of FHIR readiness across target pilot trusts before deployment.
288
+
289
+ **4. Burnout reduction figures may not transfer to UK clinicians.** The You et al. study (Mass General Brigham/Emory) involved self-selected pilot users with modest response rates (22–30% at MGB; 11% at Emory), likely overrepresenting enthusiastic adopters. The Olson et al. study had stronger methodology but was still US-based. NHS burnout drivers include non-documentation factors (understaffing, pay disputes, waiting list pressure) that ambient AI cannot address. **Assumption:** Clarke will reduce documentation-related burnout but will not solve systemic NHS workforce issues.
290
+
291
+ **5. AI-generated clinical documents will contain errors.** MedGemma 27B may hallucinate clinical findings, misattribute lab values, or generate imprecise medical terminology. The FHIR agent may fail to retrieve relevant records or retrieve irrelevant ones. MedASR will produce transcription errors, especially in noisy clinical environments. Clinician review and sign-off is mandatory for every document — Clarke is an authoring aid, not an autonomous documentation system. Google explicitly states HAI-DEF model outputs require independent professional verification.
292
+
293
+ **6. The financial impact estimate (£1.07B) assumes the reclaimed time is used productively.** In reality, some freed time will be absorbed by other administrative tasks, meetings, or systemic inefficiencies. The per-doctor financial value is calculated at the average salary rate; the true value depends on whether freed time translates to additional patient encounters or clinical activity.
294
+
295
+ ---
296
+
297
+ ## SECTION 6: TECHNICAL FEASIBILITY & 3-DAY BUILD PLAN
298
+
299
+ ### 6.1 Hour-by-Hour Build Plan (24 Working Hours)
300
+
301
+ **DAY 1 — FRIDAY 13 FEBRUARY (Hours 1–8): Foundation + Core Model Pipelines**
302
+
303
+ | Hour | Task | Tools / Libraries | Deliverable | "Done" Criterion |
304
+ |---|---|---|---|---|
305
+ | 1 | **Environment setup.** Provision Hugging Face Space with A100 40GB GPU. Install Python 3.11, PyTorch 2.4.x, transformers, FastAPI, Gradio. Clone starter repo. | HF Spaces (Docker `nvidia/cuda:12.4.1-runtime-ubuntu22.04`), `pip`, `git` | Running HF Space with GPU confirmed via `torch.cuda.is_available()` | Space boots, GPU confirmed, all core packages import without error |
306
+ | 2 | **Synthetic FHIR data generation.** Generate 50 UK-style synthetic patients using Synthea. Load into HAPI FHIR server (Docker container within Space). Configure patient names, NHS numbers, UK drug formulary, mmol/L units. | Synthea v3.3.0 (Java CLI), HAPI FHIR v7.4 (Docker), `requests` for REST verification | 50 patient records accessible via `GET /fhir/Patient` | `curl http://localhost:8080/fhir/Patient?_count=50` returns 50 patients with UK-formatted data |
307
+ | 3 | **MedASR integration (Part 1).** Load `google/medasr` via `transformers` AutoModelForSpeechSeq2Seq pipeline. Build audio ingestion: browser MediaRecorder API → WebSocket → server-side WAV conversion (16kHz, mono, int16). | `transformers` 4.47.x, `librosa` 0.10.x, `pydub` 0.25.x, `ffmpeg` 7.x, `websockets` 12.x | Audio capture → WAV file on server | Record 30-second audio clip in browser → receive valid 16kHz WAV on server |
308
+ | 4 | **MedASR integration (Part 2).** Implement chunked transcription: `chunk_length_s=20`, `stride_length_s=2`, `return_timestamps=True`. Build real-time transcript WebSocket return to frontend. Test with sample medical dictation audio. | `transformers` pipeline, `torch`, custom WebSocket handler | Working audio → text pipeline with streaming output | Transcribe 5-minute medical dictation WAV with visible real-time text in browser console; visual inspection confirms <10% WER |
309
+ | 5 | **MedGemma 4B EHR Agent (Part 1).** Load `google/medgemma-1.5-4b-it` in 4-bit quantised mode. Implement LangGraph ReAct agent following Google's EHR Navigator pattern. Define FHIR tools: `search_patients`, `get_conditions`, `get_medications`, `get_observations`, `get_allergies`, `get_diagnostic_reports`. | `transformers`, `bitsandbytes` 0.44.x, `langgraph` 0.2.x, `httpx` for FHIR REST calls | Agent loads, tools callable, returns raw FHIR resources | Agent called with patient ID → returns raw Condition, MedicationRequest, Observation resources as JSON |
310
+ | 6 | **MedGemma 4B EHR Agent (Part 2).** Add context synthesis: agent extracts and structures facts from raw FHIR resources into a standardised JSON schema (demographics, problem_list, medications, allergies, recent_labs, recent_imaging, last_letter_excerpt). Add flags (rising trends, critical values, allergy alerts). | Custom JSON schema, `langgraph` state management | Structured patient context JSON for any synthetic patient | Agent returns well-formed JSON for 5 test patients; manual inspection confirms accuracy against FHIR data |
311
+ | 7 | **Orchestrator: connect MedASR + EHR Agent.** Build FastAPI orchestrator with endpoints: `POST /start-consultation` (activates audio capture + triggers EHR agent), `POST /end-consultation` (stops audio, sends transcript + context to document generation placeholder), `GET /patient-context/{id}` (returns pre-fetched context). | `FastAPI` 0.109.x, `uvicorn` 0.27.x, Python `asyncio` | Working orchestrator connecting both pipelines | Call `/start-consultation` with patient ID → EHR context loads while audio captures → call `/end-consultation` → receive combined prompt containing transcript + context |
312
+ | 8 | **Integration test + prompt engineering.** Design the document-generation prompt template: system message (NHS letter conventions), user message (transcript + context JSON + document type). Test with a mock transcript and real FHIR context, passing to MedGemma 27B placeholder (print output). Build 3 prompt variants: clinic letter, discharge summary, ward round note. | Custom Jinja2 templates, `jinja2` 3.1.x | 3 tested prompt templates producing correctly formatted combined prompts | Visual inspection of 3 generated prompts confirms correct structure: system instructions → transcript → FHIR context → format specification |
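
Hour 4 delegates chunking to the transformers ASR pipeline (`chunk_length_s=20`, `stride_length_s=2`). A dependency-free sketch of just the windowing arithmetic those parameters imply, assuming 16 kHz mono input (the function name and return shape are illustrative; the real pipeline also merges the overlapping token output):

```python
# Sketch of overlapping-window layout for chunked transcription.
# Window arithmetic only -- the transformers pipeline handles decoding/merging.

SAMPLE_RATE = 16_000  # MedASR expects 16 kHz mono

def chunk_windows(n_samples: int, chunk_s: float = 20.0, stride_s: float = 2.0):
    """Return (start, end) sample indices for overlapping transcription windows.

    Consecutive windows overlap by `stride_s` seconds on each side, so words
    cut at a chunk boundary appear intact in the neighbouring window.
    """
    chunk = int(chunk_s * SAMPLE_RATE)
    step = int((chunk_s - 2 * stride_s) * SAMPLE_RATE)  # advance between windows
    windows = []
    start = 0
    while start < n_samples:
        windows.append((start, min(start + chunk, n_samples)))
        if start + chunk >= n_samples:
            break
        start += step
    return windows

# A 50-second clip: three windows starting at 0 s, 16 s and 32 s.
wins = chunk_windows(50 * SAMPLE_RATE)
```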
313
+
314
+ **DAY 2 — SATURDAY 14 FEBRUARY (Hours 9–16): Document Generation + UI + Fine-tuning**
315
+
316
+ | Hour | Task | Tools / Libraries | Deliverable | "Done" Criterion |
317
+ |---|---|---|---|---|
318
+ | 9 | **MedGemma 27B loading + baseline generation.** Load `google/medgemma-27b-text-it` in 4-bit quantisation via bitsandbytes (NF4, double quantisation, `compute_dtype=bfloat16`). Generate 5 baseline clinic letters from the 3 prompt templates using 5 synthetic patients. | `transformers`, `bitsandbytes` 0.44.x, `accelerate` 1.2.x | MedGemma 27B loaded, generates text | 5 generated letters saved as baseline for comparison. Model generates coherent medical text (even if format is not yet NHS-standard) |
319
+ | 10 | **Synthetic training data generation.** Generate 300 triplets (transcript, FHIR context JSON, reference NHS clinic letter) using Claude API as a data generator: 250 for training, 50 held out for testing. Each triplet based on a unique Synthea patient scenario. Apply NHS letter template conventions. Manually review 20 for quality. | `anthropic` SDK, Synthea data, custom generation scripts | 250 validated training triplets + 50 held-out test triplets in JSONL format | Training file (`train.jsonl`) and test file (`test.jsonl`) pass schema validation; 20 manually reviewed samples are clinically plausible and correctly formatted |
320
+ | 11 | **LoRA fine-tuning MedGemma 27B.** Fine-tune using QLoRA: base model 4-bit NF4, LoRA adapters on `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`. Hyperparameters: rank=16, alpha=32, dropout=0.05, learning_rate=2e-4, warmup_ratio=0.03, lr_scheduler=cosine, max_seq_length=4096, per_device_batch_size=2, gradient_accumulation_steps=8, epochs=3, optimizer=paged_adamw_8bit. | `peft` 0.13.x, `trl` 0.12.x (SFTTrainer), `bitsandbytes`, `datasets`, `wandb` for logging | Fine-tuned LoRA adapter saved to disk + training loss curve | Training completes without OOM. Final training loss < initial loss. Adapter file size < 500MB. |
321
+ | 12 | **Post-training evaluation.** Generate 50 test letters using fine-tuned model. Compute BLEU and ROUGE-L against held-out reference letters. Compare to 5 baseline letters from hour 9. Manual review of 10 generated letters for: (a) NHS format compliance, (b) clinical accuracy, (c) correct use of FHIR-sourced values, (d) appropriate positive and negative findings. | `evaluate` (HuggingFace), `rouge_score`, `sacrebleu`, manual review | Evaluation report: BLEU, ROUGE-L, qualitative assessment | Fine-tuned BLEU > baseline BLEU. Manual review confirms improved NHS format compliance. Report saved as `evaluation_report.md` |
322
+ | 13 | **Gradio UI (Part 1): core layout.** Build main interface: left panel (patient context), centre panel (document editor), top bar (recording controls + status). Implement: patient list dropdown, "Start Consultation" / "End Consultation" buttons, recording indicator, live transcript expandable section. | `gradio` 5.x, custom CSS (NHS Design System colour palette: `#003087` primary, `#005EB8` secondary, `#FFFFFF` background), `gradio.themes` | Functional UI layout with all panels and controls | UI renders in browser. All buttons are clickable. Panels are correctly positioned and responsive at 1280×720 and 1920×1080 |
323
+ | 14 | **Gradio UI (Part 2): data binding.** Connect UI to backend: clicking a patient triggers EHR agent → context panel populates. "Start Consultation" activates audio capture via JavaScript interop. "End Consultation" triggers document generation → draft letter appears in centre panel with inline editing. "Sign Off" button marks document as final. Export buttons (PDF, clipboard, FHIR DocumentReference). | `gradio` event handlers, `gr.State`, JavaScript interop for MediaRecorder | Fully interactive UI connected to backend | Complete end-to-end demo: select patient → see context → start recording → stop → draft letter appears → edit → sign off. All stages functional |
324
+ | 15 | **Integration testing (5 full scenarios).** Test 5 complete consultation scenarios end-to-end. Use pre-recorded synthetic audio clips (generated via TTS or self-recorded). Verify: correct transcription, accurate FHIR context retrieval, clinically appropriate generated letters, working inline editing, successful sign-off and export. | Pre-recorded WAV files, `pytest` for automated checks, manual verification | 5 passing end-to-end demo scenarios | All 5 scenarios complete without crash or error. Generated letters are clinically coherent for each patient |
325
+ | 16 | **Bug fixing + error handling.** Address all issues found in hour 15. Add: loading spinners, error messages for failed FHIR queries, timeout handling for model inference, graceful degradation if MedASR produces empty transcript, fallback if MedGemma 27B OOMs (reduce context length, retry). | Standard Python error handling, Gradio UI feedback components | Robust error-handled application | Re-run all 5 scenarios. Intentionally trigger 3 error conditions (empty audio, FHIR timeout, long transcript). All handled gracefully with user-visible feedback |
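
The hour-10 "done" criterion requires `train.jsonl`/`test.jsonl` to pass schema validation. A minimal sketch of such a check against the chat-format schema defined in Section 6.3 (the validator itself is an assumption; only the `messages` layout comes from the spec):

```python
import json

# Minimal schema check for one line of train.jsonl / test.jsonl:
# a three-turn chat record (system instructions, user transcript+context,
# assistant reference letter), each with non-empty string content.

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_jsonl_line(line: str) -> bool:
    """Return True iff the line parses and matches the expected chat schema."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) != 3:
        return False
    roles = [m.get("role") for m in messages]
    contents_ok = all(isinstance(m.get("content"), str) and m["content"].strip()
                      for m in messages)
    return roles == EXPECTED_ROLES and contents_ok

good = json.dumps({"messages": [
    {"role": "system", "content": "NHS letter instructions"},
    {"role": "user", "content": "TRANSCRIPT\n\nFHIR_CONTEXT"},
    {"role": "assistant", "content": "Dear Dr. Patel, ..."}]})
```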
326
+
327
+ **DAY 3 — SUNDAY 15 FEBRUARY (Hours 17–24): Evaluation, Polish, Deployment**
328
+
329
+ | Hour | Task | Tools / Libraries | Deliverable | "Done" Criterion |
330
+ |---|---|---|---|---|
331
+ | 17 | **MedASR evaluation.** Measure WER on: (a) 20 synthetic medical dictation clips using `jiwer`, (b) 10 ambient conversation clips. Compare to Whisper large-v3 on same data. Document results. | `jiwer` 3.0.x, `openai-whisper` (for baseline comparison), custom test harness | WER comparison table: MedASR vs Whisper, dictation vs ambient | WER computed for all 30 clips. Results documented in `evaluation_report.md` |
332
+ | 18 | **EHR Agent evaluation.** Test FHIR retrieval accuracy on 20 synthetic patients against manually annotated "gold standard" context summaries. Measure: fact recall (% of relevant facts extracted), precision (% of extracted facts that are correct), hallucination rate (% of fabricated facts). | Custom evaluation script, 20 annotated gold standards | Agent accuracy report | Fact recall >85%, precision >90%, hallucination rate <10%. Results appended to `evaluation_report.md` |
333
+ | 19 | **Document generation evaluation + Ward Round mode.** Compute aggregate BLEU/ROUGE-L on full 50-sample test set. Implement Ward Round mode UI (sequential patient recording, progress note template instead of clinic letter). Test with 2 ward round scenarios. | `evaluate`, custom scripts, Gradio tab component | Ward Round mode functional + aggregate evaluation metrics | Ward Round tab generates progress notes. All metrics documented |
334
+ | 20 | **UI polish.** Apply NHS Design System aesthetics: colour palette, typography (Frutiger/Arial), spacing, accessibility (ARIA labels, contrast ratios ≥4.5:1). Add Clarke logo/branding. Responsive design check at 3 viewport sizes. Loading state animations. | CSS custom properties, Gradio theming API, SVG logo | Production-quality visual design | UI screenshots pass visual QA at 1280×720, 1920×1080, and 768×1024 (tablet). Lighthouse accessibility score ≥90 |
335
+ | 21 | **Pre-recorded demo preparation.** Record 3 polished demo audio clips (Mrs Thompson T2DM, Mr Okafor chest pain, Ms Patel asthma review). Each clip: 60–90 seconds, clear audio, realistic clinical dialogue. Save as 16kHz WAV. Verify MedASR transcribes accurately. | Audio recording (self/TTS), `ffmpeg` for format conversion | 3 demo-ready audio clips with verified transcripts | Each clip transcribes with <8% WER. Saved in `demo_data/` directory |
336
+ | 22 | **Hugging Face deployment.** Push application to public HF Space. Upload LoRA adapters to HF Hub as a separate model repository (tracing to `google/medgemma-27b-text-it`). Verify public access: anyone can visit the URL and interact. Test from a different browser/device. | HF Spaces CLI, `huggingface_hub` SDK, `git-lfs` | Live public demo URL + public LoRA adapter repo | External browser (incognito) can access demo, select a patient, play pre-recorded audio, and receive a generated letter |
337
+ | 23 | **Repository documentation.** Write comprehensive README.md: project description, architecture diagram, installation instructions, usage guide, model card, evaluation results, licence (Apache 2.0 for code, HAI-DEF terms for models). Annotate all source code files with docstrings and inline comments. | Markdown, standard Python docstrings | Documented public GitHub repository | README contains: architecture diagram, quickstart guide, evaluation table, licence info. All `.py` files have module-level docstrings |
338
+ | 24 | **Final verification + submission artefacts.** End-to-end smoke test of live demo. Verify: GitHub repo is public, HF Space is live, HF model repo with LoRA adapter is public. Create `submission_checklist.md` confirming all competition requirements met. | Manual testing, checklist | All 3 submission artefacts publicly accessible | GitHub repo public ✓, HF Space live ✓, HF model repo public ✓. Demo runs without errors from external device |
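
Hour 17 measures WER with `jiwer`; to make those numbers concrete, here is a dependency-free sketch of the same metric (word-level Levenshtein distance divided by reference length). This toy version is for intuition only, not a replacement for `jiwer`:

```python
# Toy WER: word-level edit distance (substitutions, insertions, deletions)
# divided by the number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / max(len(ref), 1)

# One substituted word in a five-word reference -> WER 0.2
print(wer("commence gliclazide forty milligrams daily",
          "commence gliclazide fourteen milligrams daily"))
```

Note how a single number substitution ("forty" → "fourteen") is exactly the class of ASR error that makes clinician sign-off mandatory.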
339
+
340
+ ### 6.2 Complete Technology Stack
341
+
342
+ | Layer | Component | Specific Technology | Version / Detail |
343
+ |---|---|---|---|
344
+ | **Infrastructure** | GPU compute | Hugging Face Spaces | A100 40GB GPU (ZeroGPU or dedicated; fallback: A10G 24GB) |
345
+ | | Container base | Docker | `nvidia/cuda:12.4.1-runtime-ubuntu22.04` |
346
+ | | Runtime | Python | 3.11.x |
347
+ | **Backend** | Web framework | FastAPI | 0.109.x, served via `uvicorn` 0.27.x |
348
+ | | Async | Python asyncio | Built-in, for concurrent MedASR + EHR Agent execution |
349
+ | | WebSocket | `websockets` | 12.x, for real-time audio streaming and transcript return |
350
+ | **Frontend** | UI framework | Gradio | 5.x (latest), served within HF Space |
351
+ | | Audio capture | Browser MediaRecorder API | WebM/Opus codec → server-side conversion to 16kHz WAV |
352
+ | | Styling | Custom CSS | NHS Design System colour palette + Gradio theming API |
353
+ | **Audio processing** | Format conversion | `ffmpeg` | 7.x, for WebM→WAV transcoding |
354
+ | | Audio manipulation | `pydub` | 0.25.x, for resampling and channel conversion |
355
+ | | Audio analysis | `librosa` | 0.10.x, for waveform loading and preprocessing |
356
+ | **ASR model** | MedASR | `google/medasr` | 105M params, Conformer-based, via `transformers` AutoModelForSpeechSeq2Seq |
357
+ | **EHR Agent model** | MedGemma 1.5 4B IT | `google/medgemma-1.5-4b-it` | 4B params, 4-bit quantised via `bitsandbytes`, agent via `langgraph` 0.2.x |
358
+ | **Document generation model** | MedGemma 27B Text IT | `google/medgemma-27b-text-it` | 27B params, 4-bit NF4 quantised via `bitsandbytes` 0.44.x, double quantisation, `compute_dtype=bfloat16` |
359
+ | **ML framework** | Core | PyTorch | 2.4.x with CUDA 12.4 |
360
+ | | Model loading | `transformers` | 4.47.x (HuggingFace) |
361
+ | | Quantisation | `bitsandbytes` | 0.44.x (NF4 quantisation) |
362
+ | | Acceleration | `accelerate` | 1.2.x (device mapping) |
363
+ | **Fine-tuning** | LoRA framework | `peft` | 0.13.x (Parameter-Efficient Fine-Tuning) |
364
+ | | Training | `trl` (SFTTrainer) | 0.12.x |
365
+ | | Dataset loading | `datasets` | 3.2.x (HuggingFace) |
366
+ | | Experiment tracking | `wandb` | 0.18.x (Weights & Biases) |
367
+ | **FHIR** | Server | HAPI FHIR | v7.4, Docker container, FHIR R4 |
368
+ | | Client | `httpx` | 0.27.x (async HTTP for FHIR REST API calls) |
369
+ | | Data generation | Synthea | v3.3.0, configured with UK module (names, NHS numbers, mmol/L units, BNF drugs) |
370
+ | **Prompt engineering** | Template engine | `jinja2` | 3.1.x, for structured prompt assembly |
371
+ | **Evaluation** | WER | `jiwer` | 3.0.x |
372
+ | | Text similarity | `rouge_score`, `sacrebleu` | Latest stable |
373
+ | | ASR baseline | `openai-whisper` | large-v3 (for comparison only) |
374
+ | **Version control** | Repository | GitHub | Public repository, Apache 2.0 licence |
375
+ | **Model hosting** | LoRA adapter | Hugging Face Hub | Public model repo, tracing to `google/medgemma-27b-text-it` |
376
+ | **Document export** | PDF generation | `reportlab` | 4.2.x, for clinic letter PDF export |
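
A back-of-envelope VRAM check explains why this stack fits a single A100 40GB. The per-parameter costs below are rough rules of thumb (≈0.5 byte/param for 4-bit NF4 weights, ≈2 bytes/param for fp16, plus a flat allowance for KV-cache and activations), not measured figures:

```python
# Rough VRAM budget for the three co-resident models in the stack above.
# Rule-of-thumb costs; real usage depends on context length and batch size.

def vram_gb(params_b: float, bytes_per_param: float, overhead_gb: float = 2.0) -> float:
    """Weights-only footprint plus a flat allowance for KV-cache/activations."""
    return params_b * bytes_per_param + overhead_gb

medgemma_27b_nf4 = vram_gb(27, 0.5)     # ~15.5 GB in 4-bit NF4
medgemma_4b_nf4 = vram_gb(4, 0.5)       # ~4 GB in 4-bit NF4
medasr_fp16 = vram_gb(0.105, 2.0, 0.5)  # ~0.7 GB (105M params in fp16)

total = medgemma_27b_nf4 + medgemma_4b_nf4 + medasr_fp16
print(round(total, 1), "GB of 40 GB")
```

Roughly 20 GB for weights plus headroom, which is also why the A10G 24GB fallback is tight but plausible for inference.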
377
+
378
+ ### 6.3 Fine-Tuning Strategy
379
+
380
+ **Model 1: MedGemma 27B Text-Only — NHS Clinical Letter Generation (Primary fine-tuning target)**
381
+
382
+ - **Objective:** Adapt MedGemma 27B from general medical Q&A to structured NHS clinical document generation from (transcript, FHIR context) input pairs.
383
+
384
+ - **Data generation:**
385
+ 1. Select 50 diverse Synthea patients spanning: diabetes, COPD, heart failure, CKD, hypertension, cancer follow-up, mental health, orthopaedic, paediatric, obstetric scenarios.
386
+ 2. For each patient, generate 6 triplets using Claude 3.5 Sonnet API (total 300: 250 training + 50 held-out test):
387
+ - **Input 1:** Simulated clinician-patient conversation transcript (~500–1,500 words, naturalistic dialogue including interruptions, repetition, non-medical small talk)
388
+ - **Input 2:** FHIR context JSON (extracted from Synthea patient data: demographics, conditions, medications, labs, allergies)
389
+ - **Output:** Reference NHS-format clinic letter following Royal College of Physicians guidelines: date, addressee (GP), patient demographics, reason for consultation, history of presenting complaint, relevant PMH, examination findings, investigation results (with actual values from FHIR context), assessment, plan, follow-up.
390
+ 3. Manually review 40 triplets (20 training, 20 test) for clinical accuracy, correct FHIR value usage, and NHS format compliance. Revise and re-generate any with errors.
391
+
392
+ - **Dataset format:** JSONL, one example per line. Each line: `{"messages": [{"role": "system", "content": "<NHS_LETTER_INSTRUCTIONS>"}, {"role": "user", "content": "<TRANSCRIPT>\n\n<FHIR_CONTEXT>"}, {"role": "assistant", "content": "<REFERENCE_LETTER>"}]}`
393
+
394
+ - **Method:** QLoRA (Quantised Low-Rank Adaptation)
395
+ - **Base model:** `google/medgemma-27b-text-it`, loaded in 4-bit NF4 quantisation with double quantisation
396
+ - **LoRA targets:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (all attention + MLP projections)
397
+ - **LoRA rank:** 16
398
+ - **LoRA alpha:** 32 (scaling factor = alpha/rank = 2.0)
399
+ - **LoRA dropout:** 0.05
400
+ - **Optimizer:** paged_adamw_8bit
401
+ - **Learning rate:** 2e-4
402
+ - **LR scheduler:** cosine with warmup
403
+ - **Warmup ratio:** 0.03 (≈1–2 warmup steps: 250 examples × 3 epochs ÷ effective batch 16 ≈ 48 optimizer steps)
404
+ - **Per-device batch size:** 2
405
+ - **Gradient accumulation steps:** 8 (effective batch size: 16)
406
+ - **Max sequence length:** 4,096 tokens
407
+ - **Epochs:** 3
408
+ - **Mixed precision:** bf16
409
+
410
+ - **Hardware:** Single A100 40GB GPU (same as inference; LoRA adapters fit in remaining VRAM after 4-bit base model).
411
+
412
+ - **Duration estimate:** ~2.5 hours wall-clock for 250 examples × 3 epochs at batch size 2 with gradient accumulation of 8, max_seq_length 4096 on A100 40GB, based on comparable LoRA fine-tuning benchmarks for 27B models. This exceeds the single hour nominally allocated in the build plan; the run is unattended, so UI work proceeds in parallel while it completes.
413
+
414
+ - **Output:** LoRA adapter checkpoint (~200–500MB), uploaded to Hugging Face Hub as a public model tracing to `google/medgemma-27b-text-it`.
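
The optimizer-step arithmetic implied by the hyperparameters above can be checked in a few lines (assuming the standard `ceil(examples / effective_batch)` steps per epoch; the exact count depends on whether the trainer drops the last partial batch):

```python
import math

# Step arithmetic for the QLoRA run above:
# 250 examples x 3 epochs at an effective batch of 2 x 8 = 16.

EXAMPLES, EPOCHS = 250, 3
PER_DEVICE_BATCH, GRAD_ACCUM = 2, 8
WARMUP_RATIO = 0.03

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM           # 16
steps_per_epoch = math.ceil(EXAMPLES / effective_batch)   # 16
total_steps = steps_per_epoch * EPOCHS                    # 48
warmup_steps = max(1, round(total_steps * WARMUP_RATIO))  # 1-2 steps

print(total_steps, warmup_steps)
```

With only ~48 optimizer steps in total, the 0.03 warmup ratio amounts to one or two steps, so the cosine schedule dominates almost the entire run.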
415
+
416
+ **Model 2: MedASR — NHS-Accented Medical Speech (Stretch goal, hours 19–20 if evaluation results indicate need)**
417
+
418
+ - **Rationale:** Only pursued if hour-17 evaluation shows MedASR WER on synthetic NHS-accented audio exceeds 10%. If base MedASR WER is acceptable (<10%), document this as a production requirement and proceed with base model.
419
+ - **Data:** 100–500 labelled medical utterances with UK/international accents. For the demo, these can be self-recorded or sourced from open medical speech datasets.
420
+ - **Method:** LoRA fine-tuning following Google's published MedASR fine-tuning notebook. LoRA rank 8, learning rate 1e-4, 5 epochs.
421
+ - **Duration:** ~1 hour on A100.
422
+ - **Fallback (if not pursued):** Document base MedASR performance on NHS-accented audio in the evaluation report. Note this as a production fine-tuning requirement.
423
+
424
+ ### 6.4 Model Performance Evaluation Plan
425
+
426
+ | Model | Metric | Test Set | Baseline | Target | Evaluation Method |
427
+ |---|---|---|---|---|---|
428
+ | **MedASR** | Word Error Rate (WER) | 20 medical dictation clips (synthetic, clear audio, US-accent) + 10 ambient conversation clips (synthetic, multi-speaker, UK-accent) | Whisper large-v3 WER on identical audio | MedASR WER < Whisper WER on dictation (expect ≥58% relative improvement per published benchmarks); MedASR WER < 10% absolute on ambient clips | `jiwer` library; manual ground-truth transcripts for each clip |
429
+ | **MedGemma 4B (EHR Agent)** | Fact recall, precision, hallucination rate | 20 synthetic patient records with manually annotated gold-standard context summaries (each containing 15–25 clinically relevant facts) | Naive FHIR dump (return all resources without summarisation or filtering) | Fact recall >85%, precision >90%, hallucination rate <10% | Custom script comparing agent JSON output to gold-standard annotations; 2 reviewers for 10 disagreement cases |
430
+ | **MedGemma 27B (Document Gen, base)** | BLEU, ROUGE-L | 50 held-out (transcript, context, letter) triplets | N/A (this is the baseline) | Establish baseline scores | `sacrebleu`, `rouge_score` |
431
+ | **MedGemma 27B (Document Gen, fine-tuned)** | BLEU, ROUGE-L, clinical accuracy | Same 50 held-out triplets | Base MedGemma 27B scores (from row above) | BLEU improvement ≥5 points; ROUGE-L improvement ≥3 points; manual review confirms improved NHS format compliance in ≥8/10 sampled letters | Automated metrics + manual clinical review of 10 randomly sampled letters by team clinician |
432
+ | **End-to-end pipeline** | Latency (time from "End Consultation" to draft letter display) | 5 full demo scenarios | Manual documentation time (~15 min per clinician survey) | <60 seconds from click to rendered letter | Stopwatch measurement across 5 trials; mean and max recorded |
433
+ | **End-to-end pipeline** | Clinical coherence (manual) | 5 generated letters from demo scenarios | N/A (qualitative) | All 5 letters are clinically coherent, factually consistent with FHIR data, and follow NHS letter structure | Manual review by team clinician using a 5-point rubric: format compliance, clinical accuracy, FHIR value correctness, appropriate plan, readability |
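
The EHR-agent row above defines fact recall, precision, and hallucination rate. A sketch of how the custom scoring script might compute them, treating facts as normalised strings compared against the annotated gold standard (exact-match keeps the definitions unambiguous; a real evaluator would need fuzzier matching, and the example facts are illustrative):

```python
# Hour-18 scoring sketch: compare the agent's extracted facts to a
# manually annotated gold standard. Under these definitions the
# hallucination rate is simply the complement of precision.

def retrieval_metrics(extracted: set[str], gold: set[str]) -> dict[str, float]:
    true_facts = extracted & gold
    recall = len(true_facts) / len(gold) if gold else 0.0        # gold facts covered
    precision = len(true_facts) / len(extracted) if extracted else 0.0
    hallucination = 1.0 - precision                              # fabricated share
    return {"recall": recall, "precision": precision, "hallucination": hallucination}

gold = {"t2dm since 2019", "metformin 1g bd", "penicillin allergy", "hba1c 55"}
agent = {"t2dm since 2019", "metformin 1g bd", "hba1c 55", "aspirin 75mg od"}
m = retrieval_metrics(agent, gold)  # one missed fact, one fabricated fact
print(m)  # recall 0.75, precision 0.75, hallucination 0.25
```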
434
+
435
+ ---
436
+
437
+ ## SECTION 7: DEPLOYMENT & DEMO SPECIFICATION
438
+
439
+ ### 7.1 Live Demo Scenario
440
+
441
+ The demo shows a complete consultation with a synthetic NHS patient, designed to demonstrate every stage of Clarke's agentic pipeline in under 3 minutes. A judge watching the video will see exactly this:
442
+
443
+ **Screen 1 — Dashboard (0:00–0:15)**
444
+
445
+ Clarke opens in a web browser showing a mock clinic list for "Dr. Sarah Chen, Diabetes & Endocrinology." Five patients are listed with appointment times and one-line summaries. The first patient is highlighted:
446
+
447
+ > **Mrs. Margaret Thompson** | 67F | 14:00 | Follow-up — Type 2 Diabetes, rising HbA1c
448
+
449
+ The clinician (or narrator) clicks on Mrs. Thompson's name.
450
+
451
+ **Screen 2 — Patient Context Loading (0:15–0:25)**
452
+
453
+ A brief loading animation plays (3–5 seconds) as Clarke's EHR Agent (MedGemma 4B) queries the FHIR server. The Patient Context panel on the left populates with structured data:
454
+
455
+ - **Diagnoses:** Type 2 Diabetes Mellitus (2019), Hypertension (2017), CKD Stage 3a (2023)
456
+ - **Medications:** Metformin 1g BD, Ramipril 5mg OD, Atorvastatin 20mg OD
457
+ - **Allergies:** ⚠ Penicillin (anaphylaxis)
458
+ - **Recent Labs:** HbA1c **55** mmol/mol ↑ (was 48 in Jun 2025) · eGFR **52** mL/min ↓ (was 58) · Creatinine **98** μmol/L
459
+ - **Flag:** "⚠ HbA1c rising trend over 6 months — consider treatment escalation"
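
Behind this panel, the EHR Agent's FHIR tools issue R4 REST searches against the local HAPI server. A dependency-free sketch of the query construction (the spec makes the actual calls with `httpx`; the patient ID here is a placeholder, while 4548-4 is the LOINC code for HbA1c):

```python
from urllib.parse import urlencode

# Build FHIR R4 search URLs like the ones behind the context panel above.
# URL construction only; the real client sends these via httpx.

FHIR_BASE = "http://localhost:8080/fhir"  # HAPI server from build-plan hour 2

def fhir_search(resource: str, **params: str) -> str:
    """Build a FHIR R4 search URL with deterministically ordered parameters."""
    query = urlencode(sorted(params.items()))
    return f"{FHIR_BASE}/{resource}?{query}" if query else f"{FHIR_BASE}/{resource}"

# Last three HbA1c results for a patient, newest first:
url = fhir_search("Observation", patient="example-patient-id",
                  code="4548-4", _sort="-date", _count="3")
print(url)
```

Sorting the parameters is a small design choice that makes the generated URLs stable, which simplifies caching and test assertions.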
460
+
461
+ **Screen 3 — Live Consultation (0:25–1:30)**
462
+
463
+ The clinician clicks **"Start Consultation."** A red recording indicator appears. A pre-recorded synthetic audio clip plays (~60 seconds, ~250 words):
464
+
465
+ > Doctor: "Hello Mrs. Thompson, good to see you again. How have you been since we last met?"
466
+ > Patient: "Not too bad doctor, but I've been a bit more tired lately and I'm worried about my sugar levels..."
467
+ > [Conversation covers: fatigue symptoms, dietary adherence, discussion of HbA1c rising from 48 to 55, explanation that metformin alone isn't enough, shared decision to start gliclazide, discussion of hypoglycaemia risk, plan for repeat HbA1c in 3 months]
468
+
469
+ The Live Transcript panel shows MedASR's real-time text appearing as the audio plays, demonstrating the streaming transcription.
470
+
471
+ **Screen 4 — Document Generation (1:30–1:50)**
472
+
473
+ The clinician clicks **"End Consultation."** The recording stops. A progress indicator appears: "Generating clinical letter..." (15–20 seconds). During this time, a subtle animation shows the three pipeline stages: "Transcribing... → Retrieving context... → Drafting letter..."
474
+
475
+ **Screen 5 — Draft Clinic Letter (1:50–2:30)**
476
+
477
+ A structured NHS clinic letter appears in the centre panel:
478
+
479
+ > **[NHS Trust Header]**
480
+ > **Date:** 13 February 2026
481
+ > **NHS Number:** 943 476 5829
482
+ >
483
+ > Dr. R. Patel
484
+ > Riverside Medical Centre
485
+ > 45 High Street, London SE1 2AB
486
+ >
487
+ > **Re: Mrs. Margaret Thompson, DOB 14/03/1958**
488
+ >
489
+ > Dear Dr. Patel,
490
+ >
491
+ > Thank you for asking me to review Mrs. Thompson in the Diabetes clinic today.
492
+ >
493
+ > **History of presenting complaint:** Mrs. Thompson reports increasing fatigue over the past 2–3 months. She has been adherent to dietary advice but has found it difficult to reduce her carbohydrate intake further. She denies polyuria, polydipsia, or weight loss.
494
+ >
495
+ > **Investigation results:** Her HbA1c has risen from **48 mmol/mol** (June 2025) to **55 mmol/mol** (January 2026), indicating worsening glycaemic control despite metformin monotherapy. Her renal function shows eGFR **52 mL/min** (previously 58), consistent with stable CKD Stage 3a. Creatinine is **98 μmol/L**.
496
+ >
497
+ > **Assessment and plan:** Given the rising HbA1c on maximum-dose metformin, I have discussed treatment escalation with Mrs. Thompson and we have agreed to commence **gliclazide 40mg once daily**. I have counselled her regarding the risk of hypoglycaemia and provided written patient information. Please arrange a repeat HbA1c in **3 months** and review her renal function at that time.
498
+ >
499
+ > **Current medications:** Metformin 1g BD, Gliclazide 40mg OD (NEW), Ramipril 5mg OD, Atorvastatin 20mg OD.
+ > **Allergies:** Penicillin (anaphylaxis).
500
+ >
501
+ > Yours sincerely,
502
+ > Dr. S. Chen, Consultant Diabetologist
503
+
504
+ The narrator points out: (a) investigation values are hyperlinked to their FHIR source records, (b) the letter includes both conversation content AND FHIR-sourced data, (c) the allergy is preserved.
505
+
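One lightweight way to implement the value-to-source hyperlinks is to render each value as a link back to the FHIR resource it came from. In this sketch the helper name, base URL, and resource id are all illustrative; in practice the ids would come from the Observation resources retrieved during context synthesis.

```python
def linkify(value: str, resource_type: str, resource_id: str,
            base_url: str = "http://localhost:8080/fhir") -> str:
    # Render a clinical value as a Markdown link back to its FHIR source
    # record, so a reviewer can verify the number against the EHR.
    return f"[{value}]({base_url}/{resource_type}/{resource_id})"

link = linkify("55 mmol/mol", "Observation", "hba1c-2026-01")
print(link)  # → [55 mmol/mol](http://localhost:8080/fhir/Observation/hba1c-2026-01)
```

Clicking the rendered link lands the reviewer on the exact Observation the value was drawn from, which is what makes the verification workflow in Screen 5 cheap.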
+ **Screen 6 — Edit and Sign Off (2:30–2:50)**
+
+ The clinician edits one line (e.g., adding "She is tolerating metformin well with no GI side effects" — demonstrating inline editing), then clicks **"Sign Off."** The document status changes to "✓ Approved." Export buttons appear: PDF, Copy to Clipboard, FHIR DocumentReference.
+
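The FHIR DocumentReference export can be sketched as a minimal FHIR R4 resource. The coding choice (LOINC 11488-4, "Consult note"), the patient id, and the plain-text content type are illustrative assumptions, not Clarke's confirmed implementation.

```python
import base64
from datetime import datetime, timezone

def to_document_reference(letter_text: str, patient_id: str) -> dict:
    # Minimal FHIR R4 DocumentReference wrapping the signed-off letter.
    # patient_id is an illustrative Patient resource id.
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {"coding": [{"system": "http://loinc.org",
                             "code": "11488-4",          # "Consult note" (assumed choice)
                             "display": "Consult note"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "date": datetime.now(timezone.utc).isoformat(),
        "content": [{"attachment": {
            "contentType": "text/plain",
            "data": base64.b64encode(letter_text.encode("utf-8")).decode("ascii"),
        }}],
    }

doc = to_document_reference("Dear Dr. Patel, ...", "example-patient")
print(doc["subject"]["reference"])  # → Patient/example-patient
```

A resource like this can be POSTed to any FHIR R4 server's `/DocumentReference` endpoint once write-back is enabled in later deployment phases.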
+ **Screen 7 — Closing (2:50–3:00)**
+
+ A brief closing card shows "Clarke: 3 HAI-DEF models. 1 pipeline. Zero patient data leaves the building." alongside the project URL.
+
+ **Synthetic data used:** 50 Synthea-generated UK-style patients loaded into a HAPI FHIR server. 3 pre-recorded audio clips for demo scenarios (Mrs. Thompson — diabetes; Mr. Okafor — chest pain follow-up; Ms. Patel — asthma review). Audio clips are 60–90 seconds long, recorded in clear studio conditions as 16 kHz WAV files.
+
+ ### 7.2 Top 5 Deployment Challenges and Mitigations
+
+ | # | Challenge | Specific Mitigation | Evidence / Precedent |
+ |---|---|---|---|
+ | 1 | **GPU hardware cost for NHS trusts** | MedGemma 4B runs on a single consumer GPU (RTX 4060 8GB, ~£300). MedGemma 27B quantised to 4-bit runs on a single A100 40GB or two A10G 24GB GPUs. NHS trusts already procure similar hardware for PACS imaging workstations. Capital cost: £5,000–£15,000 per trust (one-time), versus >£2M/year for Heidi Health licensing across a trust's doctors at $99/clinician/month. | NHS PACS infrastructure includes GPU workstations. NHS Digital recommends GPU-enabled infrastructure for AI workloads. |
+ | 2 | **NHS accent diversity and MedASR accuracy** | Fine-tune MedASR with LoRA on NHS-accented audio. Google's published fine-tuning notebook provides the methodology. MedASR's Conformer architecture handles accent variation better than RNN-based ASRs. For production: each trust collects 100–500 de-identified audio samples from its own clinicians for local LoRA fine-tuning (~1 hour compute). | MedASR was trained on ~5,000 hours of diverse physician dictations. LoRA fine-tuning is computationally cheap (minutes to hours). |
+ | 3 | **FHIR API availability across NHS trusts** | Demo uses HAPI FHIR with Synthea data. Production: NHS England's Interoperability Standards mandate FHIR R4 API exposure. EMIS, SystmOne, and Cerner/Oracle Health increasingly offer FHIR endpoints. Clarke degrades gracefully — if no FHIR API is available, it operates in "documentation-only" mode (ambient scribe without EHR context), which is still valuable. | NHS England's Digital Interoperability Programme (2024–2025) mandates FHIR for data sharing. The eDischarge Summary Standard (PRSB, updated January 2026) mandates structured discharge data transfer. |
+ | 4 | **Clinical safety of AI-generated documents** | Clarke is a drafting tool, not an autonomous documentation system. Every document requires mandatory clinician review and sign-off before entering the medical record. FHIR-sourced values are hyperlinked to source records for verification. Discrepancy flags (e.g., allergy mismatch between conversation and record) are prominently displayed in red. | Google's HAI-DEF Terms of Use explicitly state model outputs require independent professional verification. Mass General Brigham's ambient AI programme (3,000+ providers) uses the same review-before-filing workflow. |
+ | 5 | **Information governance and regulatory status** | All processing is local — no external data transmission. Open-source code is fully auditable by trust IG teams. Clarke is positioned as a clinical decision support tool (not a regulated medical device), consistent with MHRA guidance on software as a medical device. A Data Protection Impact Assessment (DPIA) would be completed per trust. | NHS England published guidance on AI scribes in health settings (April 2025). The open-source audit trail addresses the NHS IG requirement for transparency. |
+
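The graceful degradation described in row 3 hinges on a cheap availability probe at startup. A minimal sketch, assuming the standard FHIR `GET {base}/metadata` CapabilityStatement endpoint (the base URL is illustrative):

```python
import urllib.request
import urllib.error

def fhir_available(base_url: str, timeout: float = 2.0) -> bool:
    # Probe the server's CapabilityStatement: every FHIR server exposes
    # GET {base}/metadata. Any network or HTTP failure means "fall back".
    try:
        with urllib.request.urlopen(f"{base_url}/metadata", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Documentation-only mode keeps the ambient scribe working without EHR context.
mode = "full" if fhir_available("http://localhost:8080/fhir") else "documentation-only"
print(mode)
```

Because the fallback decision is a single boolean, the rest of the pipeline only needs to skip the context-retrieval stage when `mode` is `"documentation-only"`.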
+ ### 7.3 Production Deployment Path
+
+ **Phase 1 — Single-trust pilot (months 0–6)**
+
+ - **Setup:** Deploy Clarke on a dedicated GPU server (A100 or 2×A10G) within the trust's on-premise data centre or private cloud (e.g., an NHS-approved Azure/AWS tenancy). Dockerised deployment using `docker-compose` with three services: the Clarke application, a HAPI FHIR proxy, and a GPU model server.
+ - **Scope:** 10–20 clinicians across 2–3 specialties (e.g., diabetes outpatients, general medicine ward round, GP training practice), selected for diversity of documentation types.
+ - **Integration:** Connect the FHIR proxy to the trust's EHR (EMIS/SystmOne/Cerner) via existing FHIR R4 endpoints. Read-only access — Clarke does not write to the EHR in Phase 1.
+ - **Measurement:** Prospective time-motion study comparing documentation time with vs. without Clarke. Pre-post burnout survey using validated instruments (MBI or Mini-Z). Letter quality audit: completeness, accuracy, NHS format compliance. Clinician satisfaction (System Usability Scale).
+ - **Governance:** DPIA completed. Information governance approval from the trust's Caldicott Guardian. Clinical safety case per the DCB0129 standard. Ethics approval if results are intended for publication.
+
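The three-service layout described under Setup might look roughly like the following `docker-compose` sketch. Service names, images, ports, and the model-server placeholder are assumptions, not a tested manifest.

```yaml
# Sketch only: adjust images, ports, and GPU counts per trust.
services:
  clarke:
    build: .                         # the Clarke application
    ports: ["7860:7860"]
    environment:
      FHIR_BASE_URL: http://fhir-proxy:8080/fhir   # hypothetical variable name
    depends_on: [fhir-proxy, model-server]
  fhir-proxy:
    image: hapiproject/hapi:latest   # HAPI FHIR proxy in front of the trust EHR
    ports: ["8080:8080"]
  model-server:
    build: .                         # placeholder: GPU server hosting MedGemma/MedASR
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Keeping the model server as its own service lets the trust scale GPU capacity independently of the application and FHIR proxy.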
+ **Phase 2 — Trust-wide rollout (months 6–12)**
+
+ - Expand to all clinicians in the pilot trust. Fine-tune MedASR on the trust's collected audio data for accent optimisation. Integrate EHR write-back (with appropriate clinical safety controls) so signed-off letters can be filed directly into the patient record.
+ - Begin a second-trust pilot to demonstrate transferability.
+
+ **Phase 3 — Multi-trust deployment (months 12–24)**
+
+ - Package Clarke as a `docker-compose` deployment kit with comprehensive setup documentation.
+ - Publish on the NHS App Store / Digital Marketplace for trust procurement teams.
+ - Work with NHS England's AI and Digital Regulation programme for national endorsement.
+ - Share specialty-specific LoRA adapters (e.g., cardiology, psychiatry, paediatrics) on the Hugging Face Hub.
+
+ **Phase 4 — Ongoing open-source development**
+
+ - Community-driven improvement via GitHub contributions. Trust-specific LoRA adapters (accent, specialty, local conventions) shared on the Hugging Face Hub. Regular model updates as Google releases new MedGemma and MedASR versions.
+
+ ---
+
+ ## SECTION 8: REFERENCES
+
+ 1. Arab S, Chhatwal K, Hargreaves T, et al. Time Allocation in Clinical Training (TACT): national study reveals Resident Doctors spend four hours on admin for every hour with patients. *QJM: An International Journal of Medicine*. 2025;118(10):753. doi:10.1093/qjmed/hcaf141.
+
+ 2. Olson KD, Meeker D, Troup M, et al. Use of Ambient AI Scribes to Reduce Administrative Burden and Professional Burnout. *JAMA Network Open*. 2025;8(10):e2534976. doi:10.1001/jamanetworkopen.2025.34976.
+
+ 3. Pearlman K, Wan W, Shah S, Laiteerapong N. Use of an AI Scribe and Electronic Health Record Efficiency. *JAMA Network Open*. 2025;8(10):e2537000. doi:10.1001/jamanetworkopen.2025.37000.
+
+ 4. You JG, Landman A, Ting DY, et al. Ambient Documentation Technology in Clinician Experience of Documentation Burden and Burnout. *JAMA Network Open*. 2025. doi:10.1001/jamanetworkopen.2025.28056.
+
+ 5. Moura LMVR, Mishuris R, Metlay JP, et al. Hybrid Ambient Clinical Documentation and Physician Performance: Work Outside of Work, Documentation Delay, and Financial Productivity. *Journal of General Internal Medicine*. 2025. doi:10.1007/s11606-025-09979-5.
+
+ 6. NHS Digital. NHS Workforce Statistics — August 2025. Published November 2025. https://digital.nhs.uk/data-and-information/publications/statistical/nhs-workforce-statistics/august-2025.
+
+ 7. NHS Digital. General Practice Workforce — 31 December 2025. https://digital.nhs.uk/data-and-information/publications/statistical/general-and-personal-medical-services/31-december-2025.
+
+ 8. BMA. NHS Backlog Data Analysis. Accessed February 2026. https://www.bma.org.uk/advice-and-support/nhs-delivery-and-workforce/pressures/nhs-backlog-data-analysis.
+
+ 9. BMA. Medical Staffing in the NHS — Data Analysis. Accessed February 2026. https://www.bma.org.uk/advice-and-support/nhs-delivery-and-workforce/workforce/medical-staffing-in-the-nhs.
+
+ 10. GMC. National Training Survey 2025. Published July 2025. Referenced via NHS Employers: https://www.nhsemployers.org/news/gmc-national-training-survey-2025.
+
+ 11. Google Research. Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR. Published January 2026. https://research.google/blog/next-generation-medical-image-interpretation-with-medgemma-15-and-medical-speech-to-text-with-medasr/.
+
+ 12. Google. MedASR Model Card. Hugging Face. https://huggingface.co/google/medasr.
+
+ 13. Google. MedASR Developer Documentation. https://developers.google.com/health-ai-developer-foundations/medasr.
+
+ 14. Google. MedGemma Technical Report. arXiv:2507.05201.
+
+ 15. Google. Health AI Developer Foundations (HAI-DEF) Technical Report. arXiv:2411.15128.
+
+ 16. King's Fund. Waiting Times for Elective (Non-Urgent) Treatment: Referral to Treatment (RTT). Updated December 2025. https://www.kingsfund.org.uk/insight-and-analysis/data-and-charts/waiting-times-non-urgent-treatment.
+
+ 17. MedGemma Clinician Survey. 47 responses from NHS and healthcare clinicians. Collected January 16 – February 8, 2026. Internal document.
+
+ 18. TechFundingNews. Heidi raises $65M to scale its AI scribe across global health systems. October 2025. https://techfundingnews.com/heidi-ai-care-partner-65m-series-b-global-expansion/.
+
+ 19. Mass General Brigham. Ambient Documentation Technologies Reduce Physician Burnout and Restore 'Joy' in Medicine. Press release, August 2025. https://www.massgeneralbrigham.org/en/about/newsroom/press-releases/ambient-documentation-technologies-reduce-physician-burnout.
+
+ 20. Medscape UK. Medscape UK Survey: Burnout Hits Doctors Amid NHS Pressures. June 2025.
+
+ 21. iatroX. The rise of AI medical scribes in UK primary care. August 2025. https://www.iatrox.com/blog/ai-medical-scribes-uk-gp-guide-tortus-accurx-heidi.
+
+ 22. Burnett S, Franklin BD, Moorthy K, et al. Missing Clinical Information in NHS hospital outpatient clinics: prevalence, causes and effects on patient care. *BMC Health Services Research*. 2011;11:114. doi:10.1186/1472-6963-11-114.
+
+ 23. NHS Resolution. Annual Report and Accounts 2024 to 2025. Published July 2025. https://www.gov.uk/government/publications/nhs-resolution-annual-report-and-accounts-2024-to-2025.
+
+ 24. NHS England. NHS Patient Safety Strategy — Progress Update — April 2025. https://www.england.nhs.uk/patient-safety/the-nhs-patient-safety-strategy/.
+
+ 25. PRSB. eDischarge Summary Standard. Updated January 2026. https://theprsb.org/standards/edischargesummary/.
+
+ 26. NHS Providers. NHS Digital Workforce Statistics — November 2025. https://nhsproviders.org/resources/nhs-digital-workforce-statistics-november-2025.
+
+ ---
+
+ *Document prepared for Clarke PRD generation — Version 2.0 — February 2026*
Dockerfile ADDED
@@ -0,0 +1,19 @@
+ FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
+
+ RUN apt-get update && apt-get install -y \
+     python3.11 python3.11-venv python3-pip \
+     ffmpeg curl wget git && \
+     rm -rf /var/lib/apt/lists/*
+
+ RUN ln -s /usr/bin/python3.11 /usr/bin/python
+
+ WORKDIR /app
+ COPY requirements.txt .
+ RUN pip install --break-system-packages --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ # Load FHIR data and start application
+ RUN chmod +x scripts/start.sh
+ EXPOSE 7860
+ CMD ["scripts/start.sh"]
LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
MedGemma High-Level Context.md ADDED
@@ -0,0 +1,174 @@
1
+ # Competition Overview
2
+
3
+ Google has released open-weight models designed to help developers more efficiently create novel healthcare and life science applications. Many clinical environments can’t rely on large, closed models that require constant internet access or centralized infrastructure. They need adaptable, privacy-focused tools that can run anywhere care is delivered.
4
+
5
+ MedGemma and the rest of the HAI-DEF collection gives developers a starting point for building powerful tools while allowing them full control over the models and associated infrastructure.
6
+
7
+ The competition requires us to use these models to build full fledged demonstration applications that can enhance healthcare. Examples include apps to streamline workflows, support patient communication, or facilitate diagnostics.
8
+
9
+ ## Minimum requirements
10
+
11
+ To be considered a valid contribution, your submission should include:
12
+
13
+ * a high-quality writeup describing use of a specific HAI-DEF model,
14
+ * associated reproducible code for your initial results, and
15
+ * a video for judging.
16
+
17
+ Your complete submission consists of a single package containing your video (3 minutes or less) and write-up (3 pages or less). This single entry can be submitted to the main competition track, and one special technology award, so separate submissions are not required. Read the section *Submission Instructions* for more details. Please follow the provided write-up template and refer to the judging criteria for all content requirements.
18
+
19
+ ## Judging Criteria
20
+
21
+ 1. Effective use of HAI-DEF models (weighting \= 20%)
22
+ 1. Are HAI-DEF models used appropriately?
23
+ 2. Are HAI-DEF models used to their fullest potential?
24
+ 3. Why is this solution and its use of HAI-DEF models superior to other ones?
25
+ 4. At least one of HAI-DEF models such as MedGemma is mandatory
26
+ 2. Problem Domain (weighting \= 15%)
27
+ 1. How important is this problem to solve?
28
+ 2. How plausible is it that AI is the right solution?
29
+ 3. Is your storytelling unbeliably captivating and inspiring?
30
+ 4. Is your problem definition clear?
31
+ 5. Is the unmet need true and clear?
32
+ 6. Is the problem of large magnitude in severity and ubiquity?
33
+ 7. Who is the target user?
34
+ 8. Does the solution genuinely improve the user journey?
35
+ 3. Impact potential (weighting \= 15%)
36
+ 1. If the solution works, what impact would it have?
37
+ 2. Have you clearly and honestly articulated the real and/or anticipated impact of the application within the given problem domain?
38
+ 3. Have you honestly and accurately calculated your estimates, and what are those estimates?
39
+ 4. Product feasibility (weighting \= 20%)
40
+ 1. Is the technical solution clearly feasible?
41
+ 2. Have you accurately and completely detailed model fine-tuning, model’s performance analysis, user-facing application stack, deployment challenges and how you plan on overcoming them?
42
+ 3. Have you communicated exactly how a product might be used in practice?
43
+ 5. Execution and communication (weighting \= 30%)
44
+ 1. Is the quality of your project world-class?
45
+ 2. Is the communication of your work concise (using as few words as possible to most completely and comprehensibly communicate information)?
46
+ 3. Does the submission package follow the provided template, including a video demo and a write-up with links to source material?
47
+ 1. Do you have a public interactive live demo app?
48
+ 2. Open-weight Hugging Face model tracing to a HAI-DEF model?
49
+ 4. Is your communication in the video of maximal clarity?
50
+ 5. Is the video polished and stylish?
51
+ 6. Does the video effectively communicate and demonstrate your application?
52
+ 7. Is your technical write-up easy-to-read?
53
+ 8. Is your source code organised, annotated and reusable?
54
+ 9. Do you have a cohesive and compelling narrative across all submitted materials that effectivley articulates how you meet the rest of the judging criteria?
55
+
56
+ ## Dataset Description
57
+
58
+ Welcome to the MedGemma Impact Challenge
59
+
60
+ This is a Hackathon with no provided dataset.
61
+
62
+ Please refer to the following resources, and note that use of HAI-DEF and MedGemma are subject to the [HAI-DEF Terms of Use](https://developers.google.com/health-ai-developer-foundations/terms):
63
+
64
+ * Hugging Face model collections:
65
+ * [Complete HAI-DEF collection](https://huggingface.co/collections/google/health-ai-developer-foundations-hai-def)
66
+ * [MedGemma collection](https://huggingface.co/collections/google/medgemma-release)
67
+ * [Developer forum](https://discuss.ai.google.dev/c/hai-def)
68
+ * [HAI-DEF concept apps for inspiration](https://huggingface.co/collections/google/hai-def-concept-apps)
69
+ * [HAI-DEF main site](https://goo.gle/hai-def)
70
+
71
+ # Goal
72
+
73
+ 1. Win the Main Track 1st place prize ($30,000)
74
+ 1. this prizes are awarded to the best overall project that demonstrate exceptional vision, technical execution, and potential for real-world impact.
75
+ 2. Wins one of the special track prizes ($$5000)
76
+ 1. Agentic Workflow Prize \- It is awarded for the project that most effectively reimagines a complex workflow by deploying HAI-DEF models as intelligent agents or callable tools. The winning solution will demonstrate a significant overhaul of a challenging process, showcasing the power of agentic AI to improve efficiency and outcomes.
77
+ 2. Novel Task Prize \- Awarded for the most impressive fine-tuned model that successfully adapts a HAI-DEF model to perform a useful task for which it was not originally trained on pre-release.
78
+ 3. The Edge AI Prize \- This prize is awarded to the most impressive solution that brings AI out of the cloud and into the field. It will be awarded to the team that best adapts a HAI-DEF model to run effectively on a local device like a mobile phone, portable scanner, lab instrument, or other edge hardware.
79
+
80
+ # Constraints
81
+
82
+ 1. Complete the product in 3 days (using an AI tool like Claude or Codex to do this successfully)
83
+ * Assuming I have 8 hours per day to complete the product (24 hours in total)
84
+ 2. Writeup structure
85
+ * \#\#\# Project name
86
+ * \[A concise name for your project.\]
87
+ *
88
+ * \#\#\# Your team
89
+ * \[Name your team members, their speciality and the role they played.\]
90
+ *
91
+ * \#\#\# Problem statement
92
+ * \[Your answer to the “Problem domain” & “Impact potential” criteria\]
93
+ *
94
+ * \#\#\# Overall solution:
95
+ * \[Your answer to “Effective use of HAI-DEF models” criterion\]
96
+ *
97
+ * \#\#\# Technical details
98
+ * \[Your answer to “Product feasibility” criterion\]
99
+
100
+ # Target Timeline
101
+
102
+ **Thursday 12 February**
103
+
104
+ 1. Conjecture the best possible ideas to maximise chance of winning the competition within the constraints and pick the best one
105
+ 1. Use Claude Opus 4.6, ChatGPT 5.2 Thinking, Gemini 3 Pro, and Grok Thinking to conjecture ideas and pick the best one
106
+ 2. Use OpenClaw to compile all the useful information as context (saved to a Google Docs titled “Additional Competition Context”) to use to create PRDs (Project Requirement Documents) along with “MedGemma High-Level Context.md”
107
+ 3. Use Claude Opus 4.6 Extended to create first-draft version of the PRDs
108
+ 4. Review and edit the PRDs as needed
109
+ 5. Feed preliminary PRDs to Lovable PRD Generator along with “MedGemma High-Level Context.md”, “Additional Competition [Context.md](http://Context.md)”, and “Clarke\_Product\_Specifcation\_V1.md” and any other requests/context
110
+ 6. Use Lovable PRD Generator to create new and improved PRDs
111
+ 7. Select, review and edit the final PRDs as needed
112
+ 8. Complete planning and preparation of guiding Project-Requirements-Documents (PRDs)
113
+ 1. clarke_PRD_masterplan.md
114
+ 1. This file should provide a complete, high-level overview of what we’re building, why we’re building it, the goals, and the constraints
115
+ 1. What are we building?
116
+ 2. Why are we building it?
117
+ 3. Why is it important?
118
+ 4. What are the goals?
119
+ 5. What are the constraints?
120
+ 6. What must be fulfilled to guarantee a successful outcome?
121
+ 2. clarke_PRD_implementation.md
122
+ 1. This file should outline the order in which we should build and integrate components of the product to ensure there is a clean, understandable sequence
123
+ 1. What steps must we take to ensure a successful outcome?
124
+ 2. What is the optimal order for us to take those steps to guarantee a successful outcome with the constraints?
125
+ 3. clarke_PRD_design_guidelines.md
126
+ 1. This file should detail the visual aesthetics of the product and how it should make the user feel when using it
127
+ 1. What should the product look like?
128
+ 2. What should the product feel like?
129
+ 3. Give examples using Mobbin or Dribbble or anything else about the aesthetics and vibe I want to cultivate
130
+ 4. clarke_PRD_userflow.md
131
+ 1. This file should detail the intended user experience and journey, optimising for ease of use and navigation
132
+ 1. What is the optimal step-by-step description of the ideal user navigation journey from start to finish?
133
+ 5. clarke_PRD_technical_spec.md
134
+ 1. **Folder/file structure** — the exact project directory tree Codex should create
135
+ 2. **Technology stack** — every framework, library, and version (e.g., Gradio 4.x, transformers 4.45, torch 2.x)
136
+ 3. **Data models/schemas** — what objects exist in the system (Patient, Consultation, Transcript, ClinicalDocument), their fields, types, and relationships
137
+ 4. **API contracts** — what each backend endpoint accepts and returns
138
+ 5. **Model serving specification** — how each HAI-DEF model is loaded, called, and what input/output format it expects
139
+ 6. **Synthetic data spec** — exact format of demo data (FHIR resources, sample transcripts, lab reports) so Codex can generate it
140
+ 6. clarke_PRD_tasks.md
141
+ 1. This file should explicitly detail every step-by-step task that the AI tool (e.g. Codex) should execute to culminate in a product that completely, without fail, achieves the goal within the constraints
143
+ 1. What are the specific tasks the AI should execute to successfully build a world-class product and fulfil our goal?
143
+ 2. For each task, specify what "done" looks like. For example, "Task 3 is done when the /transcribe endpoint accepts a .wav file and returns JSON with a transcript field."
144
+ 7. clarke_PRD_rules.md
145
+ 1. This file should convey how the LLM/AI tool should behave, communicate, execute actions, and correct errors to ensure maximal speed, accuracy, effectiveness, transparency and efficiency
146
+ 1. What information can I communicate to the AI to ensure the best possible comprehension and communication of information and execution of action?
147
+
148
+ **Friday 13 February to Sunday 15 February**
149
+
150
+ 1. Complete building of the product
151
+ 1. AI tool should first read and understand all the files. There should be some sort of test to check and ensure the AI has read and understood all the files before starting.
152
+ 2. The AI should then execute one task at a time (using tasks.md), explaining every step it takes in a way a layman could understand, and testing (or creating a way to test) that each task has indeed been completed successfully
153
+ 1. The AI should also correct any errors that arise in the execution
154
+
155
+ **Monday 16th February**
156
+
157
+ 1. Complete the 3-page write up for the competition submission
158
+ 1. The writeup should use the “Propose Writeup template”
159
+ 2. The writeup should ensure that it completely satisfies and exceeds the judging criteria to maximise chance of achieving the goal
160
+ 3. The writeup should be as high-level and easy-to-read as possible
161
+
162
+ **Tuesday 17th February to Saturday 21st February**
163
+
164
+ 1. Complete the 3-minute video
165
+ 1. The video will be the main, attention-grabbing medium of communication so should be optimised to exceed every aspect of the judging criteria
166
+
167
+ **Sunday 22nd February**
168
+
169
+ 1. Submit everything
170
+ 1. 3-page writeup
171
+ 2. Video (3 mins or less)
172
+ 3. Public code repository
173
+ 4. Public interactive live demo app
174
+ 5. Open-weight Hugging Face model tracing back to a HAI-DEF model
README.md CHANGED
@@ -1,2 +1,257 @@
1
- # clarke
2
- Clarke — AI-powered ambient clinical documentation system for NHS clinicians
1
+ ---
2
+ title: Clarke
3
+ emoji: 🩺
4
+ colorFrom: blue
5
+ colorTo: yellow
6
+ sdk: docker
7
+ app_port: 7860
8
+ ---
9
+
10
+ # Clarke
11
+
12
+ **AI-powered NHS clinic letter generation: from consultation audio to structured clinical document in under 60 seconds.**
13
+
14
+ Clarke is an ambient clinical documentation system that converts doctor-patient audio consultations into structured NHS clinic letters. It coordinates three [HAI-DEF](https://goo.gle/hai-def) models as autonomous agents in a unified agentic pipeline: medical speech recognition, EHR context retrieval via FHIR, and context-enriched document generation.
15
+
16
+ Built for the [MedGemma Impact Challenge](https://www.kaggle.com/competitions/medgemma-impact-challenge) on Kaggle, targeting the **Agentic Workflow Prize**.
17
+
18
+ | Resource | Link |
19
+ |----------|------|
20
+ | Live demo | [yashvshetty-clarke.hf.space](https://yashvshetty-clarke.hf.space) |
21
+ | Source code | [github.com/yvs-tinker/clarke](https://github.com/yvs-tinker/clarke) |
22
+ | Evaluation report | [evaluation/EVALUATION.md](evaluation/EVALUATION.md) |
23
+ | LoRA adapter | [yashvshetty/clarke-medgemma-27b-lora](https://huggingface.co/yashvshetty/clarke-medgemma-27b-lora) |
24
+ | Demo video | *Coming soon* |
25
+
26
+ ---
27
+
28
+ ## Results
29
+
30
+ Clarke was evaluated across five NHS outpatient consultations spanning endocrine, cardiology, respiratory, heart failure, and mental health. Full methodology and per-patient breakdowns are in the [evaluation report](evaluation/EVALUATION.md).
31
+
32
+ | Component | Metric | Score |
33
+ |-----------|--------|-------|
34
+ | MedASR (speech-to-text) | Word Error Rate | 13.28% across 1,438 words |
35
+ | EHR Agent (record retrieval) | Fact Recall | 100% (70/70 facts retrieved) |
36
+ | EHR Agent | Precision | 98.6% (1 hallucination in 71 facts) |
37
+ | Document Generation (base) | BLEU-1 / ROUGE-L | 0.54 / 0.44 |
38
+ | Document Generation (after QLoRA) | BLEU-1 / ROUGE-L | **0.71 / 0.47** (+31% BLEU-1) |
39
+
40
+ QLoRA fine-tuning on just 5 gold-standard NHS clinic letters improved lexical accuracy by 31%, trained in 15 minutes on a single A100 GPU. The adapter is published at [`yashvshetty/clarke-medgemma-27b-lora`](https://huggingface.co/yashvshetty/clarke-medgemma-27b-lora).
41
+
42
+ ---
43
+
44
+ ## The Problem
45
+
46
+ NHS doctors spend 73% of their working time on non-patient-facing tasks, with only 17.9% on direct patient care (Arab et al., QJM 2025; TACT study, 137 doctors, 7 months). Documentation alone consumes approximately 15 minutes per patient encounter. Across a typical 10-12 patient clinic, that is 2.5-3 hours of writing per session (clinician survey, n=47). At the scale of 190,200 FTE doctors in England (NHS Digital, Aug 2025), the impact compounds: 7,248 unfilled medical vacancies, 7.31 million patients on the waiting list (BMA/King's Fund, Nov 2025), and 61% of trainees at moderate-to-high burnout risk (GMC NTS, 2025).
47
+
48
+ Ambient AI scribes have shown promise, reducing clinician burnout from 51.9% to 38.8% within 30 days (Olson et al., JAMA Netw Open 2025). But every existing solution (Heidi Health, DAX Copilot, Tortus) generates documents from conversation audio alone. None retrieves EHR context, meaning clinicians must still manually add lab values, medication lists, and allergy checks. All are cloud-dependent, closed-source, and costly: Heidi alone would cost an estimated 226M GBP/year at NHS scale.
49
+
50
+ Clarke closes both gaps: ambient documentation fused with intelligent EHR retrieval, fully open-source, designed for local deployment.
51
+
52
+ ### Clinician Validation
53
+
54
+ A 47-respondent clinician survey confirmed the problem. Respondents reported spending substantial time on documentation, identified clinical letter writing as the most time-consuming task, and expressed strong interest in AI-assisted documentation tools with EHR integration.
55
+
56
+ ---
57
+
58
+ ## Architecture
59
+
60
+ Clarke orchestrates three HAI-DEF models in a five-stage agentic workflow. Each model operates as an autonomous agent, making its own decisions about what to process, retrieve, or generate:
61
+
62
+ ```
63
+ ┌─────────────────────────────────────────────────────────────────┐
64
+ │ CLARKE PIPELINE │
65
+ │ │
66
+ │ ┌─────────────┐ │
67
+ │ │ Audio Input │ Upload or record consultation audio │
68
+ │ └──────┬──────┘ │
69
+ │ │ │
70
+ │ ▼ │
71
+ │ ┌─────────────────────────────────────────┐ │
72
+ │ │ 1. MedASR (google/medasr) │ │
73
+ │ │ Medical speech recognition agent │ │
74
+ │ │ Audio -> clinical transcript │ │
75
+ │ └──────┬──────────────────────────────────┘ │
76
+ │ │ │
77
+ │ ▼ │
78
+ │ ┌─────────────────────────────────────────┐ ┌────────────┐ │
79
+ │ │ 2. MedGemma 4B IT │<-->│ FHIR Server│ │
80
+ │ │ (google/medgemma-1.5-4b-it) │ │ Patient │ │
81
+ │ │ EHR context retrieval agent │ │ records │ │
82
+ │ │ Patient ID -> structured context │ └────────────┘ │
83
+ │ └──────┬──────────────────────────────────┘ │
84
+ │ │ transcript + patient context │
85
+ │ ▼ │
86
+ │ ┌─────────────────────────────────────────┐ │
87
+ │ │ 3. MedGemma 27B Text-IT │ │
88
+ │ │ (google/medgemma-27b-text-it) │ │
89
+ │ │ Document generation agent │ │
90
+ │ │ Transcript + context -> NHS letter │ │
91
+ │ └──────┬──────────────────────────────────┘ │
92
+ │ │ │
93
+ │ ▼ │
94
+ │ ┌─────────────┐ │
95
+ │ │ Draft Letter│ Clinician review, edit, and sign-off │
96
+ │ └─────────────┘ │
97
+ └─────────────────────────────────────────────────────────────────┘
98
+ ```
99
+
100
+ **Why three models, not one?** Each HAI-DEF model contributes a distinct capability that no single model can replicate. MedASR provides medical-domain speech recognition optimised for clinical terminology. MedGemma 4B understands FHIR resources natively, retrieving and synthesising relevant patient history. MedGemma 27B generates clinically accurate prose grounded in both the conversation and the medical record. The pipeline produces documents that reference actual lab values, include current medication lists, and cross-check for consistency. No conversation-only scribe can achieve this.
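
The three-stage handoff can be sketched as a plain function composition. Everything below is an illustrative stand-in with mock data and invented function names; the real coordination lives in `backend/orchestrator.py`:

```python
# Minimal sketch of the three-agent handoff. All functions and data here
# are hypothetical stand-ins for the real pipeline stages.

def transcribe(audio_path: str) -> str:
    """Stage 1 (MedASR): audio -> clinical transcript."""
    return "Patient reports improved glycaemic control on metformin."

def retrieve_context(patient_id: str) -> dict:
    """Stage 2 (MedGemma 4B + FHIR): patient ID -> structured context."""
    return {
        "medications": ["metformin 1 g BD"],
        "labs": {"HbA1c": "48 mmol/mol"},
        "allergies": ["penicillin"],
    }

def generate_letter(transcript: str, context: dict) -> str:
    """Stage 3 (MedGemma 27B): transcript + context -> NHS clinic letter."""
    meds = "; ".join(context["medications"])
    return (
        "Dear Dr Smith,\n\n"
        f"{transcript}\n"
        f"Current medication: {meds}. "
        f"Latest HbA1c: {context['labs']['HbA1c']}.\n"
    )

def run_pipeline(audio_path: str, patient_id: str) -> str:
    transcript = transcribe(audio_path)
    context = retrieve_context(patient_id)
    return generate_letter(transcript, context)

letter = run_pipeline("consult.wav", "patient-001")
```

In the repository itself, context retrieval is kicked off as a background prefetch when the consultation starts (see `backend/api.py`), so stage 2 typically finishes before the audio does.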
101
+
102
+ ---
103
+
104
+ ## Features
105
+
106
+ - **End-to-end ambient documentation** from patient selection through letter sign-off.
107
+ - **Three-model agentic pipeline** with MedASR, MedGemma 4B, and MedGemma 27B operating as coordinated agents.
108
+ - **FHIR-backed context enrichment** retrieving demographics, conditions, medications, lab results, allergies, and diagnostic reports.
109
+ - **Structured NHS clinic letter output** following standard clinical correspondence format.
110
+ - **QLoRA fine-tuning** with a published LoRA adapter achieving 31% BLEU-1 improvement.
111
+ - **Privacy-preserving architecture** designed for local deployment; no patient data leaves the hospital network.
112
+ - **Deterministic safety architecture** in the EHR agent ensures 100% fact recall by design.
113
+ - **Human-in-the-loop review** with mandatory clinician sign-off before any document is exported.
114
+ - **Mock-safe local development** enabling the full pipeline to run without GPU or gated model access.
115
+
116
+ ---
117
+
118
+ ## Models
119
+
120
+ Clarke uses three models from Google's [Health AI Developer Foundations (HAI-DEF)](https://goo.gle/hai-def) collection:
121
+
122
+ | Role | Model | What it does |
123
+ |------|-------|-------------|
124
+ | Speech recognition | [`google/medasr`](https://huggingface.co/google/medasr) | Converts consultation audio to text with medical vocabulary optimisation |
125
+ | EHR retrieval | [`google/medgemma-1.5-4b-it`](https://huggingface.co/google/medgemma-1.5-4b-it) | Queries FHIR records and synthesises structured patient context |
126
+ | Document generation | [`google/medgemma-27b-text-it`](https://huggingface.co/google/medgemma-27b-text-it) | Generates NHS clinic letters from transcript + EHR context |
127
+
128
+ Additionally, a QLoRA fine-tuned adapter is published at [`yashvshetty/clarke-medgemma-27b-lora`](https://huggingface.co/yashvshetty/clarke-medgemma-27b-lora) (173.4 MB, LoRA rank 16, trained on 5 NHS clinic letter examples).
129
+
130
+ ---
131
+
132
+ ## Evaluation Summary
133
+
134
+ Full methodology, per-patient results, error taxonomy, and limitations are documented in the [evaluation report](evaluation/EVALUATION.md). Headline findings:
135
+
136
+ **MedASR (Word Error Rate: 13.28%)** Most clinical terms transcribed correctly. Errors concentrated on patient names (resolved from EHR, not transcript) and rare medical terms. Downstream agents correct most transcription errors using authoritative EHR data.
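
Word Error Rate is the word-level edit distance between hypothesis and reference, normalised by reference length. The repository's `eval_medasr.py` may well use a library for this; the following is a self-contained illustrative reimplementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution
            )
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

wer = word_error_rate(
    "start metformin one gram twice daily",
    "start metformin one gram twice a day",
)  # 2 edits over 6 reference words
```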
137
+
138
+ **EHR Agent (100% recall, 98.6% precision)** Every allergy, medication, lab result, and diagnosis was retrieved across all five patients. One borderline hallucination occurred (a clinically correct trend annotation). The deterministic query architecture guarantees no stored fact is missed.
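
"Deterministic" here means the set of FHIR queries is fixed in code rather than chosen by the model, so recall cannot depend on model behaviour. A minimal sketch of the idea; the resource types and in-memory store are illustrative, with the real queries in `backend/fhir/queries.py`:

```python
# Fixed query plan: every resource type is always fetched, so no stored
# fact can be skipped by a model decision. Store contents are illustrative.
RESOURCE_TYPES = ["AllergyIntolerance", "MedicationRequest",
                  "Observation", "Condition"]

FHIR_STORE = {
    ("patient-001", "AllergyIntolerance"): ["penicillin"],
    ("patient-001", "MedicationRequest"): ["ramipril 5 mg OD"],
    ("patient-001", "Observation"): ["NT-proBNP 1,200 ng/L"],
    ("patient-001", "Condition"): ["heart failure"],
}

def retrieve_all(patient_id: str) -> dict:
    """Fetch every resource type for the patient: full recall by design."""
    return {rt: FHIR_STORE.get((patient_id, rt), []) for rt in RESOURCE_TYPES}

context = retrieve_all("patient-001")
```

The trade-off is a larger context handed to the generation model, in exchange for recall that does not depend on the 4B model's tool-calling reliability.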
139
+
140
+ **Document Generation (BLEU-1 0.71 after QLoRA)** The base model scored BLEU-1 0.54. After QLoRA fine-tuning on 5 gold-standard NHS letters (15 minutes of training, single A100), BLEU-1 rose to 0.71 (+31%). All generated letters correctly captured diagnoses, medications, lab results, and management plans.
141
+
142
+ ---
143
+
144
+ ## QLoRA Fine-Tuning
145
+
146
+ Clarke includes a QLoRA fine-tuning pipeline for adapting MedGemma 27B to NHS letter conventions.
147
+
148
+ | Parameter | Value |
149
+ |-----------|-------|
150
+ | Method | QLoRA (4-bit base + LoRA rank 16, alpha 32) |
151
+ | Target modules | q_proj, k_proj, v_proj, o_proj |
152
+ | Training data | 5 gold-standard NHS clinic letters |
153
+ | Training time | ~15 minutes on A100 40 GB (Google Colab) |
154
+ | Framework | Unsloth |
155
+ | Adapter size | 173.4 MB |
156
+ | Result | BLEU-1 improved from 0.54 to 0.71 (+31%) |
157
+
158
+ Training scripts are in [`finetuning/`](finetuning/). The adapter is published at [`yashvshetty/clarke-medgemma-27b-lora`](https://huggingface.co/yashvshetty/clarke-medgemma-27b-lora).
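
The table above maps directly onto a PEFT-style LoRA configuration. It is shown here as a plain dictionary rather than a live `peft.LoraConfig`, since the actual training script lives in `finetuning/`:

```python
# QLoRA hyperparameters from the table above. The effective update
# scaling applied to each adapted projection is alpha / r.
qlora_config = {
    "r": 16,                  # LoRA rank
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "load_in_4bit": True,     # QLoRA: 4-bit quantised frozen base weights
}
scaling = qlora_config["lora_alpha"] / qlora_config["r"]
```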
159
+
160
+ ---
161
+
162
+ ## Quick Start
163
+
164
+ ### Run in mock mode (no GPU required)
165
+
166
+ ```bash
167
+ git clone https://github.com/yvs-tinker/clarke.git
168
+ cd clarke
169
+ python -m venv .venv
170
+ source .venv/bin/activate
171
+ pip install -r requirements.txt
172
+
173
+ USE_MOCK_FHIR=true MEDASR_MODEL_ID=mock MEDGEMMA_4B_MODEL_ID=mock MEDGEMMA_27B_MODEL_ID=mock python3 app.py
174
+ ```
175
+
176
+ Open [http://localhost:7860](http://localhost:7860).
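
The mock switches in the command above are ordinary environment variables, so they can equally be set programmatically before the app reads its settings (assuming `backend/config.py` reads them from the environment, which the command implies):

```python
import os

# Equivalent to the inline variables in the mock-mode command above.
for key in ("MEDASR_MODEL_ID", "MEDGEMMA_4B_MODEL_ID", "MEDGEMMA_27B_MODEL_ID"):
    os.environ[key] = "mock"
os.environ["USE_MOCK_FHIR"] = "true"

mock_mode = os.environ["USE_MOCK_FHIR"].lower() == "true"
```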
177
+
178
+ ### Run with real models (requires A100 GPU + HuggingFace access)
179
+
180
+ Create a `.env` file:
181
+
182
+ ```
183
+ MEDASR_MODEL_ID=google/medasr
184
+ MEDGEMMA_4B_MODEL_ID=google/medgemma-1.5-4b-it
185
+ MEDGEMMA_27B_MODEL_ID=google/medgemma-27b-text-it
186
+ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
187
+ FHIR_BASE_URL=http://localhost:8080/fhir
188
+ APP_HOST=0.0.0.0
189
+ APP_PORT=7860
190
+ ```
191
+
192
+ Then launch with `python3 app.py`. Requires NVIDIA A100 80 GB for the full three-model pipeline.
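
For reference, a `.env` file in this simple `KEY=VALUE` form can be parsed with nothing but the standard library. The repository presumably has its own loader in `backend/config.py`; this sketch is only illustrative:

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

settings = parse_env("""
# subset of the .env example above
MEDASR_MODEL_ID=google/medasr
APP_PORT=7860
""")
```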
193
+
194
+ ---
195
+
196
+ ## Repository Structure
197
+
198
+ ```
199
+ clarke/
200
+ ├── app.py # Application entry point (FastAPI + Gradio)
201
+ ├── backend/
202
+ │ ├── api.py # FastAPI REST endpoints
203
+ │ ├── orchestrator.py # Pipeline coordinator
204
+ │ ├── config.py # Environment and settings
205
+ │ ├── schemas.py # Pydantic data models
206
+ │ ├── models/
207
+ │ │ ├── medasr.py # MedASR transcription wrapper
208
+ │ │ ├── ehr_agent.py # MedGemma 4B EHR retrieval agent
209
+ │ │ └── doc_generator.py # MedGemma 27B letter generation agent
210
+ │ ├── fhir/
211
+ │ │ ├── client.py # FHIR server client
212
+ │ │ ├── queries.py # FHIR resource queries
213
+ │ │ ├── tools.py # Agent tool definitions
214
+ │ │ └── mock_api.py # Mock FHIR server for development
215
+ │ └── prompts/ # Jinja2 prompt templates
216
+ ├── frontend/
217
+ │ ├── ui.py # Gradio interface builder
218
+ │ ├── components.py # UI screen components
219
+ │ ├── state.py # Session state management
220
+ │ └── assets/ # CSS, logo, static files
221
+ ├── data/ # Demo audio, transcripts, synthetic FHIR bundles
222
+ ├── evaluation/ # Evaluation scripts and report
223
+ │ ├── EVALUATION.md # Full evaluation report
224
+ │ ├── eval_medasr.py # Word Error Rate evaluation
225
+ │ ├── eval_ehr_agent.py # EHR fact recall evaluation
226
+ │ ├── eval_doc_gen.py # BLEU/ROUGE-L evaluation
227
+ │ └── gold_standards/ # Reference letters for scoring
228
+ ├── finetuning/ # LoRA training scripts and adapter
229
+ └── tests/ # Unit, integration, and end-to-end tests
230
+ ```
231
+
232
+ ---
233
+
234
+ ## Development
235
+
236
+ Clarke was built by a solo 4th-year medical student over the competition period. Development used Claude (Anthropic) for architectural design, evaluation methodology, and technical problem-solving, and Codex (OpenAI) for code implementation via pull requests.
237
+
238
+ Key technical decisions documented in the [evaluation report](evaluation/EVALUATION.md):
239
+ - **Deterministic EHR retrieval over agentic tool-calling** after prototyping showed MedGemma 4B's agentic queries were unreliable.
240
+ - **Full bfloat16 precision for inference** after discovering 4-bit quantisation breaks weight tying in MedGemma 27B.
241
+ - **Multi-agent error correction** where each pipeline stage compensates for upstream errors.
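
The error-correction idea echoes the MedASR findings above: downstream stages overwrite entities the EHR holds authoritatively (patient names, drug names) rather than trusting the transcript. A toy sketch, with all names and the alias table hypothetical:

```python
def correct_transcript(transcript: str, ehr_aliases: dict) -> str:
    """Replace likely mis-transcriptions with authoritative EHR spellings."""
    for heard, authoritative in ehr_aliases.items():
        transcript = transcript.replace(heard, authoritative)
    return transcript

fixed = correct_transcript(
    "Mrs Okonkwa is continuing ramapril.",
    {"Okonkwa": "Okonkwo", "ramapril": "ramipril"},
)
```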
242
+
243
+ ---
244
+
245
+ ## Licence
246
+
247
+ - **Code:** [Apache 2.0](LICENSE)
248
+ - **Models:** Subject to [HAI-DEF Terms of Use](https://developers.google.com/health-ai-developer-foundations/terms) and HuggingFace gating requirements.
249
+
250
+ ---
251
+
252
+ ## Acknowledgements
253
+
254
+ - [MedGemma Impact Challenge](https://www.kaggle.com/competitions/medgemma-impact-challenge) by Google Health AI
255
+ - [HAI-DEF](https://goo.gle/hai-def) model releases
256
+ - [Synthea](https://synthetichealth.github.io/synthea/) for synthetic patient data
257
+ - [HAPI FHIR](https://hapifhir.io/) server ecosystem
app.py ADDED
@@ -0,0 +1,33 @@
1
+ """Application entry point mounting Clarke Gradio UI within FastAPI."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import uvicorn
6
+ from fastapi import FastAPI
7
+ import gradio as gr
8
+
9
+ from backend.api import app as fast_api
10
+ from backend.config import get_settings
11
+ from frontend.ui import build_ui
12
+
13
+
14
+ def create_app() -> FastAPI:
15
+ """Create the FastAPI application with mounted Gradio interface.
16
+
17
+ Args:
18
+ None: Uses existing FastAPI API app and Gradio UI builder.
19
+
20
+ Returns:
21
+ FastAPI: Unified ASGI app serving API and web UI.
22
+ """
23
+
24
+ demo = build_ui()
25
+ return gr.mount_gradio_app(fast_api, demo, path="/")
26
+
27
+
28
+ app = create_app()
29
+
30
+
31
+ if __name__ == "__main__":
32
+ settings = get_settings()
33
+ uvicorn.run(app, host=settings.APP_HOST, port=settings.APP_PORT)
backend/.DS_Store ADDED
Binary file (10.2 kB). View file
 
backend/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Backend package for Clarke services."""
backend/api.py ADDED
@@ -0,0 +1,490 @@
1
+ """FastAPI API surface for Clarke consultation and health endpoints."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ from datetime import datetime, timezone
7
+ from pathlib import Path
8
+ from shutil import copyfileobj
9
+ from typing import Any
10
+
11
+ from fastapi import BackgroundTasks, Body, FastAPI, File, Form, HTTPException, UploadFile
12
+
13
+ from backend.audio import convert_to_wav_16k, validate_audio
14
+ from backend.config import get_settings
15
+ from backend.errors import AudioError, ModelExecutionError, get_component_logger
16
+ from backend.orchestrator import PipelineOrchestrator
17
+ from backend.schemas import ConsultationStatus, Patient
18
+
19
+ app = FastAPI(title="Clarke API", version="0.1.0")
20
+ settings = get_settings()
21
+ logger = get_component_logger("api")
22
+ orchestrator = PipelineOrchestrator()
23
+
24
+
25
+ def _iso_timestamp() -> str:
26
+ """Return UTC timestamp in ISO 8601 format with Z suffix.
27
+
28
+ Args:
29
+ None: No input parameters.
30
+
31
+ Returns:
32
+ str: UTC timestamp string.
33
+ """
34
+
35
+ return datetime.now(tz=timezone.utc).replace(microsecond=0).isoformat().replace("+00:00", "Z")
36
+
37
+
38
+ def _load_clinic_list_payload() -> dict[str, Any]:
39
+ """Load clinic list fixture JSON from disk.
40
+
41
+ Args:
42
+ None: Reads static clinic list fixture file.
43
+
44
+ Returns:
45
+ dict[str, Any]: Parsed clinic list JSON payload.
46
+ """
47
+
48
+ clinic_path = Path("data/clinic_list.json")
49
+ if not clinic_path.exists():
50
+ raise HTTPException(status_code=500, detail="Clinic list file is missing")
51
+ return json.loads(clinic_path.read_text(encoding="utf-8"))
52
+
53
+
54
+ def _load_patient_resource(patient_id: str) -> dict[str, Any]:
55
+ """Load patient resource for a patient id from local FHIR bundle fixture.
56
+
57
+ Args:
58
+ patient_id (str): Patient identifier.
59
+
60
+ Returns:
61
+ dict[str, Any]: FHIR Patient resource from bundle fixture.
62
+ """
63
+
64
+ bundle_path = Path("data/fhir_bundles") / f"{patient_id}.json"
65
+ if not bundle_path.exists():
66
+ return {}
67
+ bundle = json.loads(bundle_path.read_text(encoding="utf-8"))
68
+ for entry in bundle.get("entry", []):
69
+ resource = entry.get("resource", {})
70
+ if resource.get("resourceType") == "Patient":
71
+ return resource
72
+ return {}
73
+
74
+
75
+ def _patient_from_records(clinic_row: dict[str, Any], patient_resource: dict[str, Any]) -> Patient:
76
+ """Build a Patient schema object from clinic list and FHIR records.
77
+
78
+ Args:
79
+ clinic_row (dict[str, Any]): Patient row from `clinic_list.json`.
80
+ patient_resource (dict[str, Any]): FHIR Patient resource dictionary.
81
+
82
+ Returns:
83
+ Patient: Normalised patient object for API responses.
84
+ """
85
+
86
+ nhs_number = ""
87
+ for identifier in patient_resource.get("identifier", []):
88
+ value = str(identifier.get("value", "")).strip()
89
+ if value:
90
+ nhs_number = value
91
+ break
92
+
93
+ date_of_birth = str(patient_resource.get("birthDate", ""))
94
+ if date_of_birth and "-" in date_of_birth:
95
+ yyyy, mm, dd = date_of_birth.split("-")
96
+ date_of_birth = f"{dd}/{mm}/{yyyy}"
97
+
98
+ return Patient(
99
+ id=str(clinic_row.get("id", "")),
100
+ nhs_number=nhs_number,
101
+ name=str(clinic_row.get("name", "")),
102
+ date_of_birth=date_of_birth,
103
+ age=int(clinic_row.get("age", 0)),
104
+ sex=str(clinic_row.get("sex", "")),
105
+ appointment_time=str(clinic_row.get("time", "")),
106
+ summary=str(clinic_row.get("summary", "")),
107
+ )
108
+
109
+
110
+ def _get_patient(patient_id: str) -> Patient:
111
+ """Resolve a patient from local fixture data.
112
+
113
+ Args:
114
+ patient_id (str): Patient identifier.
115
+
116
+ Returns:
117
+ Patient: Patient record for the requested id.
118
+ """
119
+
120
+ payload = _load_clinic_list_payload()
121
+ for row in payload.get("patients", []):
122
+ if str(row.get("id")) == patient_id:
123
+ resource = _load_patient_resource(patient_id)
124
+ return _patient_from_records(row, resource)
125
+ raise HTTPException(status_code=404, detail=f"Patient not found: {patient_id}")
126
+
127
+
128
+ def _health_fhir_status() -> dict[str, Any]:
129
+ """Compute lightweight FHIR service health metadata for health endpoint.
130
+
131
+ Args:
132
+ None: Reads settings and local data fixtures.
133
+
134
+ Returns:
135
+ dict[str, Any]: FHIR connectivity status and available patient count.
136
+ """
137
+
138
+ clinic_list_path = Path("data/clinic_list.json")
139
+ if settings.USE_MOCK_FHIR and clinic_list_path.exists():
140
+ payload = json.loads(clinic_list_path.read_text(encoding="utf-8"))
141
+ patients = payload.get("patients", [])
142
+ return {"status": "connected", "patient_count": len(patients)}
143
+
144
+ return {"status": "unknown", "patient_count": 0}
145
+
146
+
147
+ def _save_upload_temp(upload: UploadFile, destination: Path) -> Path:
148
+ """Persist an uploaded file to disk for subsequent conversion/validation.
149
+
150
+ Args:
151
+ upload (UploadFile): Uploaded file object from multipart payload.
152
+ destination (Path): Output path for storing file bytes.
153
+
154
+ Returns:
155
+ Path: Saved destination path.
156
+ """
157
+
158
+ destination.parent.mkdir(parents=True, exist_ok=True)
159
+ with destination.open("wb") as output_stream:
160
+ copyfileobj(upload.file, output_stream)
161
+ return destination
162
+
163
+
164
+ @app.get("/api/v1/health")
165
+ def get_health() -> dict[str, Any]:
166
+ """Return current service, model, and GPU health state.
167
+
168
+ Args:
169
+ None: Uses process-level singletons for status.
170
+
171
+ Returns:
172
+ dict[str, Any]: Health response payload.
173
+ """
174
+
175
+ gpu_info = orchestrator._medasr_model.model_manager.check_gpu()
176
+ models = {
177
+ "medasr": {
178
+ "loaded": orchestrator._medasr_model.model_manager.get_model("medasr") is not None,
179
+ "device": "cuda:0" if gpu_info["vram_total_bytes"] > 0 else "cpu",
180
+ },
181
+ "medgemma_4b": {"loaded": False, "device": "n/a", "quantised": "4bit"},
182
+ "medgemma_27b": {"loaded": False, "device": "n/a", "quantised": "4bit"},
183
+ }
184
+
185
+ return {
186
+ "status": "healthy",
187
+ "models": models,
188
+ "fhir": _health_fhir_status(),
189
+ "gpu": {
190
+ "name": gpu_info["gpu_name"],
191
+ "vram_used_gb": round(float(gpu_info["vram_used_bytes"]) / 1_000_000_000, 2),
192
+ "vram_total_gb": round(float(gpu_info["vram_total_bytes"]) / 1_000_000_000, 2),
193
+ },
194
+ "timestamp": _iso_timestamp(),
195
+ }
196
+
197
+
198
+ @app.get("/api/v1/patients")
199
+ def get_patients() -> dict[str, list[dict[str, Any]]]:
200
+ """Return clinic patient list with normalised schema fields.
201
+
202
+ Args:
203
+ None: Reads local clinic list and FHIR bundle fixtures.
204
+
205
+ Returns:
206
+ dict[str, list[dict[str, Any]]]: Patient list payload.
207
+ """
208
+
209
+ payload = _load_clinic_list_payload()
210
+ patients: list[dict[str, Any]] = []
211
+ for row in payload.get("patients", []):
212
+ patient_id = str(row.get("id", ""))
213
+ if not patient_id:
214
+ continue
215
+ patient_model = _patient_from_records(row, _load_patient_resource(patient_id))
216
+ patients.append(patient_model.model_dump())
217
+ return {"patients": patients}
218
+
219
+
220
+ @app.get("/api/v1/patients/{patient_id}")
221
+ def get_patient(patient_id: str) -> dict[str, Any]:
222
+ """Return a single patient by id.
223
+
224
+ Args:
225
+ patient_id (str): Patient identifier.
226
+
227
+ Returns:
228
+ dict[str, Any]: Patient payload.
229
+ """
230
+
231
+ patient = _get_patient(patient_id)
232
+ return patient.model_dump()
233
+
234
+
235
+ @app.post("/api/v1/patients/{patient_id}/context")
236
+ def generate_patient_context(patient_id: str) -> dict[str, Any]:
237
+ """Generate patient context via EHR agent and return context JSON.
238
+
239
+ Args:
240
+ patient_id (str): Patient identifier.
241
+
242
+ Returns:
243
+ dict[str, Any]: Structured patient context payload.
244
+ """
245
+
246
+ _get_patient(patient_id)
247
+ context = orchestrator._ehr_agent.get_patient_context(patient_id)
248
+ return context.model_dump(mode="json")
249
+
250
+
251
+ @app.post("/api/v1/consultations/start", status_code=201)
252
+ def start_consultation(payload: dict[str, str], background_tasks: BackgroundTasks) -> dict[str, Any]:
253
+ """Start a consultation session and trigger context prefetch in background.
254
+
255
+ Args:
256
+ payload (dict[str, str]): Request body containing patient_id.
257
+ background_tasks (BackgroundTasks): FastAPI background task runner.
258
+
259
+ Returns:
260
+ dict[str, Any]: Consultation identifier and initial state.
261
+ """
262
+
263
+ patient_id = payload.get("patient_id", "")
264
+ if not patient_id:
265
+ raise HTTPException(status_code=400, detail="patient_id is required")
266
+ patient = _get_patient(patient_id)
267
+
268
+ consultation = orchestrator.start_consultation(patient)
269
+ background_tasks.add_task(orchestrator.prefetch_context, consultation.id)
270
+
271
+ return {
272
+ "consultation_id": consultation.id,
273
+ "patient_id": patient_id,
274
+ "status": ConsultationStatus.RECORDING.value,
275
+ "started_at": consultation.started_at,
276
+ }
277
+
278
+
279
+ @app.post("/api/v1/consultations/{consultation_id}/audio")
280
+ def upload_audio(
281
+ consultation_id: str,
282
+ audio_file: UploadFile = File(...),
283
+ is_final: bool = Form(...),
284
+ ) -> dict[str, Any]:
285
+ """Accept consultation audio, convert to MedASR format, and store metadata.
286
+
287
+ Args:
288
+ consultation_id (str): Consultation identifier.
289
+ audio_file (UploadFile): Uploaded WAV/WebM audio file.
290
+ is_final (bool): Whether upload is the final complete recording.
291
+
292
+ Returns:
293
+ dict[str, Any]: Upload acknowledgement and validated duration seconds.
294
+ """
295
+
296
+ try:
297
+ consultation = orchestrator.get_consultation(consultation_id)
298
+ except KeyError as exc:
299
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
300
+
301
+ extension = Path(audio_file.filename or "").suffix.lower()
302
+ if extension not in {".wav", ".webm"}:
303
+ raise HTTPException(status_code=400, detail="Only WAV or WebM audio uploads are supported")
304
+
305
+ uploads_root = Path("data/uploads") / consultation_id
306
+ raw_path = uploads_root / f"raw{extension}"
307
+ original_stem = Path(audio_file.filename or "audio").stem or "audio"
308
+ wav_path = uploads_root / f"{original_stem}_16k.wav"
309
+
310
+ try:
311
+ _save_upload_temp(audio_file, raw_path)
312
 + # Always normalise to 16 kHz mono WAV, even when the upload is already WAV.
 + processed_path = Path(convert_to_wav_16k(str(raw_path), str(wav_path)))
315
+
316
+ audio_metadata = validate_audio(str(processed_path))
317
+ except AudioError as exc:
318
+ logger.error("Audio processing failed", consultation_id=consultation_id, error=str(exc))
319
+ raise HTTPException(status_code=400, detail=str(exc)) from exc
320
+
321
+ consultation.audio_file_path = str(processed_path)
322
+
323
+ logger.info(
324
+ "Stored consultation audio",
325
+ consultation_id=consultation_id,
326
+ duration_s=audio_metadata["duration_s"],
327
+ is_final=is_final,
328
+ )
329
+
330
+ return {
331
+ "consultation_id": consultation_id,
332
+ "audio_received": True,
333
+ "duration_s": audio_metadata["duration_s"],
334
+ }
335
+
336
+
337
+ @app.post("/api/v1/consultations/{consultation_id}/end", status_code=202)
338
+ def end_consultation(consultation_id: str) -> dict[str, Any]:
339
+ """End a consultation and execute full processing pipeline.
340
+
341
+ Args:
342
+ consultation_id (str): Consultation identifier.
343
+
344
+ Returns:
345
+ dict[str, Any]: Pipeline kickoff and status metadata.
346
+ """
347
+
348
+ try:
349
+ consultation = orchestrator.end_consultation(consultation_id)
350
+ progress = orchestrator.get_progress(consultation_id)
351
+ except KeyError as exc:
352
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
353
+ except TimeoutError as exc:
354
+ raise HTTPException(status_code=504, detail={"error": "timeout", "message": str(exc)}) from exc
355
+ except AudioError as exc:
356
+ raise HTTPException(status_code=400, detail={"error": "audio_error", "message": str(exc)}) from exc
357
+ except ModelExecutionError as exc:
358
+ error_type = "audio_error" if "transcribed" in str(exc).lower() else "model_error"
359
+ status_code = 400 if error_type == "audio_error" else 500
360
+ raise HTTPException(status_code=status_code, detail={"error": error_type, "message": str(exc)}) from exc
361
+
362
+ return {
363
+ "consultation_id": consultation.id,
364
+ "status": consultation.status.value,
365
+ "pipeline_stage": progress.stage.value,
366
+ "message": "Pipeline completed. Document is ready for review.",
367
+ }
368
+
369
+
370
+ @app.get("/api/v1/consultations/{consultation_id}/transcript")
371
+ def get_transcript(consultation_id: str) -> dict[str, Any]:
372
+ """Return transcript for a consultation when available.
373
+
374
+ Args:
375
+ consultation_id (str): Consultation identifier.
376
+
377
+ Returns:
378
+ dict[str, Any]: Transcript payload.
379
+ """
380
+
381
+ try:
382
+ consultation = orchestrator.get_consultation(consultation_id)
383
+ except KeyError as exc:
384
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
385
+
386
+ if consultation.transcript is None:
387
+ raise HTTPException(status_code=404, detail="Transcript not generated yet")
388
+ return consultation.transcript.model_dump(mode="json")
389
+
390
+
391
+ @app.get("/api/v1/consultations/{consultation_id}/document")
392
+ def get_document(consultation_id: str) -> dict[str, Any]:
393
+ """Return generated document placeholder for a consultation.
394
+
395
+ Args:
396
+ consultation_id (str): Consultation identifier.
397
+
398
+ Returns:
399
+ dict[str, Any]: Consultation document object or null.
400
+ """
401
+
402
+ try:
403
+ consultation = orchestrator.get_consultation(consultation_id)
404
+ except KeyError as exc:
405
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
406
+
407
+ return {
408
+ "consultation_id": consultation_id,
409
+ "document": consultation.document.model_dump(mode="json") if consultation.document else None,
410
+ }
411
+
412
+
413
+ @app.get("/api/v1/consultations/{consultation_id}/progress")
414
+ def get_progress(consultation_id: str) -> dict[str, Any]:
415
+ """Return latest pipeline progress object for a consultation.
416
+
417
+ Args:
418
+ consultation_id (str): Consultation identifier.
419
+
420
+ Returns:
421
+ dict[str, Any]: Pipeline progress payload.
422
+ """
423
+
424
+ try:
425
+ progress = orchestrator.get_progress(consultation_id)
426
+ except KeyError as exc:
427
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
428
+ return progress.model_dump(mode="json")
429
+
430
+
431
+ @app.post("/api/v1/consultations/{consultation_id}/document/sign-off")
432
+ def sign_off_document(consultation_id: str, payload: dict[str, Any] | None = Body(default=None)) -> dict[str, Any]:
433
+ """Sign off a generated document and set consultation status to signed off.
434
+
435
+ Args:
436
 + consultation_id (str): Consultation identifier.
 + payload (dict[str, Any] | None): Optional body with edited document `sections`.
437
+
438
+ Returns:
439
+ dict[str, Any]: Signed-off document payload with updated status.
440
+ """
441
+
442
+ sections = (payload or {}).get("sections", [])
443
+
444
+ try:
445
+ if sections:
446
+ orchestrator.update_document_sections(consultation_id, sections)
447
+ document = orchestrator.sign_off_document(consultation_id)
448
+ except KeyError as exc:
449
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
450
+ except ModelExecutionError as exc:
451
+ raise HTTPException(status_code=400, detail=str(exc)) from exc
452
+
453
+ return {
454
+ "consultation_id": consultation_id,
455
+ "status": ConsultationStatus.SIGNED_OFF.value,
456
+ "document": document.model_dump(mode="json"),
457
+ }
458
+
459
+
460
+ @app.post("/api/v1/consultations/{consultation_id}/document/regenerate-section")
461
+ def regenerate_document_section(
462
+ consultation_id: str,
463
+ payload: dict[str, str] = Body(...),
464
+ ) -> dict[str, Any]:
465
+ """Regenerate one document section while keeping other sections unchanged.
466
+
467
+ Args:
468
+ consultation_id (str): Consultation identifier.
469
+ payload (dict[str, str]): Request body containing `section_heading`.
470
+
471
+ Returns:
472
+ dict[str, Any]: Updated document payload.
473
+ """
474
+
475
+ section_heading = payload.get("section_heading", "").strip()
476
+ if not section_heading:
477
+ raise HTTPException(status_code=400, detail="section_heading is required")
478
+
479
+ try:
480
+ document = orchestrator.regenerate_document_section(consultation_id, section_heading)
481
+ except KeyError as exc:
482
+ raise HTTPException(status_code=404, detail=str(exc)) from exc
483
+ except ModelExecutionError as exc:
484
+ raise HTTPException(status_code=400, detail=str(exc)) from exc
485
+
486
+ return {
487
+ "consultation_id": consultation_id,
488
+ "status": ConsultationStatus.REVIEW.value,
489
+ "document": document.model_dump(mode="json"),
490
+ }
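The `end_consultation` handler above maps pipeline failures onto HTTP status codes (404 for unknown ids, 504 for timeouts, 400 for audio problems, 400/500 for model failures depending on the message). A minimal standalone sketch of that mapping, using stand-in exception classes rather than the real `backend.errors` types:

```python
class AudioError(Exception):
    """Stand-in for backend.errors.AudioError."""

class ModelExecutionError(Exception):
    """Stand-in for backend.errors.ModelExecutionError."""

def status_for(exc: Exception) -> int:
    # Mirrors end_consultation: unknown id -> 404, pipeline timeout -> 504,
    # bad audio -> 400, model failure -> 500 unless it looks like an ASR issue.
    if isinstance(exc, KeyError):
        return 404
    if isinstance(exc, TimeoutError):
        return 504
    if isinstance(exc, AudioError):
        return 400
    if isinstance(exc, ModelExecutionError):
        return 400 if "transcribed" in str(exc).lower() else 500
    return 500

print(status_for(ModelExecutionError("Nothing could be transcribed")))  # 400
print(status_for(TimeoutError("pipeline exceeded 120s")))               # 504
```

Keeping the mapping in one pure function like this makes the error contract easy to unit-test without spinning up FastAPI.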
backend/audio.py ADDED
@@ -0,0 +1,85 @@
1
+ """Audio conversion and validation utilities for MedASR-ready WAV files."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from pathlib import Path
6
+
7
+ import librosa
8
+ from pydub import AudioSegment
9
+
10
+ from backend.errors import AudioError
11
+
12
+
13
+ MIN_AUDIO_DURATION_S = 5.0
14
+ MAX_AUDIO_DURATION_S = 1800.0
15
+ EXPECTED_SAMPLE_RATE = 16000
16
+ EXPECTED_CHANNELS = 1
17
+
18
+
19
+ def convert_to_wav_16k(input_path: str, output_path: str) -> str:
20
+ """Convert supported audio input to 16kHz mono PCM WAV.
21
+
22
+ Args:
23
+ input_path (str): Source audio path (e.g. webm, mp3, wav).
24
+ output_path (str): Destination WAV path.
25
+
26
+ Returns:
27
+ str: Output path for converted 16kHz mono WAV file.
28
+ """
29
+ source_path = Path(input_path)
30
+ destination_path = Path(output_path)
31
+
32
+ if not source_path.exists():
33
+ raise AudioError(f"Audio input not found: {source_path}")
34
+
35
+ try:
36
+ audio = AudioSegment.from_file(source_path)
37
+ except Exception as exc: # pragma: no cover - pydub/ffmpeg message is external
38
+ raise AudioError(f"Failed to decode audio file {source_path}: {exc}") from exc
39
+
40
+ destination_path.parent.mkdir(parents=True, exist_ok=True)
41
+ processed = audio.set_frame_rate(EXPECTED_SAMPLE_RATE).set_channels(EXPECTED_CHANNELS).set_sample_width(2)
42
+
43
+ try:
44
+ processed.export(destination_path, format="wav")
45
+ except Exception as exc: # pragma: no cover - pydub/ffmpeg message is external
46
+ raise AudioError(f"Failed to export WAV audio {destination_path}: {exc}") from exc
47
+
48
+ return str(destination_path)
49
+
50
+
51
+ def validate_audio(file_path: str) -> dict[str, float | int]:
52
+ """Validate an audio file meets pipeline constraints.
53
+
54
+ Args:
55
+ file_path (str): Path to WAV file to inspect.
56
+
57
+ Returns:
58
+ dict[str, float | int]: duration_s, sample_rate, channels values.
59
+ """
60
+ path = Path(file_path)
61
+ if not path.exists():
62
+ raise AudioError(f"Audio file not found: {path}")
63
+
64
+ try:
65
+ waveform, sample_rate = librosa.load(path, sr=None, mono=False)
66
+ except Exception as exc:
67
+ raise AudioError(f"Unable to load audio for validation: {path}: {exc}") from exc
68
+
69
+ channels = int(waveform.shape[0]) if getattr(waveform, "ndim", 1) > 1 else 1
70
+ duration_s = float(librosa.get_duration(y=waveform, sr=sample_rate))
71
+
72
+ if sample_rate != EXPECTED_SAMPLE_RATE:
73
+ raise AudioError(f"Invalid sample rate {sample_rate}; expected {EXPECTED_SAMPLE_RATE}")
74
+ if channels != EXPECTED_CHANNELS:
75
+ raise AudioError(f"Invalid channel count {channels}; expected {EXPECTED_CHANNELS}")
76
+ if duration_s <= MIN_AUDIO_DURATION_S:
77
+ raise AudioError(f"Audio duration {duration_s:.2f}s must be > {MIN_AUDIO_DURATION_S}s")
78
+ if duration_s >= MAX_AUDIO_DURATION_S:
79
+ raise AudioError(f"Audio duration {duration_s:.2f}s must be < {MAX_AUDIO_DURATION_S}s")
80
+
81
+ return {
82
+ "duration_s": duration_s,
83
+ "sample_rate": int(sample_rate),
84
+ "channels": channels,
85
+ }
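The constraints enforced by `validate_audio` (16 kHz, mono, duration strictly between 5 s and 1800 s) can be exercised against a synthetic clip without pulling in librosa or pydub; a sketch using only the stdlib `wave` module:

```python
import io
import math
import struct
import wave

def write_tone(buffer: io.BytesIO, seconds: float = 6.0, sample_rate: int = 16000) -> None:
    # Write a 16-bit mono sine tone in the pipeline's expected format.
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        total = int(seconds * sample_rate)
        samples = (int(8000 * math.sin(2 * math.pi * 440 * i / sample_rate)) for i in range(total))
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))

buf = io.BytesIO()
write_tone(buf)
buf.seek(0)
with wave.open(buf, "rb") as wav:
    duration_s = wav.getnframes() / wav.getframerate()
    print(wav.getframerate(), wav.getnchannels(), duration_s)  # 16000 1 6.0
```

A clip built this way passes the sample-rate, channel, and duration checks, which makes it a convenient fixture for testing the validation path end to end.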
backend/config.py ADDED
@@ -0,0 +1,66 @@
1
+ """Centralised application configuration loaded from environment variables."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from functools import lru_cache
6
+
7
+ from dotenv import load_dotenv
8
 + from pydantic_settings import BaseSettings, SettingsConfigDict
278
 +
279
 + load_dotenv()
280
 +
281
 +
282
 + class Settings(BaseSettings):
283
 +     """Application settings loaded from environment variables and an optional `.env` file."""
284
 +
285
 +     model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8", extra="ignore")
24
+
25
+ MEDASR_MODEL_ID: str = "google/medasr"
26
+ MEDGEMMA_4B_MODEL_ID: str = "google/medgemma-1.5-4b-it"
27
+ MEDGEMMA_27B_MODEL_ID: str = "google/medgemma-27b-text-it"
28
+ HF_TOKEN: str = ""
29
+ QUANTIZE_4BIT: bool = True
30
+ USE_FLASH_ATTENTION: bool = True
31
+
32
+ FHIR_SERVER_URL: str = "http://localhost:8080/fhir"
33
+ USE_MOCK_FHIR: bool = True
34
+ FHIR_TIMEOUT_S: int = 10
35
+
36
+ APP_HOST: str = "0.0.0.0"
37
+ APP_PORT: int = 7860
38
+ LOG_LEVEL: str = "INFO"
39
+ MAX_AUDIO_DURATION_S: int = 1800
40
+ PIPELINE_TIMEOUT_S: int = 120
41
+ DOC_GEN_MAX_TOKENS: int = 2048
42
+ DOC_GEN_TEMPERATURE: float = 0.3
43
+
44
+ WANDB_API_KEY: str = ""
45
+ WANDB_PROJECT: str = "clarke-finetuning"
46
+ LORA_RANK: int = 16
47
+ LORA_ALPHA: int = 32
48
+ LORA_DROPOUT: float = 0.05
49
+ TRAINING_EPOCHS: int = 3
50
+ LEARNING_RATE: float = 2e-4
51
+ BATCH_SIZE: int = 2
52
+ GRAD_ACCUM_STEPS: int = 8
53
+ MAX_SEQ_LENGTH: int = 4096
54
+
55
+
56
+ @lru_cache(maxsize=1)
57
+ def get_settings() -> Settings:
58
+ """Return a cached settings instance for process-wide reuse.
59
+
60
+ Params:
61
+ None: Reads values from environment variables and `.env` if present.
62
+ Returns:
63
+ Settings: Cached configuration object.
64
+ """
65
+
66
+ return Settings()
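The override behaviour `BaseSettings` provides — declared defaults that lose to process environment variables — can be illustrated with a stdlib-only dataclass sketch (`MiniSettings` and `load_settings` are hypothetical names, not part of the codebase):

```python
import os
from dataclasses import dataclass

@dataclass
class MiniSettings:
    APP_PORT: int = 7860
    LOG_LEVEL: str = "INFO"

def load_settings() -> MiniSettings:
    # Environment variables override declared defaults, as BaseSettings does.
    return MiniSettings(
        APP_PORT=int(os.environ.get("APP_PORT", MiniSettings.APP_PORT)),
        LOG_LEVEL=os.environ.get("LOG_LEVEL", MiniSettings.LOG_LEVEL),
    )

os.environ["APP_PORT"] = "9000"
print(load_settings())  # MiniSettings(APP_PORT=9000, LOG_LEVEL='INFO')
```

The `lru_cache` wrapper on `get_settings` gives the same single-parse semantics process-wide, so changing the environment after first use has no effect.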
backend/errors.py ADDED
@@ -0,0 +1,94 @@
1
+ """Error types and logging bootstrap for Clarke backend services."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from pathlib import Path
6
+
7
+ from loguru import logger
8
+
9
+
10
 + class ClarkeError(Exception):
 +     """Base exception for Clarke backend failures."""
 +
 +
 + class ConfigError(ClarkeError):
 +     """Raised when configuration values are missing or invalid."""
 +
 +
 + class FHIRClientError(ClarkeError):
 +     """Raised for FHIR retrieval and parsing failures."""
 +
 +
 + class ModelExecutionError(ClarkeError):
 +     """Raised when model loading or inference fails."""
 +
 +
 + class AudioError(ClarkeError):
 +     """Raised when audio input fails conversion or validation checks."""
58
+
59
+
60
+ def configure_logging(log_level: str = "DEBUG") -> None:
61
+ """Configure loguru sinks for console and rotating file logs.
62
+
63
+ Params:
64
+ log_level (str): Minimum enabled log severity level.
65
+ Returns:
66
+ None: Configures global logger side effects.
67
+ """
68
+
69
+ Path("logs").mkdir(parents=True, exist_ok=True)
70
+ logger.remove()
71
+ logger.add(
72
+ "logs/clarke_{time}.log",
73
+ format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {level:<7} | {extra[component]:<15} | {message}",
74
+ rotation="50 MB",
75
+ retention="7 days",
76
+ level="DEBUG",
77
+ )
78
 + import sys  # loguru needs the stream object; the string "stderr" would open a file named "stderr"
 +
 + logger.add(
 +     sys.stderr,
 +     format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {level:<7} | {extra[component]:<15} | {message}",
 +     level=log_level,
 + )
83
+
84
+
85
+ def get_component_logger(component: str):
86
+ """Return a logger bound to a component name for structured logs.
87
+
88
+ Params:
89
+ component (str): Logical module/component identifier.
90
+ Returns:
91
+ loguru.Logger: Bound logger with `component` context.
92
+ """
93
+
94
+ return logger.bind(component=component)
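Because every error above derives from `ClarkeError`, a single `except ClarkeError` clause catches all Clarke-specific failures while leaving the concrete type available for error-specific handling; a small sketch using stand-in classes mirroring the hierarchy:

```python
class ClarkeError(Exception):
    """Stand-in mirroring the base class above."""

class FHIRClientError(ClarkeError):
    pass

class AudioError(ClarkeError):
    pass

def classify(exc: Exception) -> str:
    # One ClarkeError clause catches every subtype; anything else is unexpected.
    try:
        raise exc
    except ClarkeError as caught:
        return type(caught).__name__
    except Exception:
        return "unexpected"

print(classify(AudioError("bad sample rate")))     # AudioError
print(classify(ValueError("not a Clarke error")))  # unexpected
```

This is the pattern the API layer relies on when it translates `AudioError` and `ModelExecutionError` into distinct HTTP responses.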
backend/fhir/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """FHIR integration package for Clarke."""
backend/fhir/client.py ADDED
@@ -0,0 +1,194 @@
1
+ """Asynchronous FHIR REST client for retrieving patient-scoped resources."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Any
6
+
7
+ import httpx
8
+
9
+ from backend.config import get_settings
10
+ from backend.errors import FHIRClientError, get_component_logger
11
+
12
+ logger = get_component_logger("fhir_client")
13
+
14
+
15
+ class FHIRClient:
16
+ """HTTP client for FHIR query patterns used by Clarke orchestration.
17
+
18
+ Args:
19
+ fhir_server_url (str): Base URL of FHIR API including `/fhir` path.
20
+ timeout_s (int): Request timeout in seconds.
21
+
22
+ Returns:
23
+ None: Initialises a client instance.
24
+ """
25
+
26
+ def __init__(self, fhir_server_url: str, timeout_s: int) -> None:
27
+ self.fhir_server_url = fhir_server_url.rstrip("/")
28
+ self.timeout_s = timeout_s
29
+
30
+ async def _request_json(self, path: str, params: dict[str, Any] | None = None) -> dict[str, Any]:
31
+ """Execute a GET request with timeout, 404-empty behaviour, and one 5xx retry.
32
+
33
+ Args:
34
+ path (str): FHIR relative path beginning with `/`.
35
+ params (dict[str, Any] | None): Query parameters for the request.
36
+
37
+ Returns:
38
+ dict[str, Any]: Parsed JSON response, or empty dictionary for 404 responses.
39
+ """
40
+
41
+ url = f"{self.fhir_server_url}{path}"
42
+ attempts = 0
43
+ while attempts < 2:
44
+ attempts += 1
45
+ try:
46
+ async with httpx.AsyncClient(timeout=self.timeout_s) as client:
47
+ response = await client.get(url, params=params)
48
+ except httpx.TimeoutException as exc:
49
+ logger.error("FHIR request timed out", url=url, params=params, timeout_s=self.timeout_s)
50
+ raise FHIRClientError(
51
+ f"FHIR timeout for {url} with params={params} after {self.timeout_s}s"
52
+ ) from exc
53
+ except httpx.HTTPError as exc:
54
+ logger.error("FHIR transport error", url=url, params=params, error=str(exc))
55
+ raise FHIRClientError(f"FHIR transport error for {url}: {exc}") from exc
56
+
57
+ if response.status_code == 404:
58
+ return {}
59
+
60
+ if response.status_code >= 500 and attempts < 2:
61
+ logger.warning("FHIR server 5xx, retrying once", url=url, status_code=response.status_code)
62
+ continue
63
+
64
+ if response.status_code >= 400:
65
+ raise FHIRClientError(
66
+ f"FHIR request failed for {url} status={response.status_code} body={response.text}"
67
+ )
68
+
69
+ return response.json()
70
+
71
+ raise FHIRClientError(f"FHIR server error persisted after retry for {url}")
72
+
73
+ async def get_patient(self, patient_id: str) -> dict[str, Any]:
74
+ """Fetch a Patient resource by id.
75
+
76
+ Args:
77
+ patient_id (str): Patient identifier.
78
+
79
+ Returns:
80
+ dict[str, Any]: Patient JSON resource or empty dict when not found.
81
+ """
82
+
83
+ return await self._request_json(f"/Patient/{patient_id}")
84
+
85
+ async def search_patients(self, name: str, count: int = 10) -> dict[str, Any]:
86
+ """Search patients by name using FHIR Patient endpoint.
87
+
88
+ Args:
89
+ name (str): Name fragment to search.
90
+ count (int): Maximum number of records to request.
91
+
92
+ Returns:
93
+ dict[str, Any]: FHIR Bundle search response.
94
+ """
95
+
96
+ return await self._request_json("/Patient", params={"name": name, "_count": count})
97
+
98
+ async def get_conditions(self, patient_id: str, clinical_status: str = "active") -> dict[str, Any]:
99
+ """Fetch Condition resources for a patient.
100
+
101
+ Args:
102
+ patient_id (str): Patient identifier.
103
+ clinical_status (str): Desired clinical status value.
104
+
105
+ Returns:
106
+ dict[str, Any]: FHIR Bundle response.
107
+ """
108
+
109
+ return await self._request_json(
110
+ "/Condition", params={"patient": patient_id, "clinical-status": clinical_status}
111
+ )
112
+
113
+ async def get_medications(self, patient_id: str, status: str = "active") -> dict[str, Any]:
114
+ """Fetch MedicationRequest resources for a patient.
115
+
116
+ Args:
117
+ patient_id (str): Patient identifier.
118
+ status (str): Medication status filter.
119
+
120
+ Returns:
121
+ dict[str, Any]: FHIR Bundle response.
122
+ """
123
+
124
+ return await self._request_json("/MedicationRequest", params={"patient": patient_id, "status": status})
125
+
126
+ async def get_observations(
127
+ self,
128
+ patient_id: str,
129
+ category: str = "laboratory",
130
+ sort: str = "-date",
131
+ count: int = 20,
132
+ ) -> dict[str, Any]:
133
+ """Fetch Observation resources for a patient.
134
+
135
+ Args:
136
+ patient_id (str): Patient identifier.
137
+ category (str): Observation category code filter.
138
+ sort (str): Sort directive for date-like fields.
139
+ count (int): Maximum number of records.
140
+
141
+ Returns:
142
+ dict[str, Any]: FHIR Bundle response.
143
+ """
144
+
145
+ return await self._request_json(
146
+ "/Observation",
147
+ params={"patient": patient_id, "category": category, "_sort": sort, "_count": count},
148
+ )
149
+
150
+ async def get_allergies(self, patient_id: str) -> dict[str, Any]:
151
+ """Fetch AllergyIntolerance resources for a patient.
152
+
153
+ Args:
154
+ patient_id (str): Patient identifier.
155
+
156
+ Returns:
157
+ dict[str, Any]: FHIR Bundle response.
158
+ """
159
+
160
+ return await self._request_json("/AllergyIntolerance", params={"patient": patient_id})
161
+
162
+ async def get_diagnostic_reports(self, patient_id: str, sort: str = "-date", count: int = 5) -> dict[str, Any]:
163
+ """Fetch DiagnosticReport resources for a patient.
164
+
165
+ Args:
166
+ patient_id (str): Patient identifier.
167
+ sort (str): Sort directive for date-like fields.
168
+ count (int): Maximum number of records.
169
+
170
+ Returns:
171
+ dict[str, Any]: FHIR Bundle response.
172
+ """
173
+
174
+ return await self._request_json(
175
+ "/DiagnosticReport", params={"patient": patient_id, "_sort": sort, "_count": count}
176
+ )
177
+
178
+ async def get_recent_encounters(self, patient_id: str, sort: str = "-date", count: int = 3) -> dict[str, Any]:
179
+ """Fetch recent Encounter resources for a patient.
180
+
181
+ Args:
182
+ patient_id (str): Patient identifier.
183
+ sort (str): Sort directive for date-like fields.
184
+ count (int): Maximum number of records.
185
+
186
+ Returns:
187
+ dict[str, Any]: FHIR Bundle response.
188
+ """
189
+
190
+ return await self._request_json("/Encounter", params={"patient": patient_id, "_sort": sort, "_count": count})
191
+
192
+
193
+ settings = get_settings()
194
+ DEFAULT_FHIR_CLIENT = FHIRClient(fhir_server_url=settings.FHIR_SERVER_URL, timeout_s=settings.FHIR_TIMEOUT_S)
backend/fhir/mock_api.py ADDED
@@ -0,0 +1,412 @@
1
+ """Mock FHIR API server for local development and fallback deployments."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ import os
7
+ from pathlib import Path
8
+ from typing import Any
9
+
10
+ import uvicorn
11
+ from fastapi import FastAPI, HTTPException, Query
12
+ from loguru import logger
13
+
14
+ FHIR_BASE_URL = "http://localhost:8080/fhir"
15
+ DEFAULT_BUNDLES_DIR = Path(__file__).resolve().parents[2] / "data" / "fhir_bundles"
16
+
17
+ app = FastAPI(title="Clarke Mock FHIR API", version="1.0.0")
18
+
19
+
20
+ class BundleIndex:
21
+ """In-memory index for patient bundles keyed by patient id and resource type."""
22
+
23
+ def __init__(self, bundles_dir: Path) -> None:
24
+ """Initialise and preload all FHIR bundles from disk.
25
+
26
+ Args:
27
+ bundles_dir (Path): Directory containing per-patient bundle JSON files.
28
+
29
+ Returns:
30
+ None: Instance state is populated in-place.
31
+ """
32
+ self.bundles_dir = bundles_dir
33
+ self.patient_resources: dict[str, dict[str, Any]] = {}
34
+ self.resources_by_type: dict[str, dict[str, list[dict[str, Any]]]] = {}
35
+ self._load_bundles()
36
+
37
+ def _load_bundles(self) -> None:
38
+ """Load all bundle files and build lookup indexes for API routes.
39
+
40
+ Args:
41
+ None: Uses instance `bundles_dir`.
42
+
43
+ Returns:
44
+ None: Internal index maps are rebuilt.
45
+ """
46
+ if not self.bundles_dir.exists():
47
+ logger.bind(component="mock_fhir_api").warning(
48
+ "Bundle directory does not exist", bundles_dir=str(self.bundles_dir)
49
+ )
50
+ return
51
+
52
+ for bundle_file in sorted(self.bundles_dir.glob("*.json")):
53
+ with bundle_file.open("r", encoding="utf-8") as file_handle:
54
+ bundle = json.load(file_handle)
55
+
56
+ for entry in bundle.get("entry", []):
57
+ resource = entry.get("resource", {})
58
+ resource_type = resource.get("resourceType")
59
+ if not resource_type:
60
+ continue
61
+
62
+ patient_id = self._extract_patient_id(resource)
63
+ if not patient_id and resource_type == "Patient":
64
+ patient_id = resource.get("id")
65
+
66
+ if not patient_id:
67
+ continue
68
+
69
+ if resource_type == "Patient":
70
+ self.patient_resources[patient_id] = resource
71
+
72
+ self.resources_by_type.setdefault(resource_type, {}).setdefault(patient_id, []).append(resource)
73
+
74
+ logger.bind(component="mock_fhir_api").info(
75
+ "Loaded mock FHIR bundles",
76
+ patients=len(self.patient_resources),
77
+ bundles_dir=str(self.bundles_dir),
78
+ )
79
+
80
+ @staticmethod
81
+ def _extract_patient_id(resource: dict[str, Any]) -> str | None:
82
+ """Extract patient identifier from common FHIR reference fields.
83
+
84
+ Args:
85
+ resource (dict[str, Any]): A FHIR resource object.
86
+
87
+ Returns:
88
+ str | None: Patient id (e.g. `pt-001`) when detectable, otherwise None.
89
+ """
90
+ reference_candidates = [
91
+ resource.get("subject", {}).get("reference"),
92
+ resource.get("patient", {}).get("reference"),
93
+ ]
94
+ for candidate in reference_candidates:
95
+ if candidate and isinstance(candidate, str) and candidate.startswith("Patient/"):
96
+ return candidate.split("/", maxsplit=1)[1]
97
+ return None
98
+
99
+ def get_patient(self, patient_id: str) -> dict[str, Any] | None:
100
+ """Return a single patient resource by id.
101
+
102
+ Args:
103
+ patient_id (str): Patient identifier.
104
+
105
+ Returns:
106
+ dict[str, Any] | None: Patient resource when present, else None.
107
+ """
108
+ return self.patient_resources.get(patient_id)
109
+
110
+ def get_resources(self, resource_type: str, patient_id: str) -> list[dict[str, Any]]:
111
+ """Return all indexed resources of a given type for a patient.
112
+
113
+ Args:
114
+ resource_type (str): FHIR resource type name.
115
+ patient_id (str): Patient identifier.
116
+
117
+ Returns:
118
+ list[dict[str, Any]]: Matching resources, empty list when none found.
119
+ """
120
+ return self.resources_by_type.get(resource_type, {}).get(patient_id, [])
121
+
122
+ def search_patients(self, name_term: str) -> list[dict[str, Any]]:
123
+ """Search patients by case-insensitive name matching across given/family/prefix.
124
+
125
+ Args:
126
+ name_term (str): Name fragment entered by caller.
127
+
128
+ Returns:
129
+ list[dict[str, Any]]: Matching patient resources.
130
+ """
131
+ lowered = name_term.lower().strip()
132
+ matches: list[dict[str, Any]] = []
133
+ for patient in self.patient_resources.values():
134
+ for name in patient.get("name", []):
135
+ tokens = []
136
+ tokens.extend(name.get("prefix", []))
137
+ tokens.extend(name.get("given", []))
138
+ if family := name.get("family"):
139
+ tokens.append(family)
140
+ if any(lowered in str(token).lower() for token in tokens):
141
+ matches.append(patient)
142
+ break
143
+ return matches
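The token flattening that `search_patients` performs over a FHIR `HumanName` (prefix, given, family) can be sketched in isolation; `name_tokens` and `matches` are illustrative names, not part of the codebase:

```python
def name_tokens(name: dict) -> list:
    # Flatten prefix/given/family into one searchable token list,
    # matching BundleIndex.search_patients.
    tokens = [*name.get("prefix", []), *name.get("given", [])]
    if family := name.get("family"):
        tokens.append(family)
    return tokens

def matches(name: dict, term: str) -> bool:
    lowered = term.lower().strip()
    return any(lowered in str(token).lower() for token in name_tokens(name))

patient_name = {"prefix": ["Mrs"], "given": ["Ada"], "family": "Lovelace"}
print(matches(patient_name, "LOVE"))   # True
print(matches(patient_name, "smith"))  # False
```

Substring matching on each token (rather than the concatenated display name) keeps partial searches like "love" working regardless of name ordering.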
144
+
145
+
146
+ BUNDLE_INDEX = BundleIndex(Path(os.getenv("MOCK_FHIR_BUNDLES_DIR", str(DEFAULT_BUNDLES_DIR))))
147
+
148
+
149
+ def build_search_bundle(resource_type: str, resources: list[dict[str, Any]]) -> dict[str, Any]:
150
+ """Build a FHIR searchset bundle payload from resource rows.
151
+
152
+ Args:
153
+ resource_type (str): Resource type represented in `resources`.
154
+ resources (list[dict[str, Any]]): FHIR resources to include.
155
+
156
+ Returns:
157
+ dict[str, Any]: FHIR Bundle with `entry` and `total` fields.
158
+ """
159
+ entries = [
160
+ {
161
+ "fullUrl": f"{FHIR_BASE_URL}/{resource_type}/{resource.get('id', '')}",
162
+ "resource": resource,
163
+ }
164
+ for resource in resources
165
+ ]
166
+ return {
167
+ "resourceType": "Bundle",
168
+ "type": "searchset",
169
+ "total": len(resources),
170
+ "entry": entries,
171
+ }
172
+
173
+
174
+ def get_effective_sort_key(resource: dict[str, Any]) -> str:
175
+ """Resolve date-like sort key for FHIR resources supporting `_sort=-date`.
176
+
177
+ Args:
178
+ resource (dict[str, Any]): Resource candidate.
179
+
180
+ Returns:
181
+ str: Date-like string usable for lexicographic descending sort.
182
+ """
183
+ return (
184
+ resource.get("effectiveDateTime")
185
+ or resource.get("issued")
186
+ or resource.get("authoredOn")
187
+ or resource.get("onsetDateTime")
188
+ or resource.get("period", {}).get("start")
189
+ or ""
190
+ )
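Because FHIR date fields are ISO 8601 strings, the fallback chain above supports a plain lexicographic descending sort; resources with no date-like field collapse to the empty string and sort last. A minimal sketch of the same key function:

```python
def sort_key(resource: dict) -> str:
    # Same fallback chain as get_effective_sort_key: first date-like field wins;
    # the empty string sorts last under reverse lexicographic order.
    return (
        resource.get("effectiveDateTime")
        or resource.get("issued")
        or resource.get("authoredOn")
        or resource.get("onsetDateTime")
        or resource.get("period", {}).get("start")
        or ""
    )

rows = [{"issued": "2023-01-05"}, {"effectiveDateTime": "2024-02-01T09:00:00Z"}, {}]
rows.sort(key=sort_key, reverse=True)
print([sort_key(r) for r in rows])  # ['2024-02-01T09:00:00Z', '2023-01-05', '']
```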
191
+
192
+
193
+ def get_patient_or_404(patient_id: str) -> dict[str, Any]:
194
+ """Resolve patient resource or raise a 404 HTTP exception.
195
+
196
+ Args:
197
+ patient_id (str): Patient identifier to lookup.
198
+
199
+ Returns:
200
+ dict[str, Any]: Existing patient resource.
201
+ """
202
+ patient_resource = BUNDLE_INDEX.get_patient(patient_id)
203
+ if not patient_resource:
204
+ raise HTTPException(status_code=404, detail=f"Patient not found: {patient_id}")
205
+ return patient_resource
206
+
207
+
208
+ @app.get("/fhir/Patient/{patient_id}")
209
+ async def read_patient(patient_id: str) -> dict[str, Any]:
210
+ """Return a Patient resource by id.
211
+
212
+ Args:
213
+ patient_id (str): Patient identifier path parameter.
214
+
215
+ Returns:
216
+ dict[str, Any]: Patient resource payload.
217
+ """
218
+ return get_patient_or_404(patient_id)
219
+
220
+
221
+ @app.get("/fhir/Patient")
222
+ async def search_patient(name: str = Query(default=""), _count: int = Query(default=10, ge=1)) -> dict[str, Any]:
+     """Search patients by name and return a FHIR searchset bundle.
+
+     Args:
+         name (str): Name fragment for case-insensitive matching.
+         _count (int): Maximum result size.
+
+     Returns:
+         dict[str, Any]: Search bundle of Patient resources.
+     """
+     resources = BUNDLE_INDEX.search_patients(name) if name else list(BUNDLE_INDEX.patient_resources.values())
+     return build_search_bundle("Patient", resources[:_count])
+
+
+ def list_patient_resources(resource_type: str, patient: str, limit: int, sort_desc_by_date: bool = False) -> dict[str, Any]:
+     """Fetch filtered patient resources and wrap them as a searchset bundle.
+
+     Args:
+         resource_type (str): FHIR resource type to return.
+         patient (str): Patient identifier.
+         limit (int): Maximum number of resources to include.
+         sort_desc_by_date (bool): Whether to sort descending by date-like fields.
+
+     Returns:
+         dict[str, Any]: Search bundle for the requested resource type.
+     """
+     get_patient_or_404(patient)
+     resources = list(BUNDLE_INDEX.get_resources(resource_type, patient))
+     if sort_desc_by_date:
+         resources.sort(key=get_effective_sort_key, reverse=True)
+     return build_search_bundle(resource_type, resources[:limit])
+
+
+ @app.get("/fhir/Condition")
+ async def list_conditions(
+     patient: str,
+     clinical_status: str | None = Query(default=None, alias="clinical-status"),
+     _count: int = Query(default=20, ge=1),
+ ) -> dict[str, Any]:
+     """List Condition resources for a patient with optional clinical-status filtering.
+
+     Args:
+         patient (str): Patient identifier query parameter.
+         clinical_status (str | None): Optional status filter (e.g. `active`).
+         _count (int): Maximum number of rows.
+
+     Returns:
+         dict[str, Any]: Condition search bundle.
+     """
+     bundle = list_patient_resources("Condition", patient, _count)
+     if clinical_status:
+         filtered_entries = [
+             item
+             for item in bundle["entry"]
+             if any(
+                 coding.get("code") == clinical_status
+                 for coding in item["resource"].get("clinicalStatus", {}).get("coding", [])
+             )
+         ]
+         bundle["entry"] = filtered_entries
+         bundle["total"] = len(filtered_entries)
+     return bundle
+
+
+ @app.get("/fhir/MedicationRequest")
+ async def list_medication_requests(
+     patient: str,
+     status: str | None = Query(default=None),
+     _count: int = Query(default=20, ge=1),
+ ) -> dict[str, Any]:
+     """List MedicationRequest resources for a patient with optional status filter.
+
+     Args:
+         patient (str): Patient identifier query parameter.
+         status (str | None): Optional medication status filter.
+         _count (int): Maximum number of rows.
+
+     Returns:
+         dict[str, Any]: MedicationRequest search bundle.
+     """
+     bundle = list_patient_resources("MedicationRequest", patient, _count, sort_desc_by_date=True)
+     if status:
+         filtered_entries = [item for item in bundle["entry"] if item["resource"].get("status") == status]
+         bundle["entry"] = filtered_entries
+         bundle["total"] = len(filtered_entries)
+     return bundle
+
+
+ @app.get("/fhir/Observation")
+ async def list_observations(
+     patient: str,
+     category: str | None = Query(default=None),
+     _sort: str | None = Query(default=None),
+     _count: int = Query(default=20, ge=1),
+ ) -> dict[str, Any]:
+     """List Observation resources for a patient with category/count/sort controls.
+
+     Args:
+         patient (str): Patient identifier query parameter.
+         category (str | None): Optional observation category code.
+         _sort (str | None): Optional sort string (`-date` expected).
+         _count (int): Maximum number of rows.
+
+     Returns:
+         dict[str, Any]: Observation search bundle.
+     """
+     sort_desc_by_date = _sort == "-date"
+     bundle = list_patient_resources("Observation", patient, _count, sort_desc_by_date=sort_desc_by_date)
+     if category:
+         filtered_entries = [
+             item
+             for item in bundle["entry"]
+             if any(
+                 coding.get("code") == category
+                 for block in item["resource"].get("category", [])
+                 for coding in block.get("coding", [])
+             )
+         ]
+         bundle["entry"] = filtered_entries
+         bundle["total"] = len(filtered_entries)
+     return bundle
+
+
+ @app.get("/fhir/AllergyIntolerance")
+ async def list_allergies(patient: str, _count: int = Query(default=20, ge=1)) -> dict[str, Any]:
+     """List AllergyIntolerance resources for a patient.
+
+     Args:
+         patient (str): Patient identifier query parameter.
+         _count (int): Maximum number of rows.
+
+     Returns:
+         dict[str, Any]: AllergyIntolerance search bundle.
+     """
+     return list_patient_resources("AllergyIntolerance", patient, _count)
+
+
+ @app.get("/fhir/DiagnosticReport")
+ async def list_diagnostic_reports(
+     patient: str,
+     _sort: str | None = Query(default=None),
+     _count: int = Query(default=5, ge=1),
+ ) -> dict[str, Any]:
+     """List DiagnosticReport resources for a patient with optional date sorting.
+
+     Args:
+         patient (str): Patient identifier query parameter.
+         _sort (str | None): Optional sort string (`-date` expected).
+         _count (int): Maximum number of rows.
+
+     Returns:
+         dict[str, Any]: DiagnosticReport search bundle.
+     """
+     return list_patient_resources("DiagnosticReport", patient, _count, sort_desc_by_date=_sort == "-date")
+
+
+ @app.get("/fhir/Encounter")
+ async def list_encounters(
+     patient: str,
+     _sort: str | None = Query(default=None),
+     _count: int = Query(default=3, ge=1),
+ ) -> dict[str, Any]:
+     """List Encounter resources for a patient with optional date sorting.
+
+     Args:
+         patient (str): Patient identifier query parameter.
+         _sort (str | None): Optional sort string (`-date` expected).
+         _count (int): Maximum number of rows.
+
+     Returns:
+         dict[str, Any]: Encounter search bundle.
+     """
+     return list_patient_resources("Encounter", patient, _count, sort_desc_by_date=_sort == "-date")
+
+
+ def main() -> None:
+     """Run the mock FHIR API server.
+
+     Args:
+         None: Reads host/port from environment variables.
+
+     Returns:
+         None: Starts uvicorn process.
+     """
+     host = os.getenv("MOCK_FHIR_HOST", "0.0.0.0")
+     port = int(os.getenv("MOCK_FHIR_PORT", "8080"))
+     uvicorn.run("backend.fhir.mock_api:app", host=host, port=port, reload=False)
+
+
+ if __name__ == "__main__":
+     main()
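The clinical-status filter on `/fhir/Condition` above keeps an entry when any coding inside its `clinicalStatus` block matches the requested code. A minimal standalone sketch of that filtering logic (the `filter_by_clinical_status` helper name is hypothetical, not part of this diff):

```python
def filter_by_clinical_status(entries: list[dict], status: str) -> list[dict]:
    """Keep bundle entries whose Condition has a matching clinicalStatus coding."""
    return [
        item
        for item in entries
        if any(
            coding.get("code") == status
            for coding in item["resource"].get("clinicalStatus", {}).get("coding", [])
        )
    ]


entries = [
    {"resource": {"clinicalStatus": {"coding": [{"code": "active"}]}}},
    {"resource": {"clinicalStatus": {"coding": [{"code": "resolved"}]}}},
]
print(len(filter_by_clinical_status(entries, "active")))  # 1
```

Note that in the endpoints above the filter runs after `_count` truncation, so a filtered request can return fewer matches than exist in the index.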
backend/fhir/queries.py ADDED
@@ -0,0 +1,62 @@
+ """Deterministic FHIR query aggregation used as EHR fallback context retrieval."""
+
+ from __future__ import annotations
+
+ import asyncio
+ from typing import Any
+
+ from backend.fhir.client import DEFAULT_FHIR_CLIENT
+ from backend.fhir.tools import (
+     get_allergies,
+     get_conditions,
+     get_diagnostic_reports,
+     get_medications,
+     get_observations,
+     get_recent_encounters,
+     search_patients,
+ )
+
+
+ async def get_full_patient_context(patient_id: str) -> dict[str, Any]:
+     """Aggregate all FHIR tool outputs for deterministic patient context assembly.
+
+     Args:
+         patient_id (str): Patient identifier.
+
+     Returns:
+         dict[str, Any]: Aggregated context containing all seven tool result sections.
+     """
+
+     (
+         patients,
+         conditions,
+         medications,
+         observations,
+         allergies,
+         diagnostic_reports,
+         encounters,
+     ) = await asyncio.gather(
+         search_patients(patient_id),
+         get_conditions(patient_id),
+         get_medications(patient_id),
+         get_observations(patient_id),
+         get_allergies(patient_id),
+         get_diagnostic_reports(patient_id),
+         get_recent_encounters(patient_id),
+     )
+
+     if not patients:
+         patient_resource = await DEFAULT_FHIR_CLIENT.get_patient(patient_id)
+         if patient_resource:
+             patients = [patient_resource]
+
+     return {
+         "patient_id": patient_id,
+         "patients": patients,
+         "conditions": conditions,
+         "medications": medications,
+         "observations": observations,
+         "allergies": allergies,
+         "diagnostic_reports": diagnostic_reports,
+         "encounters": encounters,
+     }
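`get_full_patient_context` fans out all seven tool calls concurrently with `asyncio.gather`, relying on results coming back in the order the awaitables were passed. The same pattern in miniature (the `fake_tool` coroutine is a stand-in, not part of this diff):

```python
import asyncio


async def fake_tool(name: str, delay: float) -> list[dict]:
    # stand-in for one FHIR tool call; returns a single resource-like dict
    await asyncio.sleep(delay)
    return [{"tool": name}]


async def gather_context(patient_id: str) -> dict:
    # gather preserves argument order, so tuple unpacking is safe
    conditions, medications = await asyncio.gather(
        fake_tool("conditions", 0.01),
        fake_tool("medications", 0.01),
    )
    return {"patient_id": patient_id, "conditions": conditions, "medications": medications}


context = asyncio.run(gather_context("pat-001"))
print(context["conditions"][0]["tool"])  # conditions
```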
backend/fhir/tools.py ADDED
@@ -0,0 +1,120 @@
+ """FHIR tool-call wrappers that extract resources from bundle responses."""
+
+ from __future__ import annotations
+
+ from typing import Any
+
+ from backend.fhir.client import DEFAULT_FHIR_CLIENT
+
+
+ def _bundle_entries(bundle: dict[str, Any]) -> list[dict[str, Any]]:
+     """Extract resource rows from a FHIR Bundle payload.
+
+     Args:
+         bundle (dict[str, Any]): Raw FHIR Bundle response.
+
+     Returns:
+         list[dict[str, Any]]: Resource dictionaries found in bundle entries.
+     """
+
+     entries = bundle.get("entry", []) if isinstance(bundle, dict) else []
+     return [item.get("resource", {}) for item in entries if isinstance(item, dict) and isinstance(item.get("resource"), dict)]
+
+
+ async def search_patients(name: str) -> list[dict[str, Any]]:
+     """Search patient resources by name.
+
+     Args:
+         name (str): Name term for patient search.
+
+     Returns:
+         list[dict[str, Any]]: Matching Patient resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.search_patients(name=name)
+     return _bundle_entries(response)
+
+
+ async def get_conditions(patient_id: str) -> list[dict[str, Any]]:
+     """Return active condition resources for a patient.
+
+     Args:
+         patient_id (str): Patient identifier.
+
+     Returns:
+         list[dict[str, Any]]: Condition resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.get_conditions(patient_id=patient_id)
+     return _bundle_entries(response)
+
+
+ async def get_medications(patient_id: str) -> list[dict[str, Any]]:
+     """Return active medication request resources for a patient.
+
+     Args:
+         patient_id (str): Patient identifier.
+
+     Returns:
+         list[dict[str, Any]]: MedicationRequest resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.get_medications(patient_id=patient_id)
+     return _bundle_entries(response)
+
+
+ async def get_observations(patient_id: str, category: str = "laboratory") -> list[dict[str, Any]]:
+     """Return observation resources for a patient and category.
+
+     Args:
+         patient_id (str): Patient identifier.
+         category (str): Observation category code.
+
+     Returns:
+         list[dict[str, Any]]: Observation resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.get_observations(patient_id=patient_id, category=category)
+     return _bundle_entries(response)
+
+
+ async def get_allergies(patient_id: str) -> list[dict[str, Any]]:
+     """Return allergy resources for a patient.
+
+     Args:
+         patient_id (str): Patient identifier.
+
+     Returns:
+         list[dict[str, Any]]: AllergyIntolerance resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.get_allergies(patient_id=patient_id)
+     return _bundle_entries(response)
+
+
+ async def get_diagnostic_reports(patient_id: str) -> list[dict[str, Any]]:
+     """Return diagnostic report resources for a patient.
+
+     Args:
+         patient_id (str): Patient identifier.
+
+     Returns:
+         list[dict[str, Any]]: DiagnosticReport resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.get_diagnostic_reports(patient_id=patient_id)
+     return _bundle_entries(response)
+
+
+ async def get_recent_encounters(patient_id: str) -> list[dict[str, Any]]:
+     """Return recent encounter resources for a patient.
+
+     Args:
+         patient_id (str): Patient identifier.
+
+     Returns:
+         list[dict[str, Any]]: Encounter resources.
+     """
+
+     response = await DEFAULT_FHIR_CLIENT.get_recent_encounters(patient_id=patient_id)
+     return _bundle_entries(response)
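Every wrapper funnels its bundle through `_bundle_entries`, which tolerates non-dict payloads and entries without a `resource` dict. An equivalent standalone sketch of that extraction:

```python
def bundle_entries(bundle) -> list[dict]:
    """Extract resource dicts from a FHIR Bundle, skipping malformed entries."""
    entries = bundle.get("entry", []) if isinstance(bundle, dict) else []
    return [
        item.get("resource", {})
        for item in entries
        if isinstance(item, dict) and isinstance(item.get("resource"), dict)
    ]


bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition", "id": "c1"}},
        {"fullUrl": "urn:uuid:broken"},  # entry without a resource dict is skipped
    ],
}
print(bundle_entries(bundle))  # [{'resourceType': 'Condition', 'id': 'c1'}]
```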
backend/models/__init__.py ADDED
@@ -0,0 +1 @@
+ """Model wrappers for Clarke backend pipelines."""
backend/models/doc_generator.py ADDED
@@ -0,0 +1,342 @@
+ """MedGemma 27B document generation wrapper with prompt rendering and section parsing."""
+
+ from __future__ import annotations
+
+ import json
+ import re
+ import time
+ from datetime import datetime, timezone
+ from pathlib import Path
+ from typing import Any
+
+ from jinja2 import Environment, FileSystemLoader
+
+ try:
+     import torch
+     from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ except ModuleNotFoundError:  # pragma: no cover - mock mode support
+     torch = None
+     AutoModelForCausalLM = None
+     AutoTokenizer = None
+     BitsAndBytesConfig = None
+
+ from backend.config import get_settings
+ from backend.errors import ModelExecutionError, get_component_logger
+ from backend.schemas import ClinicalDocument, ConsultationStatus, DocumentSection, PatientContext
+
+ logger = get_component_logger("doc_generator")
+
+ PROMPTS_DIR = Path("backend/prompts")
+ KNOWN_SECTION_HEADINGS = [
+     "History of presenting complaint",
+     "Examination findings",
+     "Investigation results",
+     "Assessment and plan",
+     "Current medications",
+ ]
+
+
+ class DocumentGenerator:
+     """Generate NHS clinic letters from transcript + patient context using MedGemma 27B.
+
+     Args:
+         model_id (str | None): Optional model identifier override.
+
+     Returns:
+         None: Initialises lazy model/tokenizer handles and runtime settings.
+     """
+
+     def __init__(self, model_id: str | None = None) -> None:
+         """Initialise document generator model wrapper state.
+
+         Args:
+             model_id (str | None): Optional model identifier override.
+
+         Returns:
+             None: Stores settings and lazy-loaded model state.
+         """
+
+         self.settings = get_settings()
+         self.model_id = model_id or self.settings.MEDGEMMA_27B_MODEL_ID
+         self._tokenizer: Any | None = None
+         self._model: Any | None = None
+         self.is_mock_mode = self.model_id.lower() == "mock"
+
+     def load_model(self) -> None:
+         """Load MedGemma 27B tokenizer and model in 4-bit quantised mode.
+
+         Args:
+             None: Reads class configuration and settings values.
+
+         Returns:
+             None: Populates model/tokenizer attributes or mock sentinel state.
+         """
+
+         if self.is_mock_mode:
+             logger.info("Document generator initialised in mock mode")
+             self._tokenizer = "mock"
+             self._model = "mock"
+             return
+         if self._model is not None and self._tokenizer is not None:
+             return
+
+         if AutoModelForCausalLM is None or AutoTokenizer is None or BitsAndBytesConfig is None or torch is None:
+             raise ModelExecutionError("transformers and torch are required for non-mock document generation")
+
+         try:
+             bnb_config = BitsAndBytesConfig(
+                 load_in_4bit=True,
+                 bnb_4bit_quant_type="nf4",
+                 bnb_4bit_compute_dtype=torch.bfloat16,
+                 bnb_4bit_use_double_quant=True,
+             )
+             self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
+             self._model = AutoModelForCausalLM.from_pretrained(
+                 self.model_id,
+                 quantization_config=bnb_config,
+                 device_map="auto",
+                 torch_dtype=torch.bfloat16,
+             )
+             logger.info("Loaded MedGemma document model", model_id=self.model_id)
+         except Exception as exc:
+             raise ModelExecutionError(f"Failed to load MedGemma 27B model: {exc}") from exc
+
+     def generate(self, prompt: str, max_new_tokens: int | None = None) -> str:
+         """Generate raw letter text from a rendered prompt.
+
+         Args:
+             prompt (str): Fully rendered prompt string.
+             max_new_tokens (int | None): Optional generation token cap override.
+
+         Returns:
+             str: Raw decoded generated text output from the model.
+         """
+
+         if self._model is None or self._tokenizer is None:
+             self.load_model()
+
+         if self.is_mock_mode:
+             return self._mock_reference_letter()
+
+         generation_max_tokens = max_new_tokens or self.settings.DOC_GEN_MAX_TOKENS
+         try:
+             inputs = self._tokenizer(prompt, return_tensors="pt")
+             if hasattr(self._model, "device"):
+                 inputs = {key: value.to(self._model.device) for key, value in inputs.items()}
+             output_tokens = self._model.generate(
+                 **inputs,
+                 max_new_tokens=generation_max_tokens,
+                 temperature=0.3,
+                 top_p=0.9,
+                 top_k=40,
+                 do_sample=True,
+                 repetition_penalty=1.1,
+             )
+         except Exception as exc:
+             raise ModelExecutionError(f"MedGemma 27B inference failed: {exc}") from exc
+
+         decoded_output = self._tokenizer.decode(output_tokens[0], skip_special_tokens=True)
+         return self._strip_prompt_prefix(decoded_output, prompt)
+
+     def generate_document(
+         self,
+         transcript: str,
+         context: PatientContext,
+         max_new_tokens: int | None = None,
+     ) -> ClinicalDocument:
+         """Render prompt, generate text with retry policy, and build ClinicalDocument.
+
+         Args:
+             transcript (str): Consultation transcript text.
+             context (PatientContext): Structured patient context payload.
+             max_new_tokens (int | None): Optional generation token limit override.
+
+         Returns:
+             ClinicalDocument: Parsed clinical letter representation with section objects.
+         """
+
+         prompt = self._render_prompt(transcript, context)
+         generation_start = time.perf_counter()
+
+         last_error: Exception | None = None
+         first_limit = max_new_tokens or 2048
+         retry_limit = max(256, first_limit // 2)
+         for attempt, token_limit in enumerate((first_limit, retry_limit), start=1):
+             try:
+                 generated_text = self.generate(prompt, max_new_tokens=token_limit)
+                 sections = self._parse_sections(generated_text)
+                 if len(sections) < 4:
+                     raise ValueError("Generated output did not contain enough parseable sections")
+                 generation_time_s = round(time.perf_counter() - generation_start, 3)
+                 return self._build_document(context, sections, generation_time_s)
+             except Exception as exc:  # noqa: BLE001
+                 last_error = exc
+                 logger.warning("Document generation attempt failed", attempt=attempt, error=str(exc))
+
+         raise ModelExecutionError(f"Document generation failed after retry: {last_error}")
+
+     def _render_prompt(self, transcript: str, context: PatientContext) -> str:
+         """Render the document generation Jinja2 template with consultation inputs.
+
+         Args:
+             transcript (str): Consultation transcript text.
+             context (PatientContext): Structured patient context data.
+
+         Returns:
+             str: Rendered prompt string supplied to the language model.
+         """
+
+         env = Environment(loader=FileSystemLoader(PROMPTS_DIR))
+         template = env.get_template("document_generation.j2")
+         context_json = json.dumps(context.model_dump(mode="json"), ensure_ascii=False, indent=2)
+         return template.render(
+             letter_date=datetime.now(tz=timezone.utc).strftime("%d %b %Y"),
+             clinician_name="Dr. Sarah Chen",
+             clinician_title="Consultant Diabetologist",
+             transcript=transcript,
+             context_json=context_json,
+         )
+
+     @staticmethod
+     def _parse_sections(generated_text: str) -> list[DocumentSection]:
+         """Parse generated letter text into section objects using heading detection rules.
+
+         Args:
+             generated_text (str): Raw generated letter text.
+
+         Returns:
+             list[DocumentSection]: Ordered parsed sections with heading and content fields.
+         """
+
+         section_pattern = re.compile(
+             r"^(?:\*\*|##\s*)?(History of presenting complaint|Examination findings|Investigation results|Assessment and plan|Current medications)[:\*\s]*$",
+             flags=re.IGNORECASE,
+         )
+         sections: list[DocumentSection] = []
+         current_heading: str | None = None
+         current_lines: list[str] = []
+
+         for raw_line in generated_text.splitlines():
+             line = raw_line.strip()
+             heading_match = section_pattern.match(line)
+             if heading_match:
+                 if current_heading and current_lines:
+                     sections.append(
+                         DocumentSection(
+                             heading=current_heading,
+                             content="\n".join(current_lines).strip(),
+                             editable=True,
+                             fhir_sources=[],
+                         )
+                     )
+                 # sentence case keeps parsed headings consistent with KNOWN_SECTION_HEADINGS
+                 current_heading = heading_match.group(1).strip().capitalize()
+                 current_lines = []
+                 continue
+
+             if current_heading and line:
+                 current_lines.append(line)
+
+         if current_heading and current_lines:
+             sections.append(
+                 DocumentSection(
+                     heading=current_heading,
+                     content="\n".join(current_lines).strip(),
+                     editable=True,
+                     fhir_sources=[],
+                 )
+             )
+
+         if not sections:
+             sections = [
+                 DocumentSection(
+                     heading=heading,
+                     content="Content unavailable in generated output.",
+                     editable=True,
+                     fhir_sources=[],
+                 )
+                 for heading in KNOWN_SECTION_HEADINGS
+             ]
+         return sections
+
+     @staticmethod
+     def _build_document(
+         context: PatientContext,
+         sections: list[DocumentSection],
+         generation_time_s: float,
+     ) -> ClinicalDocument:
+         """Construct ClinicalDocument schema object from parsed generation outputs.
+
+         Args:
+             context (PatientContext): Structured patient context data.
+             sections (list[DocumentSection]): Parsed generated document sections.
+             generation_time_s (float): Generation wall-clock duration in seconds.
+
+         Returns:
+             ClinicalDocument: Final structured letter ready for API response.
+         """
+
+         demographics = context.demographics
+         patient_name = str(demographics.get("name", "Unknown patient"))
+         patient_dob = str(demographics.get("dob", ""))
+         nhs_number = str(demographics.get("nhs_number", ""))
+         medications_list = [med.get("name", "") for med in context.medications if med.get("name")]
+
+         return ClinicalDocument(
+             consultation_id=context.patient_id,
+             letter_date=datetime.now(tz=timezone.utc).strftime("%Y-%m-%d"),
+             patient_name=patient_name,
+             patient_dob=patient_dob,
+             nhs_number=nhs_number,
+             addressee="GP Practice",
+             salutation="Dear Dr.,",
+             sections=sections,
+             medications_list=medications_list,
+             sign_off="Dr. Sarah Chen, Consultant Diabetologist",
+             status=ConsultationStatus.REVIEW,
+             generated_at=datetime.now(tz=timezone.utc).isoformat(),
+             generation_time_s=generation_time_s,
+             discrepancies=[],
+         )
+
+     @staticmethod
+     def _strip_prompt_prefix(decoded_output: str, prompt: str) -> str:
+         """Remove prompt text prefix when decoder echoes the full prompt + completion.
+
+         Args:
+             decoded_output (str): Tokenizer-decoded text from model output ids.
+             prompt (str): Original model prompt string.
+
+         Returns:
+             str: Completion-only output when prompt prefix is present.
+         """
+
+         if decoded_output.startswith(prompt):
+             return decoded_output[len(prompt) :].strip()
+         return decoded_output.strip()
+
+     @staticmethod
+     def _mock_reference_letter() -> str:
+         """Return deterministic reference letter text for mock mode generation.
+
+         Args:
+             None: Uses embedded fixture text.
+
+         Returns:
+             str: Structured letter text with known section headings.
+         """
+
+         return (
+             "History of presenting complaint\n"
+             "Mrs Thompson reported worsening fatigue and reduced exercise tolerance over the last three months. "
+             "She confirmed she is taking metformin and gliclazide but occasionally misses evening doses.\n\n"
+             "Examination findings\n"
+             "No acute distress was described during the consultation. She denied chest pain, syncope, or focal neurological symptoms.\n\n"
+             "Investigation results\n"
+             "Recent blood results showed HbA1c 55 mmol/mol with eGFR 52 mL/min/1.73m². "
+             "Penicillin allergy with previous anaphylaxis was reconfirmed.\n\n"
+             "Assessment and plan\n"
+             "Overall picture is suboptimal glycaemic control with associated fatigue. Plan is lifestyle reinforcement, medicine adherence review, "
+             "repeat renal profile in 3 months, and consideration of treatment escalation if HbA1c remains above target.\n\n"
+             "Current medications\n"
+             "Metformin 1 g twice daily; Gliclazide 80 mg twice daily."
+         )
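The section parser in `doc_generator.py` walks the generated letter line by line, treating any line that matches one of the five known headings (optionally wrapped in `**` or prefixed with `## `) as a section boundary. A standalone sketch of that heading detection on a toy letter:

```python
import re

# same alternation and optional markdown prefixes as the generator's pattern
SECTION_PATTERN = re.compile(
    r"^(?:\*\*|##\s*)?(History of presenting complaint|Examination findings|"
    r"Investigation results|Assessment and plan|Current medications)[:\*\s]*$",
    flags=re.IGNORECASE,
)

letter = (
    "History of presenting complaint\n"
    "Worsening fatigue over three months.\n\n"
    "## Current medications\n"
    "Metformin 1 g twice daily."
)

sections: dict[str, list[str]] = {}
heading = None
for raw_line in letter.splitlines():
    line = raw_line.strip()
    match = SECTION_PATTERN.match(line)
    if match:
        heading = match.group(1)
        sections[heading] = []
    elif heading and line:
        sections[heading].append(line)

print(sorted(sections))  # ['Current medications', 'History of presenting complaint']
```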
backend/models/ehr_agent.py ADDED
@@ -0,0 +1,459 @@
1
+ """EHR Agent module for deterministic FHIR retrieval and MedGemma 4B summarisation."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import asyncio
6
+ import json
7
+ import re
8
+ from datetime import date, datetime, timezone
9
+ from pathlib import Path
10
+ from typing import Any
11
+
12
+ from jinja2 import Template
13
+ from pydantic import ValidationError
14
+
15
+ try:
16
+ import torch
17
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
18
+ except ModuleNotFoundError: # pragma: no cover - mock mode support
19
+ torch = None
20
+ AutoModelForCausalLM = None
21
+ AutoTokenizer = None
22
+ BitsAndBytesConfig = None
23
+
24
+ from backend.config import get_settings
25
+ from backend.errors import ModelExecutionError, get_component_logger
26
+ from backend.fhir.queries import get_full_patient_context
27
+ from backend.schemas import LabResult, PatientContext
28
+
29
+ logger = get_component_logger("ehr_agent")
30
+
31
+
32
+ DEFAULT_CONTEXT_PROMPT = """You are a clinical EHR synthesis agent.
33
+ Given the raw FHIR payload below, output only valid JSON for this schema:
34
+ {
35
+ \"patient_id\": \"...\",
36
+ \"demographics\": {...},
37
+ \"problem_list\": [\"...\"],
38
+ \"medications\": [{...}],
39
+ \"allergies\": [{...}],
40
+ \"recent_labs\": [{...}],
41
+ \"recent_imaging\": [{...}],
42
+ \"clinical_flags\": [\"...\"],
43
+ \"last_letter_excerpt\": null,
44
+ \"retrieval_warnings\": [],
45
+ \"retrieved_at\": \"...\"
46
+ }
47
+
48
+ Raw FHIR context JSON:
49
+ {{ raw_context_json }}
50
+ """
51
+
52
+
53
+ def parse_agent_output(raw_output: str) -> dict[str, Any]:
54
+ """Extract first JSON object from MedGemma output after stripping prompt leaks.
55
+
56
+ Args:
57
+ raw_output (str): Raw model generation text that may contain extra formatting.
58
+
59
+ Returns:
60
+ dict[str, Any]: Parsed JSON object extracted from model output.
61
+ """
62
+
63
+ cleaned_output = re.sub(r"<\|system\|>.*?<\|end\|>", "", raw_output, flags=re.DOTALL)
64
+ cleaned_output = re.sub(r"```json\s*", "", cleaned_output)
65
+ cleaned_output = re.sub(r"```\s*", "", cleaned_output)
66
+ match = re.search(r"\{[\s\S]*\}", cleaned_output)
67
+ if match:
68
+ return json.loads(match.group())
69
+ raise ValueError("No valid JSON found in agent output")
70
+
71
+
72
+ class EHRAgent:
73
+ """Retrieve raw FHIR context and synthesise a validated PatientContext object.
74
+
75
+ Args:
76
+ model_id (str | None): Optional override for MedGemma 4B model ID.
77
+
78
+ Returns:
79
+ None: Creates an agent instance with lazy model loading.
80
+ """
81
+
82
+ def __init__(self, model_id: str | None = None) -> None:
83
+ settings = get_settings()
84
+ self.model_id = model_id or settings.MEDGEMMA_4B_MODEL_ID
85
+ self.timeout_s = settings.FHIR_TIMEOUT_S
86
+ self._model: Any | None = None
87
+ self._tokenizer: Any | None = None
88
+ self.is_mock_mode = self.model_id.lower() == "mock"
89
+
90
+ def load_model(self) -> None:
91
+ """Load the MedGemma 4B model/tokenizer in 4-bit mode unless running in mock mode.
92
+
93
+ Args:
94
+ None: Uses configured model ID and quantisation settings.
95
+
96
+ Returns:
97
+ None: Populates tokenizer/model attributes for inference.
98
+ """
99
+
100
+ if self.is_mock_mode:
101
+ logger.info("EHR agent initialised in mock mode")
102
+ return
103
+ if self._model is not None and self._tokenizer is not None:
104
+ return
105
+
106
+ if AutoModelForCausalLM is None or AutoTokenizer is None or BitsAndBytesConfig is None or torch is None:
107
+ raise ModelExecutionError("transformers and torch are required for non-mock EHR mode")
108
+
109
+ try:
110
+ bnb_config = BitsAndBytesConfig(
111
+ load_in_4bit=True,
112
+ bnb_4bit_quant_type="nf4",
113
+ bnb_4bit_compute_dtype=torch.bfloat16,
114
+ bnb_4bit_use_double_quant=True,
115
+ )
116
+ self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
117
+ self._model = AutoModelForCausalLM.from_pretrained(
118
+ self.model_id,
119
+ quantization_config=bnb_config,
120
+ device_map="auto",
121
+ torch_dtype=torch.bfloat16,
122
+ )
123
+ logger.info("Loaded EHR agent model", model_id=self.model_id)
124
+ except Exception as exc:
125
+ raise ModelExecutionError(f"Failed to load MedGemma EHR model: {exc}") from exc
126
+
127
+ def get_patient_context(self, patient_id: str) -> PatientContext:
128
+ """Return structured patient context using deterministic FHIR retrieval with robust fallback.
129
+
130
+ Args:
131
+ patient_id (str): Patient identifier used across FHIR resources.
132
+
133
+ Returns:
134
+ PatientContext: Validated patient context instance for downstream pipeline use.
135
+ """
136
+
137
+ raw_context = asyncio.run(get_full_patient_context(patient_id))
138
+
139
+ if self.is_mock_mode:
140
+ return self._build_context_from_raw(raw_context)
141
+
142
+ self.load_model()
143
+ for attempt in range(1, 3):
144
+ try:
145
+ summarised_context = self._summarise_with_model(raw_context)
146
+ return PatientContext.model_validate(summarised_context)
147
+ except (ValidationError, ValueError, ModelExecutionError, json.JSONDecodeError) as exc:
148
+ logger.warning(
149
+ "EHR model summarisation failed; retrying or falling back",
150
+ patient_id=patient_id,
151
+ attempt=attempt,
152
+ error=str(exc),
153
+ )
154
+
155
+ fallback_context = self._build_context_from_raw(raw_context)
156
+ fallback_context.retrieval_warnings.append(
157
+ "MedGemma summarisation unavailable; context built via deterministic extraction."
158
+ )
159
+ return fallback_context
160
+
161
+ def _summarise_with_model(self, raw_context: dict[str, Any]) -> dict[str, Any]:
162
+ """Run MedGemma generation and parse into a dictionary payload.
163
+
164
+ Args:
165
+ raw_context (dict[str, Any]): Deterministic FHIR aggregation from tool queries.
166
+
167
+ Returns:
168
+ dict[str, Any]: Parsed context dictionary extracted from model output JSON.
169
+ """
170
+
171
+ if self._model is None or self._tokenizer is None:
172
+ raise ModelExecutionError("Model and tokenizer must be loaded before inference")
173
+
174
+ prompt = self._render_context_prompt(raw_context)
175
+ inputs = self._tokenizer(prompt, return_tensors="pt")
176
+ if hasattr(self._model, "device"):
177
+ inputs = {key: value.to(self._model.device) for key, value in inputs.items()}
178
+
179
+ try:
180
+ output_tokens = self._model.generate(
181
+ **inputs,
182
+ max_new_tokens=1024,
183
+ do_sample=True,
184
+ temperature=0.2,
185
+ top_p=0.9,
186
+ repetition_penalty=1.1,
187
+ )
188
+ except Exception as exc:
189
+ raise ModelExecutionError(f"MedGemma EHR generation failed: {exc}") from exc
190
+
191
+ raw_output = self._tokenizer.decode(output_tokens[0], skip_special_tokens=True)
192
+ return parse_agent_output(raw_output)
193
+
194
+ def _render_context_prompt(self, raw_context: dict[str, Any]) -> str:
195
+ """Render context synthesis prompt from Jinja template or built-in fallback template.
196
+
197
+ Args:
198
+ raw_context (dict[str, Any]): Deterministic FHIR context dictionary.
199
+
200
+ Returns:
201
+ str: Prompt text for MedGemma summarisation.
202
+ """
203
+
204
+ prompt_template_path = Path("backend/prompts/context_synthesis.j2")
205
+ if prompt_template_path.exists():
206
+ template_text = prompt_template_path.read_text(encoding="utf-8")
207
+ else:
208
+ template_text = DEFAULT_CONTEXT_PROMPT
209
+
210
+ template = Template(template_text)
211
+ return template.render(raw_context_json=json.dumps(raw_context, ensure_ascii=False, indent=2))
212
+
213
+ def _build_context_from_raw(self, raw_context: dict[str, Any]) -> PatientContext:
214
+ """Construct PatientContext directly from raw FHIR resources as deterministic fallback.
215
+
216
+ Args:
217
+ raw_context (dict[str, Any]): Deterministic FHIR context dictionary.
218
+
219
+ Returns:
220
+ PatientContext: Fully structured context generated without model summarisation.
221
+ """
222
+
223
+ patient = raw_context.get("patients", [{}])[0] if raw_context.get("patients") else {}
224
+ demographics = self._extract_demographics(patient)
225
+ problem_list = self._extract_problem_list(raw_context.get("conditions", []))
226
+ medications = self._extract_medications(raw_context.get("medications", []))
227
+ allergies = self._extract_allergies(raw_context.get("allergies", []))
228
+ recent_labs = self._extract_labs(raw_context.get("observations", []))
229
+ recent_imaging = self._extract_imaging(raw_context.get("diagnostic_reports", []))
230
+
231
+ clinical_flags: list[str] = []
232
+ hba1c_values = [lab for lab in recent_labs if lab.name.lower() == "hba1c"]
233
+ if len(hba1c_values) >= 2:
+ try:
+ latest = float(hba1c_values[0].value)
+ previous = float(hba1c_values[1].value)
+ except (TypeError, ValueError):
+ pass
+ else:
+ if latest > previous:
+ clinical_flags.append(f"HbA1c rising trend ({hba1c_values[1].value} → {hba1c_values[0].value})")
238
+
239
+ return PatientContext(
240
+ patient_id=str(raw_context.get("patient_id", "")),
241
+ demographics=demographics,
242
+ problem_list=problem_list,
243
+ medications=medications,
244
+ allergies=allergies,
245
+ recent_labs=recent_labs,
246
+ recent_imaging=recent_imaging,
247
+ clinical_flags=clinical_flags,
248
+ last_letter_excerpt=None,
249
+ retrieval_warnings=[],
250
+ retrieved_at=datetime.now(tz=timezone.utc).isoformat(),
251
+ )
252
+
253
+ @staticmethod
254
+ def _extract_demographics(patient: dict[str, Any]) -> dict[str, Any]:
255
+ """Extract demographics map from FHIR Patient resource.
256
+
257
+ Args:
258
+ patient (dict[str, Any]): FHIR Patient resource dictionary.
259
+
260
+ Returns:
261
+ dict[str, Any]: Simplified demographics payload.
262
+ """
263
+
264
+ names = patient.get("name", [])
265
+ first_name = names[0] if names else {}
266
+ full_name_parts = first_name.get("prefix", []) + first_name.get("given", []) + [first_name.get("family", "")]
267
+ full_name = " ".join(part for part in full_name_parts if part).strip()
268
+
269
+ nhs_number = ""
270
+ for identifier in patient.get("identifier", []):
271
+ if "nhs" in str(identifier.get("system", "")).lower() or identifier.get("value"):
272
+ nhs_number = str(identifier.get("value", ""))
273
+ if nhs_number:
274
+ break
275
+
276
+ birth_date_value = patient.get("birthDate", "")
277
+ birth_date_raw = str(birth_date_value).strip() if birth_date_value is not None else ""
278
+ dob_display = birth_date_raw
279
+ age: int | None = None
280
+ if birth_date_raw and birth_date_raw != "None":
281
+ try:
282
+ parsed_dob = date.fromisoformat(birth_date_raw)
283
+ dob_display = parsed_dob.strftime("%d/%m/%Y")
284
+ today = date.today()
285
+ age = today.year - parsed_dob.year - ((today.month, today.day) < (parsed_dob.month, parsed_dob.day))
286
+ except ValueError:
287
+ dob_display = birth_date_raw
288
+
289
+ nhs_clean = "".join(ch for ch in nhs_number if ch.isdigit())
290
+ if len(nhs_clean) == 10:
291
+ nhs_number = f"{nhs_clean[:3]}-{nhs_clean[3:6]}-{nhs_clean[6:]}"
292
+
293
+ return {
294
+ "name": full_name,
295
+ "dob": dob_display,
296
+ "nhs_number": nhs_number,
297
+ "age": age,
298
+ "sex": str(patient.get("gender", "")).capitalize(),
299
+ "address": "",
300
+ }
301
+
302
+ @staticmethod
303
+ def _extract_problem_list(conditions: list[dict[str, Any]]) -> list[str]:
304
+ """Extract active problem list from FHIR Condition resources.
305
+
306
+ Args:
307
+ conditions (list[dict[str, Any]]): List of FHIR Condition resources.
308
+
309
+ Returns:
310
+ list[str]: Human-readable problem entries.
311
+ """
312
+
313
+ problems: list[str] = []
314
+ for condition in conditions:
315
+ status_codes = condition.get("clinicalStatus", {}).get("coding", [])
316
+ is_active = any(code.get("code") == "active" for code in status_codes) if status_codes else True
317
+ label = str(condition.get("code", {}).get("text", "")).strip()
318
+ if is_active and label:
319
+ problems.append(label)
320
+ return problems
321
+
322
+ @staticmethod
323
+ def _extract_medications(medications: list[dict[str, Any]]) -> list[dict[str, Any]]:
324
+ """Extract simplified medication entries from MedicationRequest resources.
325
+
326
+ Args:
327
+ medications (list[dict[str, Any]]): List of FHIR MedicationRequest resources.
328
+
329
+ Returns:
330
+ list[dict[str, Any]]: Medication records with source IDs.
331
+ """
332
+
333
+ extracted: list[dict[str, Any]] = []
334
+ for medication in medications:
335
+ dosage_text = ""
336
+ dosage_instructions = medication.get("dosageInstruction", [])
337
+ if dosage_instructions:
338
+ dosage_text = str(dosage_instructions[0].get("text", "")).strip()
339
+
340
+ dose = ""
341
+ frequency = ""
342
+ if dosage_text:
343
+ parts = dosage_text.rsplit(" ", maxsplit=1)
344
+ if len(parts) == 2:
345
+ dose, frequency = parts[0], parts[1]
346
+ else:
347
+ dose = dosage_text
348
+
349
+ extracted.append(
350
+ {
351
+ "name": str(medication.get("medicationCodeableConcept", {}).get("text", "")).strip(),
352
+ "dose": dose,
353
+ "frequency": frequency,
354
+ "fhir_id": str(medication.get("id", "")),
355
+ }
356
+ )
357
+ return extracted
358
+
359
+ @staticmethod
360
+ def _extract_allergies(allergies: list[dict[str, Any]]) -> list[dict[str, Any]]:
361
+ """Extract allergy summary records from AllergyIntolerance resources.
362
+
363
+ Args:
364
+ allergies (list[dict[str, Any]]): List of FHIR AllergyIntolerance resources.
365
+
366
+ Returns:
367
+ list[dict[str, Any]]: Simplified allergy entries.
368
+ """
369
+
370
+ extracted: list[dict[str, Any]] = []
371
+ for allergy in allergies:
372
+ reaction = ""
373
+ reactions = allergy.get("reaction", [])
374
+ if reactions:
375
+ manifestations = reactions[0].get("manifestation", [])
376
+ if manifestations:
377
+ reaction = str(manifestations[0].get("text", ""))
378
+ extracted.append(
379
+ {
380
+ "substance": str(allergy.get("code", {}).get("text", "")).strip(),
381
+ "reaction": reaction,
382
+ "severity": str(allergy.get("criticality", "")).strip() or "unknown",
383
+ }
384
+ )
385
+ return extracted
386
+
387
+ @staticmethod
388
+ def _extract_labs(observations: list[dict[str, Any]]) -> list[LabResult]:
389
+ """Extract laboratory results from Observation resources with simple trend linkage.
390
+
391
+ Args:
392
+ observations (list[dict[str, Any]]): List of FHIR Observation resources.
393
+
394
+ Returns:
395
+ list[LabResult]: Structured laboratory results sorted by effective date.
396
+ """
397
+
398
+ labs: list[LabResult] = []
+ sorted_observations = sorted(
+ observations,
+ key=lambda obs: str(obs.get("effectiveDateTime", "")),
+ )
+ previous_by_name: dict[str, LabResult] = {}
+
+ for observation in sorted_observations:
+ quantity = observation.get("valueQuantity", {})
+ name = str(observation.get("code", {}).get("text", "")).strip()
+ value = str(quantity.get("value", ""))
+ unit = str(quantity.get("unit", ""))
+ effective_date = str(observation.get("effectiveDateTime", ""))
+ lab = LabResult(
+ name=name,
+ value=value,
+ unit=unit,
+ date=effective_date,
+ fhir_resource_id=str(observation.get("id", "")),
+ )
+ if name in previous_by_name:
+ previous = previous_by_name[name]
+ lab.previous_value = previous.value
+ lab.previous_date = previous.date
+ try:
+ current_val = float(lab.value)
+ previous_val = float(previous.value)
+ if current_val > previous_val:
+ lab.trend = "rising"
+ elif current_val < previous_val:
+ lab.trend = "falling"
+ else:
+ lab.trend = "stable"
+ except (TypeError, ValueError):
+ lab.trend = None
+ previous_by_name[name] = lab
+ labs.append(lab)
+
+ # Observations are iterated oldest-first so each "previous" link points at the
+ # true predecessor; reverse so callers receive results newest-first.
+ labs.reverse()
+ return labs
438
+
439
+ @staticmethod
440
+ def _extract_imaging(reports: list[dict[str, Any]]) -> list[dict[str, Any]]:
441
+ """Extract concise imaging/report summaries from DiagnosticReport resources.
442
+
443
+ Args:
444
+ reports (list[dict[str, Any]]): List of FHIR DiagnosticReport resources.
445
+
446
+ Returns:
447
+ list[dict[str, Any]]: Recent report summary entries.
448
+ """
449
+
450
+ extracted: list[dict[str, Any]] = []
451
+ for report in reports:
452
+ extracted.append(
453
+ {
454
+ "type": str(report.get("code", {}).get("text", "Diagnostic report")),
455
+ "date": str(report.get("effectiveDateTime", "")),
456
+ "summary": str(report.get("conclusion", "")).strip(),
457
+ }
458
+ )
459
+ return extracted
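The lab-trend linkage is the most intricate part of the deterministic fallback, so here is a minimal standalone sketch of the intended behaviour (illustrative names, not the module's API): each result is compared against the previous result of the same test, and the list is returned newest-first.

```python
# Illustrative sketch: derive per-test trends from FHIR-style Observations by
# comparing each result with the previous (older) result of the same test.
from typing import Any


def link_trends(observations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return lab dicts newest-first, each linked to its predecessor's value."""
    ordered = sorted(observations, key=lambda o: str(o.get("effectiveDateTime", "")))
    previous_by_name: dict[str, dict[str, Any]] = {}
    labs: list[dict[str, Any]] = []
    for obs in ordered:  # oldest first, so "previous" is genuinely older
        lab = {
            "name": str(obs.get("code", {}).get("text", "")),
            "value": str(obs.get("valueQuantity", {}).get("value", "")),
            "date": str(obs.get("effectiveDateTime", "")),
            "trend": None,
        }
        previous = previous_by_name.get(lab["name"])
        if previous is not None:
            try:
                current_val = float(lab["value"])
                previous_val = float(previous["value"])
            except ValueError:
                pass  # non-numeric result: leave trend as None
            else:
                if current_val > previous_val:
                    lab["trend"] = "rising"
                elif current_val < previous_val:
                    lab["trend"] = "falling"
                else:
                    lab["trend"] = "stable"
        previous_by_name[lab["name"]] = lab
        labs.append(lab)
    labs.reverse()  # newest-first for display
    return labs


demo = [
    {"code": {"text": "HbA1c"}, "valueQuantity": {"value": 48}, "effectiveDateTime": "2024-01-10"},
    {"code": {"text": "HbA1c"}, "valueQuantity": {"value": 53}, "effectiveDateTime": "2024-06-10"},
]
print(link_trends(demo)[0]["trend"])  # prints "rising"
```

The oldest-first pass is what makes the "previous" link meaningful; a newest-first pass would compare each result against a later one and invert the trend labels.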
backend/models/medasr.py ADDED
@@ -0,0 +1,180 @@
1
+ """MedASR model wrapper with mock-mode fallback transcription."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from datetime import datetime, timezone
6
+ from pathlib import Path
7
+
8
+ import librosa
9
+ try:
10
+ from transformers import pipeline
11
+ except ModuleNotFoundError: # pragma: no cover - mock mode support
12
+ pipeline = None
13
+
14
+ from backend.config import get_settings
15
+ from backend.errors import ModelExecutionError
16
+ from backend.models.model_manager import ModelManager
17
+ from backend.schemas import Transcript
18
+
19
+
20
+ class MedASRModel:
21
+ """Load and run MedASR speech recognition or a deterministic mock implementation.
22
+
23
+ Args:
24
+ model_manager (ModelManager | None): Optional shared model registry manager.
25
+
26
+ Returns:
27
+ None: Initialised model wrapper with lazy-loaded pipeline.
28
+ """
29
+
30
+ def __init__(self, model_manager: ModelManager | None = None) -> None:
31
+ """Initialise MedASR wrapper.
32
+
33
+ Args:
34
+ model_manager (ModelManager | None): Optional model manager instance.
35
+
36
+ Returns:
37
+ None: Sets internal settings and model state.
38
+ """
39
+ self.settings = get_settings()
40
+ self.model_manager = model_manager or ModelManager()
41
+ self._pipeline = None
42
+
43
+ @property
44
+ def is_mock_mode(self) -> bool:
45
+ """Return whether MedASR should operate in deterministic mock mode.
46
+
47
+ Args:
48
+ None: Reads current settings.
49
+
50
+ Returns:
51
+ bool: True when configured model id is "mock".
52
+ """
53
+ return self.settings.MEDASR_MODEL_ID.lower() == "mock"
54
+
55
+ def load_model(self) -> None:
56
+ """Load the MedASR transformer pipeline unless running in mock mode.
57
+
58
+ Args:
59
+ None: Uses settings for model id and device selection.
60
+
61
+ Returns:
62
+ None: Caches loaded pipeline instance.
63
+ """
64
+ if self.is_mock_mode:
65
+ self._pipeline = "mock"
66
+ self.model_manager.register_model("medasr", self._pipeline)
67
+ return
68
+
69
+ if self._pipeline is not None:
70
+ return
71
+
72
+ if pipeline is None:
73
+ raise ModelExecutionError("transformers is required for non-mock MedASR mode")
74
+
75
+ device = "cuda:0"
76
+ if self.model_manager.check_gpu()["vram_total_bytes"] == 0:
77
+ device = "cpu"
78
+
79
+ try:
80
+ self._pipeline = pipeline(
81
+ "automatic-speech-recognition",
82
+ model=self.settings.MEDASR_MODEL_ID,
83
+ device=device,
84
+ )
85
+ except Exception as exc:
86
+ raise ModelExecutionError(f"Failed to load MedASR model: {exc}") from exc
87
+
88
+ self.model_manager.register_model("medasr", self._pipeline)
89
+
90
+ def transcribe(self, audio_path: str) -> Transcript:
91
+ """Transcribe audio input into a Transcript schema object.
92
+
93
+ Args:
94
+ audio_path (str): Path to 16kHz mono audio WAV.
95
+
96
+ Returns:
97
+ Transcript: Structured transcript result.
98
+ """
99
+ source = Path(audio_path)
100
+ if not source.exists():
101
+ raise ModelExecutionError(f"Audio path not found: {source}")
102
+
103
+ if self._pipeline is None:
104
+ self.load_model()
105
+
106
+ if self.is_mock_mode:
107
+ text = self._get_mock_text(source)
108
+ duration_s = self._duration(source)
109
+ return self._make_transcript(source, text, duration_s)
110
+
111
+ waveform, _ = librosa.load(source, sr=16000, mono=True)
112
+ duration_s = float(librosa.get_duration(y=waveform, sr=16000))
113
+
114
+ try:
115
+ result = self._pipeline(
116
+ waveform,
117
+ chunk_length_s=20,
118
+ stride_length_s=(4, 2),
119
+ return_timestamps=True,
120
+ generate_kwargs={"language": "en", "task": "transcribe"},
121
+ )
122
+ except Exception as exc:
123
+ raise ModelExecutionError(f"MedASR inference failed: {exc}") from exc
124
+
125
+ transcript_text = str(result.get("text", "")).strip()
126
+ return self._make_transcript(source, transcript_text, duration_s)
127
+
128
+ def _make_transcript(self, audio_path: Path, text: str, duration_s: float) -> Transcript:
129
+ """Build a Transcript object from model output values.
130
+
131
+ Args:
132
+ audio_path (Path): Source audio path.
133
+ text (str): Transcript text.
134
+ duration_s (float): Audio duration seconds.
135
+
136
+ Returns:
137
+ Transcript: Pydantic transcript model.
138
+ """
139
+ now = datetime.now(tz=timezone.utc).isoformat()
140
+ consultation_id = audio_path.stem
141
+ return Transcript(
142
+ consultation_id=consultation_id,
143
+ text=text,
144
+ duration_s=duration_s,
145
+ word_count=len(text.split()),
146
+ created_at=now,
147
+ )
148
+
149
+ @staticmethod
150
+ def _duration(audio_path: Path) -> float:
151
+ """Compute audio duration in seconds using librosa.
152
+
153
+ Args:
154
+ audio_path (Path): Audio file path.
155
+
156
+ Returns:
157
+ float: Duration in seconds.
158
+ """
159
+ waveform, sample_rate = librosa.load(audio_path, sr=16000, mono=True)
160
+ return float(librosa.get_duration(y=waveform, sr=sample_rate))
161
+
162
+ @staticmethod
163
+ def _get_mock_text(audio_path: Path) -> str:
164
+ """Return ground-truth transcript for known demo files in mock mode.
165
+
166
+ Args:
167
+ audio_path (Path): Audio file path used for lookup.
168
+
169
+ Returns:
170
+ str: Transcript text from fixture file or fallback placeholder.
171
+ """
172
+ transcript_map = {
173
+ "mrs_thompson": Path("data/demo/mrs_thompson_transcript.txt"),
174
+ "mr_okafor": Path("data/demo/mr_okafor_transcript.txt"),
175
+ "ms_patel": Path("data/demo/ms_patel_transcript.txt"),
176
+ }
177
+ for key, transcript_path in transcript_map.items():
178
+ if key in audio_path.stem:
179
+ return transcript_path.read_text(encoding="utf-8").strip()
180
+ return "Mock transcript placeholder for non-demo audio input."
backend/models/model_manager.py ADDED
@@ -0,0 +1,91 @@
1
+ """Shared model lifecycle management utilities."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Any
6
+
7
+ try:
8
+ import torch
9
+ except ModuleNotFoundError: # pragma: no cover - lightweight environments
10
+ torch = None
11
+
12
+
13
+ class ModelManager:
14
+ """Track loaded model objects and expose GPU/cache health helpers.
15
+
16
+ Args:
17
+ None: Manager starts with an empty registry.
18
+
19
+ Returns:
20
+ None: Instance stores mutable model registry state.
21
+ """
22
+
23
+ def __init__(self) -> None:
24
+ """Initialise model registry.
25
+
26
+ Args:
27
+ None: No constructor parameters.
28
+
29
+ Returns:
30
+ None: Creates an empty dictionary for loaded models.
31
+ """
32
+ self._models: dict[str, Any] = {}
33
+
34
+ def register_model(self, name: str, model: Any) -> None:
35
+ """Register a loaded model object under a unique name.
36
+
37
+ Args:
38
+ name (str): Registry key for the model.
39
+ model (Any): Loaded model or pipeline object.
40
+
41
+ Returns:
42
+ None: Updates internal model registry.
43
+ """
44
+ self._models[name] = model
45
+
46
+ def get_model(self, name: str) -> Any | None:
47
+ """Return a model object by name when present.
48
+
49
+ Args:
50
+ name (str): Registry key for the model.
51
+
52
+ Returns:
53
+ Any | None: Registered model object or None.
54
+ """
55
+ return self._models.get(name)
56
+
57
+ def clear_cache(self) -> None:
58
+ """Clear torch CUDA cache when GPU is available.
59
+
60
+ Args:
61
+ None: No parameters.
62
+
63
+ Returns:
64
+ None: Invokes CUDA cache clear side effects.
65
+ """
66
+ if torch is not None and torch.cuda.is_available():
67
+ torch.cuda.empty_cache()
68
+
69
+ def check_gpu(self) -> dict[str, str | int]:
70
+ """Return GPU name and VRAM usage metrics.
71
+
72
+ Args:
73
+ None: No parameters.
74
+
75
+ Returns:
76
+ dict[str, str | int]: gpu_name, vram_used_bytes, vram_total_bytes.
77
+ """
78
+ if torch is None or not torch.cuda.is_available():
79
+ return {
80
+ "gpu_name": "cpu-mock",
81
+ "vram_used_bytes": 0,
82
+ "vram_total_bytes": 0,
83
+ }
84
+
85
+ device_index = torch.cuda.current_device()
86
+ props = torch.cuda.get_device_properties(device_index)
87
+ return {
88
+ "gpu_name": props.name,
89
+ "vram_used_bytes": int(torch.cuda.memory_allocated(device_index)),
90
+ "vram_total_bytes": int(props.total_memory),
91
+ }
backend/orchestrator.py ADDED
@@ -0,0 +1,496 @@
1
+ """Pipeline orchestrator coordinating transcription, context retrieval, and prompt assembly."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import asyncio
+ import json
7
+ import time
8
+ from datetime import datetime, timezone
9
+ from pathlib import Path
10
+ from uuid import uuid4
11
+
12
+ try:
13
+ import torch
14
+ except ModuleNotFoundError: # pragma: no cover - mock/lightweight runtime
15
+ torch = None
16
+
17
+
18
+ from backend.errors import ModelExecutionError, get_component_logger
19
+ from backend.audio import validate_audio
20
+ from backend.models.doc_generator import DocumentGenerator
21
+ from backend.models.ehr_agent import EHRAgent
22
+ from backend.models.medasr import MedASRModel
23
+ from backend.schemas import (
+ ClinicalDocument,
+ Consultation,
+ ConsultationStatus,
+ Patient,
+ PatientContext,
+ PipelineProgress,
+ PipelineStage,
+ )
24
+
25
+ logger = get_component_logger("orchestrator")
26
+
27
+
28
+ class PipelineOrchestrator:
29
+ """Coordinate the consultation lifecycle across all model pipeline stages.
30
+
31
+ Args:
32
+ medasr_model (MedASRModel | None): Optional MedASR model wrapper instance.
33
+ ehr_agent (EHRAgent | None): Optional EHR agent instance.
34
+
35
+ Returns:
36
+ None: Initialises orchestrator state stores in memory.
37
+ """
38
+
39
+ def __init__(
40
+ self,
41
+ medasr_model: MedASRModel | None = None,
42
+ ehr_agent: EHRAgent | None = None,
43
+ doc_generator: DocumentGenerator | None = None,
44
+ ) -> None:
45
+ self.consultations: dict[str, Consultation] = {}
46
+ self.progress: dict[str, PipelineProgress] = {}
47
+ self._medasr_model = medasr_model or MedASRModel()
48
+ self._ehr_agent = ehr_agent or EHRAgent()
49
+ self._doc_generator = doc_generator or DocumentGenerator()
50
+
51
+ @staticmethod
52
+ def _clear_cuda_cache() -> None:
53
+ """Release cached CUDA memory between heavy model stages when CUDA is available.
54
+
55
+ Args:
56
+ None: Uses current torch CUDA runtime state.
57
+
58
+ Returns:
59
+ None: Performs in-place cache cleanup only.
60
+ """
61
+
62
+ if torch is not None and torch.cuda.is_available():
63
+ torch.cuda.empty_cache()
64
+
65
+ @staticmethod
66
+ def _timestamp() -> str:
67
+ """Return current UTC timestamp string for consultation lifecycle fields.
68
+
69
+ Args:
70
+ None: Reads current clock only.
71
+
72
+ Returns:
73
+ str: ISO timestamp with timezone.
74
+ """
75
+
76
+ return datetime.now(tz=timezone.utc).isoformat()
77
+
78
+ def start_consultation(self, patient: Patient) -> Consultation:
79
+ """Create and store a consultation in recording state.
80
+
81
+ Args:
82
+ patient (Patient): Patient selected for this consultation.
83
+
84
+ Returns:
85
+ Consultation: Newly created consultation object.
86
+ """
87
+
88
+ consultation_id = f"cons-{uuid4()}"
89
+ consultation = Consultation(
90
+ id=consultation_id,
91
+ patient=patient,
92
+ status=ConsultationStatus.RECORDING,
93
+ started_at=self._timestamp(),
94
+ )
95
+ self.consultations[consultation_id] = consultation
96
+ self.progress[consultation_id] = PipelineProgress(
97
+ consultation_id=consultation_id,
98
+ stage=PipelineStage.RETRIEVING_CONTEXT,
99
+ progress_pct=5,
100
+ message="Consultation started. Recording in progress.",
101
+ )
102
+ return consultation
103
+
104
+ def prefetch_context(self, consultation_id: str) -> None:
105
+ """Preload patient context for a consultation to reduce end-stage latency.
106
+
107
+ Args:
108
+ consultation_id (str): Consultation identifier.
109
+
110
+ Returns:
111
+ None: Updates consultation context cache in place.
112
+ """
113
+
114
+ consultation = self.get_consultation(consultation_id)
115
+ if consultation.context is not None:
116
+ return
117
+
118
+ try:
119
+ consultation.context = self._ehr_agent.get_patient_context(consultation.patient.id)
120
+ self.progress[consultation_id] = PipelineProgress(
121
+ consultation_id=consultation_id,
122
+ stage=PipelineStage.RETRIEVING_CONTEXT,
123
+ progress_pct=25,
124
+ message="Patient context prefetched.",
125
+ )
126
+ except Exception as exc:
127
+ logger.warning("Context prefetch failed", consultation_id=consultation_id, error=str(exc))
128
+
129
+ def end_consultation(self, consultation_id: str) -> Consultation:
130
+ """Finalize recording and run the staged processing pipeline.
131
+
132
+ Args:
133
+ consultation_id (str): Consultation identifier.
134
+
135
+ Returns:
136
+ Consultation: Updated consultation after pipeline completion.
137
+ """
138
+
139
+ try:
+ return asyncio.run(
+ asyncio.wait_for(
+ asyncio.to_thread(self._run_pipeline, consultation_id),
+ timeout=float(self._doc_generator.settings.PIPELINE_TIMEOUT_S),
+ )
+ )
+ except (TimeoutError, asyncio.TimeoutError) as exc:
+ # asyncio.TimeoutError aliases TimeoutError on Python 3.11+; the tuple
+ # keeps older interpreters covered.
+ raise TimeoutError("Pipeline timed out") from exc
150
+
151
+ def _run_pipeline(self, consultation_id: str) -> Consultation:
152
+ """Execute all pipeline stages for an existing consultation.
153
+
154
+ Args:
155
+ consultation_id (str): Consultation identifier.
156
+
157
+ Returns:
158
+ Consultation: Updated consultation after pipeline completion.
159
+ """
160
+
161
+ consultation = self.get_consultation(consultation_id)
162
+ if not consultation.audio_file_path:
163
+ raise ModelExecutionError("No uploaded audio available for this consultation")
164
+
165
+ validate_audio(consultation.audio_file_path)
166
+
167
+ total_start = time.perf_counter()
168
+ stage_start = time.perf_counter()
169
+ consultation.status = ConsultationStatus.PROCESSING
170
+ consultation.ended_at = self._timestamp()
171
+
172
+ self.progress[consultation_id] = PipelineProgress(
173
+ consultation_id=consultation_id,
174
+ stage=PipelineStage.TRANSCRIBING,
175
+ progress_pct=33,
176
+ message="Finalising transcript...",
177
+ )
178
+ transcript = self._medasr_model.transcribe(consultation.audio_file_path)
179
+ if not transcript.text.strip():
180
+ raise ModelExecutionError("Audio could not be transcribed.")
181
+ consultation.transcript = transcript.model_copy(update={"consultation_id": consultation_id})
182
+ transcribe_s = round(time.perf_counter() - stage_start, 3)
183
+ logger.info("Pipeline stage complete", consultation_id=consultation_id, stage="transcribe", duration_s=transcribe_s)
184
+ self._clear_cuda_cache()
185
+
186
+ stage_start = time.perf_counter()
187
+ self.progress[consultation_id] = PipelineProgress(
188
+ consultation_id=consultation_id,
189
+ stage=PipelineStage.RETRIEVING_CONTEXT,
190
+ progress_pct=66,
191
+ message="Synthesising patient context...",
192
+ )
193
+ if consultation.context is None:
194
+ try:
195
+ consultation.context = self._ehr_agent.get_patient_context(consultation.patient.id)
196
+ except Exception as exc:
197
+ warning = f"FHIR retrieval unavailable; continuing with transcript only: {exc}"
198
+ logger.warning("FHIR degradation activated", consultation_id=consultation_id, warning=warning)
199
+ consultation.context = self._build_transcript_only_context(consultation, warning)
200
+ context_s = round(time.perf_counter() - stage_start, 3)
201
+ logger.info("Pipeline stage complete", consultation_id=consultation_id, stage="retrieve_context", duration_s=context_s)
202
+ self._clear_cuda_cache()
203
+
204
+ stage_start = time.perf_counter()
205
+ self.progress[consultation_id] = PipelineProgress(
206
+ consultation_id=consultation_id,
207
+ stage=PipelineStage.GENERATING_DOCUMENT,
208
+ progress_pct=90,
209
+ message="Combining transcript and context for document generation...",
210
+ )
211
+ if consultation.transcript and consultation.context:
212
+ consultation.context = self._truncate_context(consultation.context)
213
+ consultation.document = self._generate_document_with_oom_retry(
214
+ consultation.transcript.text,
215
+ consultation.context,
216
+ consultation_id,
217
+ ).model_copy(update={"consultation_id": consultation_id})
218
+ else:
219
+ raise ModelExecutionError("Transcript and patient context are required before document generation")
220
+ generate_s = round(time.perf_counter() - stage_start, 3)
221
+ logger.info("Pipeline stage complete", consultation_id=consultation_id, stage="generate_document", duration_s=generate_s)
222
+ self._clear_cuda_cache()
223
+
224
+ consultation.pipeline_stage = PipelineStage.COMPLETE
225
+ consultation.status = ConsultationStatus.REVIEW
226
+ self.progress[consultation_id] = PipelineProgress(
227
+ consultation_id=consultation_id,
228
+ stage=PipelineStage.COMPLETE,
229
+ progress_pct=100,
230
+ message="Pipeline complete. Clinical document ready for review.",
231
+ )
232
+ total_s = round(time.perf_counter() - total_start, 3)
233
+ logger.info(
234
+ "Pipeline completed",
235
+ consultation_id=consultation_id,
236
+ total_duration_s=total_s,
237
+ transcribe_s=transcribe_s,
238
+ context_s=context_s,
239
+ generate_s=generate_s,
240
+ )
241
+ return consultation
242
+
243
+ def _generate_document_with_oom_retry(
244
+ self,
245
+ transcript_text: str,
246
+ context: PatientContext,
247
+ consultation_id: str,
248
+ ) -> ClinicalDocument:
249
+ """Generate a document with one OOM recovery retry.
250
+
251
+ Args:
252
+ transcript_text (str): Consultation transcript text.
253
+ context (PatientContext): Context payload for document generation.
254
+ consultation_id (str): Consultation identifier for logging.
255
+
256
+ Returns:
257
+ ClinicalDocument: Generated document payload.
258
+ """
259
+
260
+ max_tokens = int(self._doc_generator.settings.DOC_GEN_MAX_TOKENS)
261
+ try:
262
+ return self._doc_generator.generate_document(transcript_text, context, max_new_tokens=max_tokens)
263
+ except TypeError as exc:
264
+ if "max_new_tokens" not in str(exc):
265
+ raise
266
+ logger.warning(
267
+ "Document generator does not accept max_new_tokens override; falling back to default signature",
268
+ consultation_id=consultation_id,
269
+ )
270
+ return self._doc_generator.generate_document(transcript_text, context)
271
+ except Exception as exc:
+ # torch may be None in mock mode, so torch.cuda.OutOfMemoryError cannot
+ # appear in the except clause directly; re-raise anything that is not OOM.
+ if torch is None or not isinstance(exc, torch.cuda.OutOfMemoryError):
+ raise
+ self._clear_cuda_cache()
+ reduced_tokens = max(256, max_tokens // 2)
+ logger.warning(
+ "OOM during document generation; retrying with reduced token budget",
+ consultation_id=consultation_id,
+ previous_max_new_tokens=max_tokens,
+ retry_max_new_tokens=reduced_tokens,
+ )
+ return self._doc_generator.generate_document(transcript_text, context, max_new_tokens=reduced_tokens)
281
+
282
+ def _build_transcript_only_context(self, consultation: Consultation, warning: str) -> PatientContext:
283
+ """Build minimal patient context when EHR retrieval fails.
284
+
285
+ Args:
286
+ consultation (Consultation): Consultation containing patient demographics.
287
+ warning (str): Retrieval warning text to persist.
288
+
289
+ Returns:
290
+ PatientContext: Transcript-only fallback context.
291
+ """
292
+
293
+ return PatientContext(
294
+ patient_id=consultation.patient.id,
295
+ demographics={
296
+ "name": consultation.patient.name,
297
+ "dob": consultation.patient.date_of_birth,
298
+ "nhs_number": consultation.patient.nhs_number,
299
+ "age": consultation.patient.age,
300
+ "sex": consultation.patient.sex,
301
+ },
302
+ problem_list=[],
303
+ medications=[],
304
+ allergies=[],
305
+ recent_labs=[],
306
+ recent_imaging=[],
307
+ clinical_flags=[],
308
+ last_letter_excerpt=None,
309
+ retrieval_warnings=[warning],
310
+ retrieved_at=self._timestamp(),
311
+ )
312
+
313
+ def _truncate_context(self, context: PatientContext) -> PatientContext:
314
+ """Truncate oversized context payloads to fit the max-sequence envelope.
315
+
316
+ Args:
317
+ context (PatientContext): Structured patient context before generation.
318
+
319
+ Returns:
320
+ PatientContext: Potentially truncated context payload.
321
+ """
322
+
323
+ max_tokens = int(self._doc_generator.settings.MAX_SEQ_LENGTH)
324
+ context_payload = context.model_dump(mode="json")
325
+ estimated_tokens = len(json.dumps(context_payload, ensure_ascii=False).split())
326
+ if estimated_tokens <= max_tokens:
327
+ return context
328
+
329
+ for list_field in ("recent_labs", "medications", "problem_list", "allergies", "recent_imaging", "clinical_flags"):
330
+ values = context_payload.get(list_field, [])
331
+ if isinstance(values, list) and len(values) > 2:
332
+ context_payload[list_field] = values[: max(2, len(values) // 2)]
333
+ estimated_tokens = len(json.dumps(context_payload, ensure_ascii=False).split())
334
+ if estimated_tokens <= max_tokens:
335
+ break
336
+
337
+ warnings = list(context_payload.get("retrieval_warnings", []))
338
+ warnings.append("Context exceeded token budget and was truncated to fit generation limits.")
339
+ context_payload["retrieval_warnings"] = warnings
340
+ return PatientContext.model_validate(context_payload)
341
+
342
+ def update_document_sections(self, consultation_id: str, sections: list[dict[str, str]]) -> ClinicalDocument:
344
+ """Update editable document sections before sign-off.
345
+
346
+ Args:
347
+ consultation_id (str): Consultation identifier.
348
+ sections (list[dict[str, str]]): Edited sections with heading and content keys.
349
+
350
+ Returns:
351
+ ClinicalDocument: Updated in-memory document.
352
+ """
353
+
354
+ consultation = self.get_consultation(consultation_id)
355
+ if consultation.document is None:
356
+ raise ModelExecutionError("No generated document available to edit")
357
+
358
+ existing_sections = list(consultation.document.sections)
359
+ updates_by_heading = {
+     str(item.get("heading", "")).strip().lower(): str(item.get("content", ""))
+     for item in sections
+     if str(item.get("heading", "")).strip()
+ }
360
+
361
+ for idx, section in enumerate(existing_sections):
362
+ key = section.heading.strip().lower()
363
+ if key in updates_by_heading:
364
+ existing_sections[idx] = section.model_copy(update={"content": updates_by_heading[key]})
365
+
366
+ consultation.document = consultation.document.model_copy(update={"sections": existing_sections})
367
+ return consultation.document
368
+
369
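The case-insensitive heading merge used by `update_document_sections` can be shown standalone. This is a simplified sketch over plain dicts rather than Pydantic models; `merge_sections` is an illustrative name, not a repo function.

```python
def merge_sections(existing: list[dict], edits: list[dict]) -> list[dict]:
    """Apply edited content onto sections, matching headings case-insensitively."""
    updates = {
        edit["heading"].strip().lower(): edit["content"]
        for edit in edits
        if edit.get("heading", "").strip()
    }
    return [
        # Keep the original content when no edit targets this heading.
        {**section, "content": updates.get(section["heading"].strip().lower(), section["content"])}
        for section in existing
    ]


sections = [
    {"heading": "Assessment and plan", "content": "old"},
    {"heading": "Current medications", "content": "metformin"},
]
merged = merge_sections(sections, [{"heading": "assessment and PLAN", "content": "new"}])
```

Normalising both sides with `.strip().lower()` means UI edits survive minor heading formatting differences, which matters when headings round-trip through a browser form.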
+ def sign_off_document(self, consultation_id: str) -> ClinicalDocument:
370
+ """Mark a generated consultation document as signed off.
371
+
372
+ Args:
373
+ consultation_id (str): Consultation identifier.
374
+
375
+ Returns:
376
+ ClinicalDocument: Signed-off document for the consultation.
377
+ """
378
+
379
+ consultation = self.get_consultation(consultation_id)
380
+ if consultation.document is None:
381
+ raise ModelExecutionError("No generated document available to sign off")
382
+
383
+ consultation.status = ConsultationStatus.SIGNED_OFF
384
+ consultation.document = consultation.document.model_copy(update={"status": ConsultationStatus.SIGNED_OFF})
385
+ logger.info("Document signed off", consultation_id=consultation_id)
386
+ return consultation.document
387
+
388
+ def regenerate_document_section(self, consultation_id: str, section_heading: str) -> ClinicalDocument:
389
+ """Regenerate a single section by re-running generation and replacing matching heading content.
390
+
391
+ Args:
392
+ consultation_id (str): Consultation identifier.
393
+ section_heading (str): Heading name for the section to regenerate.
394
+
395
+ Returns:
396
+ ClinicalDocument: Updated clinical document with refreshed section content.
397
+ """
398
+
399
+ consultation = self.get_consultation(consultation_id)
400
+ if consultation.transcript is None or consultation.context is None:
401
+ raise ModelExecutionError("Transcript and context are required to regenerate a section")
402
+ if consultation.document is None:
403
+ raise ModelExecutionError("No generated document available to regenerate")
404
+
405
+ refreshed_document = self._doc_generator.generate_document(
406
+ consultation.transcript.text,
407
+ consultation.context,
408
+ ).model_copy(update={"consultation_id": consultation_id})
409
+ replacement_section = next(
410
+ (
411
+ section
412
+ for section in refreshed_document.sections
413
+ if section.heading.lower() == section_heading.strip().lower()
414
+ ),
415
+ None,
416
+ )
417
+ if replacement_section is None:
418
+ raise ModelExecutionError(f"Section not found in regenerated output: {section_heading}")
419
+
420
+ updated_sections = []
421
+ replaced = False
422
+ for section in consultation.document.sections:
423
+ if section.heading.lower() == section_heading.strip().lower():
424
+ updated_sections.append(replacement_section)
425
+ replaced = True
426
+ else:
427
+ updated_sections.append(section)
428
+
429
+ if not replaced:
430
+ raise ModelExecutionError(f"Section not found in existing document: {section_heading}")
431
+
432
+ consultation.document = consultation.document.model_copy(
433
+ update={
434
+ "sections": updated_sections,
435
+ "generated_at": refreshed_document.generated_at,
436
+ "generation_time_s": refreshed_document.generation_time_s,
437
+ "status": ConsultationStatus.REVIEW,
438
+ }
439
+ )
440
+ consultation.status = ConsultationStatus.REVIEW
441
+ logger.info("Document section regenerated", consultation_id=consultation_id, section_heading=section_heading)
442
+ return consultation.document
443
+
444
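The regenerate-then-splice pattern above (re-run full generation, replace one section, fail loudly if the heading is missing on either side) can be sketched as follows. This uses plain dicts and a hypothetical `splice_section` helper for illustration.

```python
def splice_section(current: list[dict], refreshed: list[dict], heading: str) -> list[dict]:
    """Replace one section in `current` with its counterpart from a fresh generation."""
    key = heading.strip().lower()
    replacement = next((s for s in refreshed if s["heading"].lower() == key), None)
    if replacement is None:
        raise KeyError(f"Section not found in regenerated output: {heading}")

    replaced = False
    out = []
    for section in current:
        if section["heading"].lower() == key:
            out.append(replacement)
            replaced = True
        else:
            out.append(section)
    if not replaced:
        raise KeyError(f"Section not found in existing document: {heading}")
    return out


current = [
    {"heading": "Assessment and plan", "content": "old plan"},
    {"heading": "Examination findings", "content": "unchanged"},
]
refreshed = [{"heading": "Assessment and plan", "content": "new plan"}]
spliced = splice_section(current, refreshed, "assessment and plan")
```

Regenerating the whole letter just to refresh one section is deliberately simple: it keeps a single generation path, at the cost of one full inference per section refresh.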
+ def _build_document_prompt_payload(self, consultation_id: str) -> str:
445
+ """Compose stage-3 prompt payload from transcript and context as JSON.
446
+
447
+ Args:
448
+ consultation_id (str): Consultation identifier.
449
+
450
+ Returns:
451
+ str: Combined payload string for downstream document generation model.
452
+ """
453
+
454
+ consultation = self.get_consultation(consultation_id)
455
+ if consultation.transcript is None or consultation.context is None:
456
+ raise ModelExecutionError("Transcript and context are required for prompt assembly")
457
+
458
+ payload = {
459
+ "consultation_id": consultation_id,
460
+ "transcript": consultation.transcript.text,
461
+ "context": consultation.context.model_dump(mode="json"),
462
+ }
463
+ return json.dumps(payload, ensure_ascii=False)
464
+
465
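The stage-3 payload assembly is a small JSON envelope; a self-contained sketch (with an illustrative `build_prompt_payload` function, not the class method itself) looks like this:

```python
import json


def build_prompt_payload(consultation_id: str, transcript: str, context: dict) -> str:
    """Bundle transcript and structured context into one JSON string for the generator."""
    if not transcript or context is None:
        raise ValueError("Transcript and context are required for prompt assembly")
    return json.dumps(
        {"consultation_id": consultation_id, "transcript": transcript, "context": context},
        ensure_ascii=False,  # keep clinical symbols (e.g. units) unescaped
    )


payload = build_prompt_payload("c-001", "Patient reports fatigue.", {"problem_list": ["T2DM"]})
round_tripped = json.loads(payload)
```

`ensure_ascii=False` keeps non-ASCII characters readable in the prompt rather than `\uXXXX`-escaped, which matters for units such as mL/min/1.73m².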
+ def get_consultation(self, consultation_id: str) -> Consultation:
466
+ """Get a consultation by id from in-memory store.
467
+
468
+ Args:
469
+ consultation_id (str): Consultation identifier.
470
+
471
+ Returns:
472
+ Consultation: Stored consultation object.
473
+ """
474
+
475
+ consultation = self.consultations.get(consultation_id)
476
+ if consultation is None:
477
+ raise KeyError(f"Consultation not found: {consultation_id}")
478
+ return consultation
479
+
480
+ def get_progress(self, consultation_id: str) -> PipelineProgress:
481
+ """Get latest pipeline progress for a consultation.
482
+
483
+ Args:
484
+ consultation_id (str): Consultation identifier.
485
+
486
+ Returns:
487
+ PipelineProgress: Last recorded progress event.
488
+ """
489
+
490
+ progress = self.progress.get(consultation_id)
491
+ if progress is None:
492
+ raise KeyError(f"Progress not found for consultation: {consultation_id}")
493
+ return progress
494
+
495
+
496
+ PROMPTS_DIR = Path("backend/prompts")
backend/prompts/context_synthesis.j2 ADDED
@@ -0,0 +1,38 @@
1
+ <|system|>
2
+ You are a clinical EHR navigation agent. Your task is to retrieve and synthesise a patient's medical context from FHIR resources to support clinical documentation.
3
+
4
+ Given a patient ID, use the available FHIR tools to retrieve:
5
+ 1. Demographics (Patient resource)
6
+ 2. Active conditions/diagnoses (Condition resources)
7
+ 3. Current medications (MedicationRequest resources)
8
+ 4. Allergies (AllergyIntolerance resources)
9
+ 5. Recent laboratory results — last 6 months (Observation resources, category=laboratory)
10
+ 6. Recent imaging reports (DiagnosticReport resources)
11
+
12
+ After retrieval, synthesise the data into the following JSON structure ONLY. Do not include any explanation, commentary, or markdown formatting. Output ONLY valid JSON:
13
+
14
+ {
15
+ "patient_id": "...",
16
+ "demographics": {...},
17
+ "problem_list": ["..."],
18
+ "medications": [{...}],
19
+ "allergies": [{...}],
20
+ "recent_labs": [{...}],
21
+ "recent_imaging": [{...}],
22
+ "clinical_flags": ["..."],
23
+ "last_letter_excerpt": "...",
24
+ "retrieval_warnings": [],
25
+ "retrieved_at": "..."
26
+ }
27
+ <|end|>
28
+
29
+ <|user|>
30
+ PATIENT ID: {{ patient_id }}
31
+
32
+ RAW FHIR DATA:
33
+ {{ raw_fhir_data }}
34
+
35
+ Synthesise the patient context now.
36
+ <|end|>
37
+
38
+ <|assistant|>
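Rendering this `.j2` template is a plain Jinja2 substitution. A minimal sketch, assuming the `jinja2` package is available and using a trimmed-down copy of the template above for illustration:

```python
from jinja2 import Template

# Abbreviated copy of backend/prompts/context_synthesis.j2, for illustration only.
template_source = (
    "<|system|>\n"
    "You are a clinical EHR navigation agent.\n"
    "<|end|>\n\n"
    "<|user|>\n"
    "PATIENT ID: {{ patient_id }}\n\n"
    "RAW FHIR DATA:\n"
    "{{ raw_fhir_data }}\n"
    "<|end|>\n\n"
    "<|assistant|>"
)

# render() fills the {{ ... }} placeholders with the supplied keyword arguments.
prompt = Template(template_source).render(
    patient_id="patient-001",
    raw_fhir_data='{"resourceType": "Bundle", "entry": []}',
)
```

Ending the template with `<|assistant|>` (no trailing content) leaves the model positioned to emit the JSON synthesis directly, which simplifies downstream parsing.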
backend/prompts/document_generation.j2 ADDED
@@ -0,0 +1,60 @@
1
+ <|system|>
2
+ You are an NHS clinical documentation assistant. Generate a structured NHS outpatient clinic letter from the consultation transcript and patient context provided below.
3
+
4
+ STRICT OUTPUT FORMAT (follow exactly)
5
+ - Date: {{ letter_date }}
6
+ - Addressee: GP name and practice address from the patient record
7
+ - Re: Full patient name, DOB, NHS number
8
+ - Salutation: Dear Dr [GP surname],
9
+ - Body sections in this exact order:
10
+ 1) History of presenting complaint
11
+ 2) Examination findings
12
+ 3) Investigation results
13
+ 4) Assessment and plan
14
+ 5) Current medications
15
+ - Sign-off line: Warm regards,
16
+ {{ clinician_name }}
17
+ {{ clinician_title }}
18
+
19
+ STYLE AND SAFETY RULES
20
+ 1. Use formal British medical English, third person, past tense.
21
+ 2. Include both positive and negative findings from the consultation.
22
+ 3. Keep length between 300 and 500 words.
23
+ 4. Use ONLY facts from transcript + patient context.
24
+ 5. Use EXACT numeric values from patient context (no rounding, no unit changes, no fabrication).
25
+ 6. If the transcript states a value that differs from EHR context, include the EHR value and mark the mismatch as [DISCREPANCY].
26
+ 7. Do not add bullet points unless the source explicitly lists items.
27
+ 8. If a section has no discussed information, write a short factual sentence stating this.
28
+
29
+ NEGATIVE EXAMPLES (DO NOT DO THESE)
30
+ - Do NOT invent examination findings that were not discussed.
31
+ - Do NOT replace exact values (e.g., "55 mmol/mol") with vague language (e.g., "raised").
32
+ - Do NOT use US spelling (e.g., "anemia", "behavior").
33
+ - Do NOT omit safety-critical negatives that were explicitly denied.
34
+
35
+ MICRO-EXEMPLAR OF EXPECTED REGISTER
36
+ Date: 13 Feb 2026
37
+ Re: Mary Thompson, DOB 12/04/1964, NHS No. 943 476 5919
38
+ Dear Dr Ahmed,
39
+ History of presenting complaint
40
+ Mrs Thompson reported persistent fatigue and polydipsia over the preceding three months.
41
+ Investigation results
42
+ Latest blood tests showed HbA1c 55 mmol/mol and eGFR 52 mL/min/1.73m².
43
+ Assessment and plan
44
+ The impression was suboptimal glycaemic control; metformin was continued and repeat bloods were arranged in three months.
45
+ Warm regards,
46
+ Dr Sarah Chen
47
+ Consultant Diabetologist
48
+ <|end|>
49
+
50
+ <|user|>
51
+ ## CONSULTATION TRANSCRIPT
52
+ {{ transcript }}
53
+
54
+ ## PATIENT CONTEXT (from Electronic Health Record)
55
+ {{ context_json }}
56
+
57
+ Generate the NHS clinic letter now.
58
+ <|end|>
59
+
60
+ <|assistant|>
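The style and safety rules above are checkable after generation. This is a lightweight, illustrative post-generation linter for two of the rules (length and British spelling); the function name and spelling list are assumptions, not part of the codebase, and a real guardrail would cover the discrepancy and fabrication rules too.

```python
import re

# Small illustrative sample of US spellings the prompt forbids.
US_SPELLINGS = ("anemia", "behavior", "glycemic", "pediatric")


def check_letter(letter: str) -> list[str]:
    """Flag violations of the prompt's length and British-spelling rules."""
    issues = []
    words = len(letter.split())
    if not 300 <= words <= 500:
        issues.append(f"length {words} words outside 300-500 range")
    for spelling in US_SPELLINGS:
        # \b word boundaries avoid false hits inside British forms like "glycaemic".
        if re.search(rf"\b{spelling}\b", letter, re.IGNORECASE):
            issues.append(f"US spelling detected: {spelling}")
    return issues
```

Running such checks before the letter reaches the review screen lets the UI surface rule violations alongside the `[DISCREPANCY]` markers the prompt already mandates.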
backend/prompts/ehr_agent_system.txt ADDED
@@ -0,0 +1,25 @@
1
+ You are a clinical EHR navigation agent. Your task is to retrieve and synthesise a patient's medical context from FHIR resources to support clinical documentation.
2
+
3
+ Given a patient ID, use the available FHIR tools to retrieve:
4
+ 1. Demographics (Patient resource)
5
+ 2. Active conditions/diagnoses (Condition resources)
6
+ 3. Current medications (MedicationRequest resources)
7
+ 4. Allergies (AllergyIntolerance resources)
8
+ 5. Recent laboratory results — last 6 months (Observation resources, category=laboratory)
9
+ 6. Recent imaging reports (DiagnosticReport resources)
10
+
11
+ After retrieval, synthesise the data into the following JSON structure ONLY. Do not include any explanation, commentary, or markdown formatting. Output ONLY valid JSON:
12
+
13
+ {
14
+ "patient_id": "...",
15
+ "demographics": {...},
16
+ "problem_list": ["..."],
17
+ "medications": [{...}],
18
+ "allergies": [{...}],
19
+ "recent_labs": [{...}],
20
+ "recent_imaging": [{...}],
21
+ "clinical_flags": ["..."],
22
+ "last_letter_excerpt": "...",
23
+ "retrieval_warnings": [],
24
+ "retrieved_at": "..."
25
+ }
backend/schemas.py ADDED
@@ -0,0 +1,149 @@
1
+ """Clarke data models — Pydantic v2 schemas for all system objects."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from datetime import datetime
6
+ from enum import Enum
7
+ from typing import Optional
8
+
9
+ from pydantic import BaseModel, Field
10
+
11
+
12
+ class ConsultationStatus(str, Enum):
13
+ """Status lifecycle for a consultation session."""
14
+
15
+ IDLE = "idle"
16
+ RECORDING = "recording"
17
+ PAUSED = "paused"
18
+ PROCESSING = "processing"
19
+ REVIEW = "review"
20
+ SIGNED_OFF = "signed_off"
21
+
22
+
23
+ class PipelineStage(str, Enum):
24
+ """Discrete execution stages for the consultation pipeline."""
25
+
26
+ TRANSCRIBING = "transcribing"
27
+ RETRIEVING_CONTEXT = "retrieving_context"
28
+ GENERATING_DOCUMENT = "generating_document"
29
+ COMPLETE = "complete"
30
+ FAILED = "failed"
31
+
32
+
33
+ class Patient(BaseModel):
34
+ """A patient in the clinic list."""
35
+
36
+ id: str = Field(description="FHIR Patient resource ID")
37
+ nhs_number: str = Field(description="NHS number (format: XXX XXX XXXX)")
38
+ name: str = Field(description="Full name (e.g., 'Mrs. Margaret Thompson')")
39
+ date_of_birth: str = Field(description="DOB in DD/MM/YYYY format")
40
+ age: int
41
+ sex: str = Field(description="'Male' or 'Female'")
42
+ appointment_time: str = Field(description="HH:MM format")
43
+ summary: str = Field(description="One-line clinical summary for dashboard card")
44
+
45
+
46
+ class LabResult(BaseModel):
47
+ """A single laboratory result with trend."""
48
+
49
+ name: str = Field(description="e.g., 'HbA1c'")
50
+ value: str = Field(description="e.g., '55'")
51
+ unit: str = Field(description="e.g., 'mmol/mol'")
52
+ reference_range: Optional[str] = Field(default=None, description="e.g., '20-42'")
53
+ date: str = Field(description="ISO date of result")
54
+ trend: Optional[str] = Field(default=None, description="'rising', 'falling', 'stable', or None")
55
+ previous_value: Optional[str] = Field(default=None, description="Previous result value")
56
+ previous_date: Optional[str] = Field(default=None)
57
+ fhir_resource_id: Optional[str] = Field(default=None, description="Source FHIR Observation ID")
58
+
59
+
60
+ class PatientContext(BaseModel):
61
+ """Structured patient context synthesised by the EHR Agent from FHIR data."""
62
+
63
+ patient_id: str
64
+ demographics: dict = Field(description="name, dob, nhs_number, age, sex, address")
65
+ problem_list: list[str] = Field(description="Active diagnoses, e.g., ['Type 2 Diabetes Mellitus (2019)', ...]")
66
+ medications: list[dict] = Field(
67
+ description="[{'name': 'Metformin', 'dose': '1g', 'frequency': 'BD', 'fhir_id': '...'}]"
68
+ )
69
+ allergies: list[dict] = Field(
70
+ description="[{'substance': 'Penicillin', 'reaction': 'Anaphylaxis', 'severity': 'high'}]"
71
+ )
72
+ recent_labs: list[LabResult] = Field(default_factory=list)
73
+ recent_imaging: list[dict] = Field(default_factory=list, description="[{'type': 'CXR', 'date': '...', 'summary': '...'}]")
74
+ clinical_flags: list[str] = Field(default_factory=list, description="['HbA1c rising trend over 6 months']")
75
+ last_letter_excerpt: Optional[str] = Field(default=None, description="Key excerpt from most recent clinic letter")
76
+ retrieval_warnings: list[str] = Field(default_factory=list, description="Warnings if some FHIR queries failed")
77
+ retrieved_at: str = Field(description="ISO timestamp of retrieval")
78
+
79
+
80
+ class Transcript(BaseModel):
81
+ """Consultation transcript produced by MedASR."""
82
+
83
+ consultation_id: str
84
+ text: str = Field(description="Full transcript text")
85
+ duration_s: float = Field(description="Audio duration in seconds")
86
+ word_count: int
87
+ created_at: str
88
+
89
+
90
+ class DocumentSection(BaseModel):
91
+ """A single section of the generated clinical letter."""
92
+
93
+ heading: str = Field(description="e.g., 'History of presenting complaint'")
94
+ content: str = Field(description="Section body text")
95
+ editable: bool = Field(default=True)
96
+ fhir_sources: list[str] = Field(default_factory=list, description="FHIR resource IDs cited in this section")
97
+
98
+
99
+ class ClinicalDocument(BaseModel):
100
+ """A generated NHS clinical letter."""
101
+
102
+ consultation_id: str
103
+ letter_date: str
104
+ patient_name: str
105
+ patient_dob: str
106
+ nhs_number: str
107
+ addressee: str = Field(description="GP name and address")
108
+ salutation: str = Field(description="e.g., 'Dear Dr. Patel,'")
109
+ sections: list[DocumentSection]
110
+ medications_list: list[str] = Field(description="Current medications (formatted)")
111
+ sign_off: str = Field(description="e.g., 'Dr. S. Chen, Consultant Diabetologist'")
112
+ status: ConsultationStatus = ConsultationStatus.REVIEW
113
+ generated_at: str
114
+ generation_time_s: float = Field(description="Time taken for MedGemma 27B inference")
115
+ discrepancies: list[dict] = Field(default_factory=list, description="[{'type': 'allergy_mismatch', 'detail': '...'}]")
116
+
117
+
118
+ class Consultation(BaseModel):
119
+ """A complete consultation session — links patient, transcript, context, and document."""
120
+
121
+ id: str = Field(description="Unique consultation ID (UUID)")
122
+ patient: Patient
123
+ status: ConsultationStatus = ConsultationStatus.IDLE
124
+ pipeline_stage: Optional[PipelineStage] = None
125
+ context: Optional[PatientContext] = None
126
+ transcript: Optional[Transcript] = None
127
+ document: Optional[ClinicalDocument] = None
128
+ started_at: Optional[str] = None
129
+ ended_at: Optional[str] = None
130
+ audio_file_path: Optional[str] = None
131
+
132
+
133
+ class PipelineProgress(BaseModel):
134
+ """Real-time pipeline progress updates pushed to the UI."""
135
+
136
+ consultation_id: str
137
+ stage: PipelineStage
138
+ progress_pct: int = Field(ge=0, le=100)
139
+ message: str = Field(description="Human-readable status, e.g., 'Finalising transcript...'")
140
+
141
+
142
+ class ErrorResponse(BaseModel):
143
+ """Standardised error response format."""
144
+
145
+ error: str = Field(description="Error category: 'model_error', 'fhir_error', 'audio_error', 'timeout'")
146
+ message: str = Field(description="Human-readable error message for UI display")
147
+ detail: Optional[str] = Field(default=None, description="Technical detail (logged, not shown to user)")
148
+ consultation_id: Optional[str] = None
149
+ timestamp: str
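These Pydantic v2 schemas validate on construction, so constraints such as `progress_pct`'s `ge=0, le=100` bound reject bad pipeline events at the edge. A minimal sketch using a local stand-in that mirrors `PipelineProgress` (with `stage` simplified to `str` for illustration):

```python
from pydantic import BaseModel, Field, ValidationError


# Local stand-in mirroring the PipelineProgress schema; illustrative only.
class PipelineProgress(BaseModel):
    consultation_id: str
    stage: str
    progress_pct: int = Field(ge=0, le=100)
    message: str


event = PipelineProgress(
    consultation_id="c-001",
    stage="transcribing",
    progress_pct=40,
    message="Finalising transcript...",
)
# mode="json" yields JSON-native types, ready for transport to the UI.
payload = event.model_dump(mode="json")

try:
    PipelineProgress(consultation_id="c-001", stage="transcribing", progress_pct=140, message="x")
    out_of_range_rejected = False
except ValidationError:
    # Out-of-range progress never reaches the UI.
    out_of_range_rejected = True
```

Putting the bounds in the schema rather than in each producer means every code path that emits progress is checked the same way.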
backend/utils.py ADDED
@@ -0,0 +1,78 @@
1
+ """Shared utility helpers used across backend modules."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import json
6
+ import time
7
+ from collections.abc import Callable
8
+ from functools import wraps
9
+ from typing import Any
10
+
11
+ from backend.errors import get_component_logger
12
+
13
+
14
+ def timed(component: str) -> Callable:
15
+ """Measure function runtime and log duration with structured metadata.
16
+
17
+ Args:
18
+ component (str): Component name used in structured logs.
19
+ Returns:
20
+ Callable: Decorator wrapping the target callable.
21
+ """
22
+
23
+ log = get_component_logger(component)
24
+
25
+ def decorator(func: Callable) -> Callable:
26
+ """Wrap a function with timing instrumentation.
27
+
28
+ Args:
29
+ func (Callable): Function to time.
30
+ Returns:
31
+ Callable: Timed wrapper preserving function signature metadata.
32
+ """
33
+
34
+ @wraps(func)
35
+ def wrapper(*args: Any, **kwargs: Any) -> Any:
36
+ """Execute wrapped function and emit success/failure timing logs.
37
+
38
+ Params:
39
+ *args (Any): Positional arguments passed to wrapped function.
40
+ **kwargs (Any): Keyword arguments passed to wrapped function.
41
+ Returns:
42
+ Any: Original function return value.
43
+ """
44
+
45
+ start_time = time.perf_counter()
46
+ try:
47
+ result = func(*args, **kwargs)
48
+ duration_s = time.perf_counter() - start_time
49
+ log.info(
50
+ "Function execution complete",
51
+ function_name=func.__name__,
52
+ duration_s=round(duration_s, 4),
53
+ )
54
+ return result
55
+ except Exception:
56
+ duration_s = time.perf_counter() - start_time
57
+ log.exception(
58
+ "Function execution failed",
59
+ function_name=func.__name__,
60
+ duration_s=round(duration_s, 4),
61
+ )
62
+ raise
63
+
64
+ return wrapper
65
+
66
+ return decorator
67
+
68
+
69
+ def sanitize_json_payload(payload: Any) -> Any:
70
+ """Return a JSON-serialisable deep copy of an arbitrary payload.
71
+
72
+ Params:
73
+ payload (Any): Input object that should be JSON serialisable.
74
+ Returns:
75
+ Any: Normalised payload safe for JSON transport/storage.
76
+ """
77
+
78
+ return json.loads(json.dumps(payload, default=str))
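Both helpers in this module are straightforward to exercise. This sketch re-implements `timed` with stdlib `logging` in place of the repo's structured logger (an assumption for self-containment) and shows the `default=str` round-trip that `sanitize_json_payload` relies on:

```python
import json
import logging
import time
from datetime import datetime
from functools import wraps

logging.basicConfig(level=logging.INFO)


def timed(component: str):
    """Stdlib-logging sketch of the structured `timed` decorator."""
    log = logging.getLogger(component)

    def decorator(func):
        @wraps(func)  # preserve the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                log.info("%s completed in %.4fs", func.__name__, time.perf_counter() - start)
                return result
            except Exception:
                log.exception("%s failed after %.4fs", func.__name__, time.perf_counter() - start)
                raise

        return wrapper

    return decorator


@timed("demo")
def add(a: int, b: int) -> int:
    return a + b


total = add(2, 3)

# sanitize_json_payload's core trick: non-JSON types (e.g. datetime) pass through str().
sanitised = json.loads(json.dumps({"retrieved_at": datetime(2026, 2, 13)}, default=str))
```

The `default=str` fallback is lossy by design: anything JSON cannot represent becomes its string form, which is acceptable for logging and transport but not for values you need to parse back.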
clarke/.env.template ADDED
@@ -0,0 +1 @@
1
+ # Placeholder
clarke/Dockerfile ADDED
@@ -0,0 +1 @@
1
+ # Placeholder
clarke/LICENSE ADDED
@@ -0,0 +1 @@
1
+ # Placeholder
clarke/README.md ADDED
@@ -0,0 +1 @@
1
+ # Placeholder
clarke/app.py ADDED
@@ -0,0 +1 @@
1
+ """Legacy placeholder module for the nested `clarke` package app entrypoint."""
clarke/backend/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Package initializer."""
clarke/backend/api.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/audio.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/config.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/errors.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/fhir/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Package initializer."""
clarke/backend/fhir/client.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/fhir/mock_api.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/fhir/queries.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/fhir/tools.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/models/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Package initializer."""
clarke/backend/models/doc_generator.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""
clarke/backend/models/ehr_agent.py ADDED
@@ -0,0 +1 @@
1
+ """Placeholder module."""