anhnv125 committed on
Commit 4cf1dec · verified · 1 parent: c50478a

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +51 -6
  2. app.py +462 -0
  3. requirements.txt +3 -0
README.md CHANGED
@@ -1,10 +1,55 @@
  ---
- title: Data Viewer
- emoji: 🐢
- colorFrom: pink
- colorTo: gray
- sdk: static
+ title: Medical Conversation Viewer
+ emoji: "🩺"
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: app.py
  pinned: false
+ license: mit
+ datasets:
+ - Mediform/seed_data_v5
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Medical Conversation Dataset Viewer
+
+ Interactive viewer for the **Mediform/seed_data_v5** dataset containing synthetic German doctor-patient conversations for medical ASR training.
+
+ ## Features
+
+ - **Conversation Selection**: Browse through different medical scenarios
+ - **Variant Support**: View conversations in three formats:
+   - `natural`: Natural dialogue flow
+   - `inline_dictation`: Dialogue with inline doctor dictation
+   - `post_dictation`: Dialogue with post-turn dictation
+ - **Step-by-Step Navigation**: Walk through conversations turn by turn
+ - **EHR Reference Tracking**: Watch Electronic Health Record categories populate as the conversation progresses:
+   - **History (Anamnese)**: Patient history and symptoms
+   - **Findings (Befunde)**: Examination findings and test results
+   - **Treatment (Therapie)**: Treatment decisions and medications
+   - **Plan (Prozedere)**: Follow-up plans and diagnostics
+   - **Orders (Anordnungen)**: Lab orders, appointments, prescriptions
+
+ ## How It Works
+
+ The conversations contain `<ref>` tags that link spoken content to structured EHR entries. As you navigate forward through the conversation, referenced items are added to their respective categories. Navigating backward removes them, showing how the medical record builds up during the consultation.
+
+ ## Dataset
+
+ This viewer displays data from [Mediform/seed_data_v5](https://huggingface.co/datasets/Mediform/seed_data_v5), which contains:
+ - Synthetic German medical dialogues
+ - Multiple conversation variants
+ - Structured EHR annotations
+ - Medical terminology (boost terms)
+
+ ## Local Development
+
+ ```bash
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ ## License
+
+ MIT
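The "How It Works" section describes buckets that fill as you step forward and empty again as you step back. In app.py this behaviour falls out of `render_conversation` re-scanning turns 0 through `turn_idx` on every click instead of keeping incremental state. The following is a minimal sketch of just that accumulation step, assuming the `<ref keys="...">` tag format and the key prefixes (`history_`, `findings_`, `treatment_`, `plan_`, `order_`) used by `extract_refs_from_turn`. The names `buckets_up_to` and `category_of` are hypothetical, the sample turns are invented, and unlike app.py the sketch tracks keys only, without looking each one up in `ehr_dict`.

```python
import re
from typing import Dict, List, Optional

# Same tag shape as in app.py: <ref keys="key1,key2">spoken text</ref>
REF_PATTERN = re.compile(r'<ref\s+keys="([^"]+)">([^<]+)</ref>')

# Category is read from the key prefix, mirroring extract_refs_from_turn
CATEGORIES = ("history", "findings", "treatment", "plan", "order")


def category_of(key: str) -> Optional[str]:
    """Return the EHR category encoded in a key such as 'plan_2', or None."""
    for category in CATEGORIES:
        if key.startswith(category + "_"):
            return category
    return None


def buckets_up_to(turns: List[str], turn_idx: int) -> Dict[str, List[str]]:
    """Collect referenced keys per category for turns[0] .. turns[turn_idx].

    Starting from empty buckets on every call means that moving backward
    simply never collects the keys introduced by later turns.
    """
    buckets: Dict[str, List[str]] = {c: [] for c in CATEGORIES}
    for content in turns[: turn_idx + 1]:
        for keys, _text in REF_PATTERN.findall(content):
            for key in (k.strip() for k in keys.split(",")):
                category = category_of(key)
                if category is not None and key not in buckets[category]:
                    buckets[category].append(key)
    return buckets


if __name__ == "__main__":
    # Hypothetical turns, not taken from the dataset
    turns = [
        'Seit drei Tagen habe ich <ref keys="history_1">Kopfschmerzen</ref>.',
        'Der <ref keys="findings_1,plan_1">Blutdruck ist erhöht</ref>.',
    ]
    print(buckets_up_to(turns, 0))  # only the history bucket is filled
    print(buckets_up_to(turns, 1))  # findings_1 and plan_1 are added
```

Because each call rebuilds the buckets from scratch, stepping from turn 1 back to turn 0 needs no undo logic: `findings_1` and `plan_1` are simply not collected again.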
app.py ADDED
@@ -0,0 +1,462 @@
+ """
+ Hugging Face Space for viewing Mediform/seed_data_v5 dataset.
+ Displays doctor-patient conversations with EHR reference tracking.
+ """
+
+ import gradio as gr
+ import re
+ import json
+ from datasets import load_dataset
+
+
+ def parse_json_fields(item: dict) -> dict:
+     """Parse JSON string fields in dataset item."""
+     result = dict(item)
+
+     # Fields that may be stored as JSON strings in HF dataset
+     json_fields = ["conversations", "ehr_dict", "orders"]
+
+     for field in json_fields:
+         if field in result and isinstance(result[field], str):
+             try:
+                 result[field] = json.loads(result[field])
+             except json.JSONDecodeError:
+                 pass
+
+     return result
+
+
+ def load_data():
+     """Load dataset from Hugging Face Hub or local fallback."""
+     try:
+         ds = load_dataset("Mediform/seed_data_v5", split="train")
+         # Convert to list of dicts and parse JSON string fields
+         data = [parse_json_fields(dict(row)) for row in ds]
+         return data
+     except Exception as e:
+         print(f"Failed to load from HF Hub: {e}")
+         # Fallback to local file if available
+         try:
+             with open("term_groups_ehr_dataset_v3.json", "r", encoding="utf-8") as f:
+                 local_data = json.load(f)
+             return local_data.get("data", [])
+         except (OSError, json.JSONDecodeError, AttributeError):  # no usable local file
+             return []
+
+
+ # Load data at startup
+ DATA = load_data()
+
+ # Category mapping for display
+ CATEGORY_LABELS = {
+     "history": "History (Anamnese)",
+     "findings": "Findings (Befunde)",
+     "treatment": "Treatment (Therapie)",
+     "plan": "Plan (Prozedere)",
+     "order": "Orders (Anordnungen)"
+ }
+
+ VARIANTS = ["natural", "inline_dictation", "post_dictation"]
+
+
+ def get_conversation_options():
+     """Get list of conversation options for dropdown."""
+     options = []
+     for i, item in enumerate(DATA):
+         scenario = item.get("brief_scenario", f"Conversation {i+1}")
+         # Truncate long scenarios
+         if len(scenario) > 80:
+             scenario = scenario[:77] + "..."
+         options.append(f"{i+1}. {scenario}")
+     return options
+
+
+ def extract_refs_from_turn(content: str) -> dict:
+     """
+     Extract <ref keys="...">...</ref> tags from turn content.
+     Returns dict mapping category to list of (key, text) tuples.
+     """
+     refs = {"history": [], "findings": [], "treatment": [], "plan": [], "order": []}
+
+     # Pattern to match <ref keys="key1,key2">text</ref>
+     pattern = r'<ref\s+keys="([^"]+)">([^<]+)</ref>'
+
+     for match in re.finditer(pattern, content):
+         keys_str = match.group(1)
+         text = match.group(2)
+
+         for key in keys_str.split(","):
+             key = key.strip()
+             # Determine category from key prefix
+             if key.startswith("history_"):
+                 refs["history"].append((key, text))
+             elif key.startswith("findings_"):
+                 refs["findings"].append((key, text))
+             elif key.startswith("treatment_"):
+                 refs["treatment"].append((key, text))
+             elif key.startswith("plan_"):
+                 refs["plan"].append((key, text))
+             elif key.startswith("order_"):
+                 refs["order"].append((key, text))
+
+     return refs
+
+
+ def clean_turn_content(content: str) -> str:
+     """Remove <ref> tags but keep the text content."""
+     return re.sub(r'<ref\s+keys="[^"]+">([^<]+)</ref>', r'\1', content)
+
+
+ def format_role(role: str) -> str:
+     """Format role for display."""
+     role_map = {
+         "patient": "Patient",
+         "doctor": "Arzt",
+         "doctor_dictation": "Arzt (Diktat)"
+     }
+     return role_map.get(role, role)
+
+
+ def get_role_color(role: str) -> str:
+     """Get background color for role."""
+     if role == "patient":
+         return "#e3f2fd"  # Light blue
+     elif role == "doctor":
+         return "#e8f5e9"  # Light green
+     else:
+         return "#fff3e0"  # Light orange for dictation
+
+
+ def render_conversation(conv_idx: int, variant: str, turn_idx: int):
+     """
+     Render conversation up to turn_idx and collect EHR references.
+     Returns (conversation_html, history, findings, treatment, plan, orders, max_turns, current_turn)
+     """
+     if not DATA or conv_idx < 0 or conv_idx >= len(DATA):
+         return "<p>No data available</p>", "", "", "", "", "", 0, 0
+
+     item = DATA[conv_idx]
+     conversations = item.get("conversations", {})
+
+     if variant not in conversations:
+         return f"<p>Variant '{variant}' not available</p>", "", "", "", "", "", 0, 0
+
+     turns = conversations[variant].get("turns", [])
+     max_turns = len(turns)
+
+     if max_turns == 0:
+         return "<p>No turns in this conversation</p>", "", "", "", "", "", 0, 0
+
+     # Clamp turn_idx
+     turn_idx = max(0, min(turn_idx, max_turns - 1))
+
+     # Get EHR data for reference lookup
+     ehr_dict = item.get("ehr_dict", {})
+
+     # Collect all refs up to current turn
+     all_refs = {"history": {}, "findings": {}, "treatment": {}, "plan": {}, "order": {}}
+
+     # Build conversation HTML
+     conv_html = '<div style="max-height: 500px; overflow-y: auto; padding: 10px;">'
+
+     for i in range(turn_idx + 1):
+         turn = turns[i]
+         role = turn.get("role", "unknown")
+         content = turn.get("content", "")
+
+         # Extract refs from this turn
+         turn_refs = extract_refs_from_turn(content)
+
+         # Add refs to collected refs (using key as identifier to avoid duplicates)
+         for category, ref_list in turn_refs.items():
+             for key, text in ref_list:
+                 if key not in all_refs[category]:
+                     # Look up full text from ehr_dict
+                     full_text = ehr_dict.get(key, text)
+                     all_refs[category][key] = full_text
+
+         # Clean content for display
+         clean_content = clean_turn_content(content)
+         role_display = format_role(role)
+         bg_color = get_role_color(role)
+
+         conv_html += f'''
+         <div style="margin-bottom: 12px; padding: 10px; border-radius: 8px; background-color: {bg_color};">
+             <strong style="color: #333;">{role_display}:</strong>
+             <p style="margin: 5px 0 0 0; color: #444;">{clean_content}</p>
+         </div>
+         '''
+
+     conv_html += '</div>'
+
+     # Format bucket contents
+     def format_bucket(refs_dict: dict) -> str:
+         if not refs_dict:
+             return "<em style='color: #999;'>Keine Einträge</em>"
+
+         items = []
+         for key, text in sorted(refs_dict.items()):
+             # Handle orders which might be JSON
+             if key.startswith("order_") and text.startswith("{"):
+                 try:
+                     order_data = json.loads(text)
+                     text = order_data.get("details", text)
+                 except json.JSONDecodeError:
+                     pass
+             items.append(f"<li style='margin-bottom: 8px;'>{text}</li>")
+
+         return f"<ul style='margin: 0; padding-left: 20px;'>{''.join(items)}</ul>"
+
+     history_html = format_bucket(all_refs["history"])
+     findings_html = format_bucket(all_refs["findings"])
+     treatment_html = format_bucket(all_refs["treatment"])
+     plan_html = format_bucket(all_refs["plan"])
+     orders_html = format_bucket(all_refs["order"])
+
+     return conv_html, history_html, findings_html, treatment_html, plan_html, orders_html, max_turns, turn_idx
+
+
+ def on_conversation_change(conv_selection: str, variant: str):
+     """Handle conversation dropdown change."""
+     if not conv_selection:
+         return "<p>Select a conversation</p>", "", "", "", "", "", 0, 0
+
+     # Extract index from selection (format: "1. scenario...")
+     try:
+         conv_idx = int(conv_selection.split(".")[0]) - 1
+     except ValueError:
+         conv_idx = 0
+
+     # Start at first turn
+     return render_conversation(conv_idx, variant, 0)
+
+
+ def on_variant_change(conv_selection: str, variant: str, current_turn: int):
+     """Handle variant dropdown change."""
+     if not conv_selection:
+         return "<p>Select a conversation</p>", "", "", "", "", "", 0, 0
+
+     try:
+         conv_idx = int(conv_selection.split(".")[0]) - 1
+     except ValueError:
+         conv_idx = 0
+
+     # Reset to first turn when variant changes
+     return render_conversation(conv_idx, variant, 0)
+
+
+ def on_next(conv_selection: str, variant: str, current_turn: int, max_turns: int):
+     """Go to next turn."""
+     if not conv_selection:
+         return "<p>Select a conversation</p>", "", "", "", "", "", 0, 0
+
+     try:
+         conv_idx = int(conv_selection.split(".")[0]) - 1
+     except ValueError:
+         conv_idx = 0
+
+     new_turn = min(current_turn + 1, max_turns - 1)
+     return render_conversation(conv_idx, variant, new_turn)
+
+
+ def on_back(conv_selection: str, variant: str, current_turn: int, max_turns: int):
+     """Go to previous turn."""
+     if not conv_selection:
+         return "<p>Select a conversation</p>", "", "", "", "", "", 0, 0
+
+     try:
+         conv_idx = int(conv_selection.split(".")[0]) - 1
+     except ValueError:
+         conv_idx = 0
+
+     new_turn = max(current_turn - 1, 0)
+     return render_conversation(conv_idx, variant, new_turn)
+
+
+ def on_reset(conv_selection: str, variant: str):
+     """Reset to first turn."""
+     if not conv_selection:
+         return "<p>Select a conversation</p>", "", "", "", "", "", 0, 0
+
+     try:
+         conv_idx = int(conv_selection.split(".")[0]) - 1
+     except ValueError:
+         conv_idx = 0
+
+     return render_conversation(conv_idx, variant, 0)
+
+
+ def on_end(conv_selection: str, variant: str, max_turns: int):
+     """Go to last turn."""
+     if not conv_selection:
+         return "<p>Select a conversation</p>", "", "", "", "", "", 0, 0
+
+     try:
+         conv_idx = int(conv_selection.split(".")[0]) - 1
+     except ValueError:
+         conv_idx = 0
+
+     return render_conversation(conv_idx, variant, max_turns - 1)
+
+
+ # Build Gradio interface
+ with gr.Blocks(title="Medical Conversation Viewer") as demo:
+     gr.Markdown("""
+     # Medical Conversation Dataset Viewer
+
+     View synthetic German doctor-patient conversations with EHR (Electronic Health Record) reference tracking.
+
+     **Instructions:**
+     1. Select a conversation from the dropdown
+     2. Choose a conversation variant (natural, inline_dictation, post_dictation)
+     3. Use the navigation buttons to step through the conversation
+     4. Watch the EHR buckets populate as references appear in the dialogue
+     """)
+
+     # State variables
+     max_turns_state = gr.State(0)
+     current_turn_state = gr.State(0)
+
+     # Top controls
+     with gr.Row():
+         conv_dropdown = gr.Dropdown(
+             choices=get_conversation_options(),
+             label="Select Conversation",
+             value=get_conversation_options()[0] if get_conversation_options() else None,
+             scale=3
+         )
+         variant_dropdown = gr.Dropdown(
+             choices=VARIANTS,
+             label="Variant",
+             value="natural",
+             scale=1
+         )
+
+     # Navigation controls
+     with gr.Row():
+         reset_btn = gr.Button("⏮ Start", size="sm")
+         back_btn = gr.Button("◀ Back", size="sm")
+         turn_display = gr.Markdown("Turn: 1 / 1")
+         next_btn = gr.Button("Next ▶", size="sm")
+         end_btn = gr.Button("End ⏭", size="sm")
+
+     # Main content area
+     with gr.Row():
+         # Left: Conversation
+         with gr.Column(scale=1):
+             gr.Markdown("### Conversation")
+             conversation_html = gr.HTML("<p>Select a conversation to begin</p>")
+
+         # Right: EHR Buckets
+         with gr.Column(scale=1):
+             gr.Markdown("### EHR Summary")
+
+             with gr.Accordion("History (Anamnese)", open=True):
+                 history_html = gr.HTML("<em style='color: #999;'>Keine Einträge</em>")
+
+             with gr.Accordion("Findings (Befunde)", open=True):
+                 findings_html = gr.HTML("<em style='color: #999;'>Keine Einträge</em>")
+
+             with gr.Accordion("Treatment (Therapie)", open=True):
+                 treatment_html = gr.HTML("<em style='color: #999;'>Keine Einträge</em>")
+
+             with gr.Accordion("Plan (Prozedere)", open=True):
+                 plan_html = gr.HTML("<em style='color: #999;'>Keine Einträge</em>")
+
+             with gr.Accordion("Orders (Anordnungen)", open=True):
+                 orders_html = gr.HTML("<em style='color: #999;'>Keine Einträge</em>")
+
+     # Output components list for convenience
+     outputs = [
+         conversation_html,
+         history_html,
+         findings_html,
+         treatment_html,
+         plan_html,
+         orders_html,
+         max_turns_state,
+         current_turn_state
+     ]
+
+     # Update turn display
+     def update_turn_display(current_turn, max_turns):
+         return f"**Turn: {current_turn + 1} / {max_turns}**"
+
+     # Event handlers
+     def handle_conversation_change(conv, var):
+         result = on_conversation_change(conv, var)
+         turn_text = update_turn_display(result[7], result[6])
+         return result + (turn_text,)
+
+     def handle_variant_change(conv, var, curr):
+         result = on_variant_change(conv, var, curr)
+         turn_text = update_turn_display(result[7], result[6])
+         return result + (turn_text,)
+
+     def handle_next(conv, var, curr, max_t):
+         result = on_next(conv, var, curr, max_t)
+         turn_text = update_turn_display(result[7], result[6])
+         return result + (turn_text,)
+
+     def handle_back(conv, var, curr, max_t):
+         result = on_back(conv, var, curr, max_t)
+         turn_text = update_turn_display(result[7], result[6])
+         return result + (turn_text,)
+
+     def handle_reset(conv, var):
+         result = on_reset(conv, var)
+         turn_text = update_turn_display(result[7], result[6])
+         return result + (turn_text,)
+
+     def handle_end(conv, var, max_t):
+         result = on_end(conv, var, max_t)
+         turn_text = update_turn_display(result[7], result[6])
+         return result + (turn_text,)
+
+     # Wire up events
+     conv_dropdown.change(
+         fn=handle_conversation_change,
+         inputs=[conv_dropdown, variant_dropdown],
+         outputs=outputs + [turn_display]
+     )
+
+     variant_dropdown.change(
+         fn=handle_variant_change,
+         inputs=[conv_dropdown, variant_dropdown, current_turn_state],
+         outputs=outputs + [turn_display]
+     )
+
+     next_btn.click(
+         fn=handle_next,
+         inputs=[conv_dropdown, variant_dropdown, current_turn_state, max_turns_state],
+         outputs=outputs + [turn_display]
+     )
+
+     back_btn.click(
+         fn=handle_back,
+         inputs=[conv_dropdown, variant_dropdown, current_turn_state, max_turns_state],
+         outputs=outputs + [turn_display]
+     )
+
+     reset_btn.click(
+         fn=handle_reset,
+         inputs=[conv_dropdown, variant_dropdown],
+         outputs=outputs + [turn_display]
+     )
+
+     end_btn.click(
+         fn=handle_end,
+         inputs=[conv_dropdown, variant_dropdown, max_turns_state],
+         outputs=outputs + [turn_display]
+     )
+
+     # Load initial conversation
+     demo.load(
+         fn=handle_conversation_change,
+         inputs=[conv_dropdown, variant_dropdown],
+         outputs=outputs + [turn_display]
+     )
+
+
+ if __name__ == "__main__":
+     demo.launch()
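`render_conversation` interpolates each cleaned turn directly into an HTML string that is then shown in a `gr.HTML` component, so a `<`, `>` or `&` inside the dialogue text may be interpreted as markup rather than displayed literally. Below is a minimal sketch of one conversation bubble built with the standard library's `html.escape`, assuming the dialogue should always be shown as plain text. `render_turn`, `ROLE_LABELS`, `ROLE_COLORS` and `DEFAULT_COLOR` are hypothetical names; the labels, colours and inline styles are copied from `format_role`, `get_role_color` and the bubble markup in app.py.

```python
import html
import re

# Same pattern app.py uses to strip <ref> tags while keeping their inner text
REF_TAG = re.compile(r'<ref\s+keys="[^"]+">([^<]+)</ref>')

# Labels and colours copied from format_role / get_role_color in app.py
ROLE_LABELS = {"patient": "Patient", "doctor": "Arzt", "doctor_dictation": "Arzt (Diktat)"}
ROLE_COLORS = {"patient": "#e3f2fd", "doctor": "#e8f5e9"}
DEFAULT_COLOR = "#fff3e0"  # light orange, used for dictation and any other role


def render_turn(role: str, content: str) -> str:
    """Return one conversation bubble with the dialogue text HTML-escaped."""
    # Strip the ref tags first, then escape whatever text remains
    text = html.escape(REF_TAG.sub(r"\1", content))
    label = html.escape(ROLE_LABELS.get(role, role))
    color = ROLE_COLORS.get(role, DEFAULT_COLOR)
    return (
        '<div style="margin-bottom: 12px; padding: 10px; border-radius: 8px; '
        f'background-color: {color};">'
        f'<strong style="color: #333;">{label}:</strong>'
        f'<p style="margin: 5px 0 0 0; color: #444;">{text}</p>'
        "</div>"
    )


if __name__ == "__main__":
    print(render_turn("doctor", 'Der <ref keys="findings_1">Puls</ref> ist > 100 & regelmäßig.'))
```

The order of the two steps matters: escaping before stripping would turn `<ref ...>` into `&lt;ref ...&gt;`, and the tag pattern would no longer match.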
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio
+ datasets
+ huggingface_hub
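Each of the six navigation handlers in app.py recovers the conversation index by parsing the dropdown label built by `get_conversation_options()` (`"N. scenario"`, 1-based, truncated to 80 characters) with `int(conv_selection.split(".")[0]) - 1`. Below is a minimal sketch of that round trip as a pair of helpers that the handlers could share. `format_option` and `parse_option_index` are hypothetical names and are not part of this commit.

```python
from typing import Optional


def format_option(index: int, scenario: Optional[str], max_len: int = 80) -> str:
    """Build a 1-based dropdown label "N. scenario", truncating long text."""
    # Fall back to a generic title when the scenario is missing, None or empty
    text = scenario or f"Conversation {index + 1}"
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return f"{index + 1}. {text}"


def parse_option_index(label: Optional[str], default: int = 0) -> int:
    """Recover the 0-based index from a label built by format_option."""
    if not label:
        return default
    # Only the part before the first "." is the number
    head, _, _ = label.partition(".")
    try:
        return int(head) - 1
    except ValueError:
        return default


if __name__ == "__main__":
    label = format_option(4, "Rückenschmerzen nach Sturz")
    print(label)                      # 5. Rückenschmerzen nach Sturz
    print(parse_option_index(label))  # 4
```

Splitting at the first `.` only is safe because the number always precedes it, so periods inside the scenario text do not affect the result. The `scenario or ...` fallback also covers a `brief_scenario` that is present but `None`, which the `item.get(...)` default in app.py does not.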