{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1eea264",
   "metadata": {},
   "outputs": [],
   "source": [
    "def training_prompt(medical_text, subclaims):\n",
    "    system_prompt = f\"\"\"\n",
    "You are an expert medical annotator. Your task is to extract granular, factual subclaims from medical text.\n",
    "A subclaim is the smallest standalone factual unit that can be independently verified.\n",
    "\n",
    "Instructions:\n",
    "1. Read the provided medical text.\n",
    "2. Break it into clear, objective subclaims.\n",
    "3. Each subclaim must be directly derived from the text.\n",
    "4. Do not add, guess, infer, or combine multiple facts.\n",
    "5. Each subclaim should be short, specific, and verifiable.\n",
    "\n",
    "Medical Text:\n",
    "{medical_text}\n",
    "\"\"\"\n",
    "\n",
    "    conversation = {}\n",
    "    conversation['conversations'] = (\n",
    "        {'from': \"user\", 'content': system_prompt},\n",
    "        {'from': \"assistant\", 'content': str(subclaims)},\n",
    "    )\n",
    "    return conversation\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72fbae33",
   "metadata": {},
   "outputs": [],
   "source": [
    "# /home/mshahidul/readctrl/data/finetuning_data/finetune_dataset_extract-subclaim.json read\n",
    "with open('/home/mshahidul/readctrl/data/finetuning_data/finetune_dataset_extract-subclaim.json', 'r') as f:\n",
    "    import json\n",
    "    data = json.load(f)\n",
    "prompts = []\n",
    "for item in data:\n",
    "    medical_text = item['medical_text']\n",
    "    subclaims = item['subclaims']\n",
    "    prompt = training_prompt(medical_text, subclaims)\n",
    "    prompts.append(prompt)\n",
    "with open('/home/mshahidul/readctrl/data/finetuning_data/finetune_dataset_extract-subclaim_conversation.json', 'w') as f:\n",
    "    json.dump(prompts, f, indent=2)"
   ]
  },
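  {
   "cell_type": "markdown",
   "id": "5e0a1c01",
   "metadata": {},
   "source": [
    "The assistant targets above are written with `str(subclaims)`, a Python list repr rather than JSON, so a fine-tuned model is expected to reply in that form. A minimal sketch for turning such a reply back into a list, assuming the model reproduces the repr; `parse_subclaims` is a hypothetical helper, not part of the original pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e0a1c02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import ast\n",
    "\n",
    "def parse_subclaims(reply: str) -> list[str]:\n",
    "    \"\"\"Parse a reply such as \"['A.', 'B.']\" into a list of subclaim strings.\n",
    "\n",
    "    Assumes a Python list literal, matching the str(subclaims) training targets;\n",
    "    returns an empty list when the reply cannot be parsed.\n",
    "    \"\"\"\n",
    "    # Keep only the outermost [...] span in case the model adds surrounding text\n",
    "    start, end = reply.find('['), reply.rfind(']')\n",
    "    if start == -1 or end < start:\n",
    "        return []\n",
    "    try:\n",
    "        value = ast.literal_eval(reply[start:end + 1])\n",
    "    except (ValueError, SyntaxError, TypeError):\n",
    "        return []\n",
    "    if not isinstance(value, list):\n",
    "        return []\n",
    "    # Drop non-string items and strip surrounding whitespace\n",
    "    return [item.strip() for item in value if isinstance(item, str)]\n",
    "\n",
    "print(parse_subclaims(\"['The patient was 59 years old.', 'He had type 1 diabetes.']\"))"
   ]
  },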
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "118e5fce",
   "metadata": {},
   "outputs": [],
   "source": [
    "# python /home/mshahidul/readctrl/code/finetune-inference/completeness_reasoning_v3.py --data_path /home/mshahidul/readctrl/data/concise_complete_attr_cal_v3/evaluated_metrics_0_100.json \n",
    "import os\n",
    "for x in os.listdir('/home/mshahidul/readctrl/data/concise_complete_attr_cal_v3/'):\n",
    "    if x.endswith('.json'):\n",
    "        dat=f'python /home/mshahidul/readctrl/code/finetune-inference/completeness_reasoning_v3.py --data_path /home/mshahidul/readctrl/data/concise_complete_attr_cal_v3/{x}'\n",
    "        print(dat) \n",
    "        print('\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb108a11",
   "metadata": {},
   "outputs": [],
   "source": [
    "import zipfile\n",
    "\n",
    "# /home/mshahidul/readctrl/data/testing_data/multiclinsum_test_es.zip\n",
    "with zipfile.ZipFile('/home/mshahidul/readctrl/data/testing_data/multiclinsum_test_es.zip', 'r') as zip_ref:\n",
    "    zip_ref.extractall('/home/mshahidul/readctrl/data/testing_data/es_data/')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ea249db",
   "metadata": {},
   "outputs": [],
   "source": [
    "def training_prompt(text, subclaim, label):\n",
    "    system_prompt = f\"\"\"\n",
    "You are a medical evidence evaluator.\n",
    "\n",
    "Your task is to determine the relationship between a medical text and a subclaim.\n",
    "\n",
    "Definitions:\n",
    "- 1 = supported (the text directly supports the subclaim)\n",
    "- 0 = refuted (the text contradicts the subclaim)\n",
    "- 2 = not_supported (the text is related but provides no evidence for the subclaim)\n",
    "\n",
    "Medical Text:\n",
    "{text}\n",
    "\n",
    "Subclaim:\n",
    "{subclaim}\n",
    "\n",
    "Respond ONLY with a single number: 1, 0, or 2.\n",
    "\"\"\"\n",
    "\n",
    "    conversation = {}\n",
    "    conversation['conversations'] = (\n",
    "        {'from': \"user\", 'content': system_prompt},\n",
    "        {'from': \"assistant\", 'content': str(label)},\n",
    "    )\n",
    "    return conversation\n",
    "# /home/mshahidul/readctrl/data/finetuning_data/processed_subclaim_support_data.json\n",
    "with open('/home/mshahidul/readctrl/data/finetuning_data/processed_subclaim_support_data.json', 'r') as f:\n",
    "    import json\n",
    "    data = json.load(f)\n",
    "prompts = []\n",
    "for item in data:\n",
    "    text = item['text']\n",
    "    subclaim = item['subclaim']\n",
    "    label = item['label']\n",
    "    prompt = training_prompt(text, subclaim, label)\n",
    "    prompts.append(prompt)\n",
    "with open('/home/mshahidul/readctrl/data/finetuning_data/processed_subclaim_support_data_conversation.json', 'w') as f:\n",
    "    json.dump(prompts, f, indent=2)"
   ]
  },
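  {
   "cell_type": "markdown",
   "id": "5e0a1c03",
   "metadata": {},
   "source": [
    "The support classifier is trained to answer with a single digit (1 = supported, 0 = refuted, 2 = not_supported). A minimal sketch for mapping a raw reply back to those labels, assuming the first standalone 0, 1, or 2 in the reply is the answer; `parse_support_label` and `LABEL_NAMES` are hypothetical names, not part of the original pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e0a1c04",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "from typing import Optional\n",
    "\n",
    "# Label scheme from the support prompt definitions\n",
    "LABEL_NAMES = {0: 'refuted', 1: 'supported', 2: 'not_supported'}\n",
    "\n",
    "def parse_support_label(reply: str) -> Optional[int]:\n",
    "    \"\"\"Return 0, 1, or 2 from a model reply, or None if no standalone digit is found.\"\"\"\n",
    "    # \\b stops digits inside longer numbers (e.g. '12') from matching\n",
    "    match = re.search(r'\\b([012])\\b', reply)\n",
    "    return int(match.group(1)) if match else None\n",
    "\n",
    "for reply in ['1', ' 2\\n', 'Answer: 0', 'supported']:\n",
    "    label = parse_support_label(reply)\n",
    "    print(repr(reply), '->', label, LABEL_NAMES.get(label))"
   ]
  },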
  {
   "cell_type": "markdown",
   "id": "fcc9cec9",
   "metadata": {},
   "source": [
    "## classifier design for readability test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6a5690f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def readability_training_prompt_with_human(full_text, generated_text, human_score):\n",
    "    \"\"\"\n",
    "    Modified training prompt: Evaluates readability by comparing \n",
    "    generated text against the original source (Full Text) only.\n",
    "    \"\"\"\n",
    "    \n",
    "    system_prompt = f\"\"\"You are a medical readability evaluator.\n",
    "\n",
    "### Task\n",
    "Compare the \"GENERATED TEXT\" against the \"FULL TEXT\" to determine its readability for a general, non-medical audience.\n",
    "\n",
    "### Input Data\n",
    "- **FULL TEXT:** {full_text}\n",
    "- **GENERATED TEXT (Evaluate this):** {generated_text}\n",
    "\n",
    "### Readability Scale\n",
    "1: Very Easy - Minimal medical language, uses simple terms.\n",
    "2: Easy - Accessible to most, minor jargon explained.\n",
    "3: Medium - Some technical terms, moderate complexity.\n",
    "4: Hard - Clinical tone, assumes some prior knowledge.\n",
    "5: Very Hard - Extremely technical, requires medical expertise.\n",
    "\n",
    "### Constraints\n",
    "- Evaluate ONLY the \"GENERATED TEXT\".\n",
    "- Use \"FULL TEXT\" only for context of the subject matter.\n",
    "- Do NOT assess factual accuracy.\n",
    "\n",
    "### Output Format\n",
    "Return ONLY the following JSON object:\n",
    "{{\n",
    "  \"readability_score\": {human_score}\n",
    "}}\"\"\"\n",
    "\n",
    "    # Structured for standard SFT (Supervised Fine-Tuning) formats\n",
    "    conversation = {\n",
    "        \"conversations\": [\n",
    "            {\"role\": \"user\", \"content\": system_prompt},\n",
    "            {\"role\": \"assistant\", \"content\": f\"{{\\\"readability_score\\\": {human_score}}}\"}\n",
    "        ]\n",
    "    }\n",
    "    \n",
    "    return conversation"
   ]
  },
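  {
   "cell_type": "markdown",
   "id": "5e0a1c05",
   "metadata": {},
   "source": [
    "The readability classifier's target is a one-key JSON object such as `{\"readability_score\": 3}`. A minimal sketch for reading a model reply back into an integer and rejecting anything outside the 1-5 scale; `parse_readability_score` is a hypothetical helper and assumes the reply contains at most one flat JSON object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e0a1c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import re\n",
    "from typing import Optional\n",
    "\n",
    "def parse_readability_score(reply: str) -> Optional[int]:\n",
    "    \"\"\"Extract readability_score (1-5) from a reply like '{\"readability_score\": 4}'.\"\"\"\n",
    "    # Take the first {...} span so any stray text around the JSON is ignored\n",
    "    match = re.search(r'\\{.*?\\}', reply, flags=re.DOTALL)\n",
    "    if match is None:\n",
    "        return None\n",
    "    try:\n",
    "        score = json.loads(match.group(0)).get('readability_score')\n",
    "    except json.JSONDecodeError:\n",
    "        return None\n",
    "    # bool is a subclass of int, so rule it out explicitly\n",
    "    if isinstance(score, int) and not isinstance(score, bool) and 1 <= score <= 5:\n",
    "        return score\n",
    "    return None\n",
    "\n",
    "print(parse_readability_score('{\"readability_score\": 4}'))"
   ]
  },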
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "63b469ef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_keys(['low_health_literacy', 'intermediate_health_literacy', 'proficient_health_literacy'])\n"
     ]
    }
   ],
   "source": [
    "# /home/mshahidul/readctrl/data/annotators_validate_data/Sharmin Sultana_2025-12-31_14-19-30/annotation_results.json\n",
    "with open('/home/mshahidul/readctrl/data/synthetic_dataset_diff_labels/syn_data_diff_labels_en_v1.json', 'r') as f:\n",
    "    import json\n",
    "    anno_data = json.load(f)\n",
    "print(anno_data[0]['diff_label_texts'].keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ea10b2cb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Merge Complete.\n",
      "Original keys preserved: ['index', 'fulltext', 'diff_label_texts', 'summary']\n",
      "Sample 'diff_label_texts' keys check: dict_keys(['low_health_literacy', 'intermediate_health_literacy', 'proficient_health_literacy'])\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "import pandas as pd\n",
    "\n",
    "# Define file paths\n",
    "gs_path = '/home/mshahidul/readctrl/data/testing_data_gs/multiclinsum_gs_train_en.json'\n",
    "syn_path = '/home/mshahidul/readctrl/data/synthetic_dataset_diff_labels/syn_data_diff_labels_en_v1.json'\n",
    "output_path = '/home/mshahidul/readctrl/data/synthetic_dataset_diff_labels/syn_data_with_gs_summary_en.json'\n",
    "\n",
    "# 1. Load Ground Truth Data\n",
    "with open(gs_path, 'r', encoding='utf-8') as f:\n",
    "    gs_data = json.load(f)\n",
    "\n",
    "# 2. Load Synthetic Data (Preserving all keys: index, fulltext, diff_label_texts)\n",
    "with open(syn_path, 'r', encoding='utf-8') as f:\n",
    "    syn_data = json.load(f)\n",
    "\n",
    "# Convert to DataFrames\n",
    "# We only need 'fulltext' and 'summary' from the GS file for the mapping\n",
    "df_gs = pd.DataFrame(gs_data)[['fulltext', 'summary']]\n",
    "df_gs = df_gs.drop_duplicates(subset=['fulltext'])\n",
    "\n",
    "# Create the Synthetic DataFrame (contains index, fulltext, diff_label_texts)\n",
    "df_syn = pd.DataFrame(syn_data)\n",
    "\n",
    "# 3. Perform Left Join\n",
    "# This keeps every column in df_syn and adds 'summary' where fulltext matches\n",
    "merged_df = pd.merge(df_syn, df_gs, on='fulltext', how='left')\n",
    "\n",
    "# 4. Save and Verify\n",
    "merged_data = merged_df.to_dict(orient='records')\n",
    "\n",
    "with open(output_path, 'w', encoding='utf-8') as f:\n",
    "    json.dump(merged_data, f, indent=4, ensure_ascii=False)\n",
    "\n",
    "print(f\"Merge Complete.\")\n",
    "print(f\"Original keys preserved: {list(merged_df.columns)}\")\n",
    "print(f\"Sample 'diff_label_texts' keys check: {merged_df.iloc[0]['diff_label_texts'].keys()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1b3c848f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_keys(['id', 'fulltext', 'summary'])\n"
     ]
    }
   ],
   "source": [
    "# /home/mshahidul/readctrl/data/testing_data_gs/multiclinsum_gs_train_en.json\n",
    "with open('/home/mshahidul/readctrl/data/testing_data_gs/multiclinsum_gs_train_en.json', 'r') as f:\n",
    "    import json\n",
    "    _data = json.load(f)\n",
    "print(_data[0].keys())\n",
    "a_dict = {}\n",
    "for item in _data:\n",
    "    a_dict[item['fulltext']] = item['summary']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "bb68d61b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# /home/mshahidul/readctrl/data/data_annotator_data/vector_db_all-miniLM/crowdsourcing_input_en_v2.json\n",
    "with open('/home/mshahidul/readctrl/data/synthetic_dataset_diff_labels/syn_data_diff_labels_en_v1.json', 'r') as f:\n",
    "    import json\n",
    "    gen_data = json.load(f)\n",
    "data={}\n",
    "for item in gen_data:\n",
    "    for label in list(item['diff_label_texts'].keys()):\n",
    "        # print(item.keys())\n",
    "        data.setdefault(item['index'], {})[label] = {\n",
    "        'fulltext': item['fulltext'],\n",
    "        # 'gold_summary': a_dict[item['fulltext']],\n",
    "        'generated_text': item['diff_label_texts'][label]\n",
    "    }\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "7fd3115c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def convert_score(score: int) -> int:\n",
    "    if not 1 <= score <= 10:\n",
    "        raise ValueError(\"Score must be between 1 and 10\")\n",
    "    return math.ceil(score / 2)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "36ebb028",
   "metadata": {},
   "outputs": [],
   "source": [
    "full_data=[]\n",
    "for item in anno_data:\n",
    "    label=item['health_literacy_label']\n",
    "    full_text = data[item['doc_id']][label]['fulltext']\n",
    "    # gold_summary = data[item['doc_id']][label]['gold_summary']\n",
    "    generated_text = data[item['doc_id']][label]['generated_text']\n",
    "    human_score = convert_score(item['doc_rating'])\n",
    "    res=readability_training_prompt_with_human(full_text,generated_text,human_score)\n",
    "    full_data.append(res)"
   ]
  },
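  {
   "cell_type": "markdown",
   "id": "5e0a1c07",
   "metadata": {},
   "source": [
    "Before saving, the records can be sanity-checked: each should hold one user turn and one assistant turn, the assistant turn should parse to a score on the 1-5 scale, and the user prompt should not already contain that answer (a leaked label lets the model copy the score instead of learning it). A minimal sketch, assuming the `{\"conversations\": [{\"role\": ..., \"content\": ...}]}` layout built by `readability_training_prompt_with_human`; `summarize_records` is a hypothetical helper and the leakage test is a simple substring heuristic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e0a1c08",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from collections import Counter\n",
    "\n",
    "def summarize_records(records: list) -> dict:\n",
    "    \"\"\"Count gold scores and flag malformed or label-leaking SFT records by index.\"\"\"\n",
    "    scores = Counter()\n",
    "    malformed, leaked = [], []\n",
    "    for i, record in enumerate(records):\n",
    "        turns = record.get('conversations', [])\n",
    "        if [turn.get('role') for turn in turns] != ['user', 'assistant']:\n",
    "            malformed.append(i)\n",
    "            continue\n",
    "        try:\n",
    "            score = json.loads(turns[1]['content'])['readability_score']\n",
    "        except (json.JSONDecodeError, KeyError, TypeError):\n",
    "            malformed.append(i)\n",
    "            continue\n",
    "        if score not in range(1, 6):\n",
    "            malformed.append(i)\n",
    "            continue\n",
    "        scores[score] += 1\n",
    "        # Heuristic: the gold answer should not appear verbatim in the user prompt\n",
    "        if f'\"readability_score\": {score}' in turns[0]['content']:\n",
    "            leaked.append(i)\n",
    "    return {'score_counts': dict(sorted(scores.items())), 'malformed': malformed, 'leaked': leaked}\n",
    "\n",
    "# Toy records for illustration; in this notebook, call summarize_records(full_data)\n",
    "toy = [\n",
    "    {'conversations': [{'role': 'user', 'content': 'Rate this text.'},\n",
    "                       {'role': 'assistant', 'content': '{\"readability_score\": 3}'}]},\n",
    "    {'conversations': [{'role': 'user', 'content': 'Return {\"readability_score\": 5}'},\n",
    "                       {'role': 'assistant', 'content': '{\"readability_score\": 5}'}]},\n",
    "    {'conversations': [{'role': 'user', 'content': 'Rate this text.'}]},\n",
    "]\n",
    "print(summarize_records(toy))"
   ]
  },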
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8b8df130",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"/home/mshahidul/readctrl/data/finetuning_data/classifier_en_data.json\", \"w\") as f:\n",
    "    json.dump(full_data, f, indent=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "3dfb6a3c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'conversations': [{'role': 'user',\n",
       "   'content': 'You are a medical readability evaluator.\\n\\n### Task\\nCompare the \"GENERATED TEXT\" against the \"FULL TEXT\" to determine its readability for a general, non-medical audience.\\n\\n### Input Data\\n- **FULL TEXT:** The patient was a 59-year-old Japanese man with a 28-year history of type 1 diabetes. He visited our hospital monthly for management of diabetes with intensive therapy employing multiple-dose insulin injections. His height and body weight were 168 cm and 52 kg (body mass index: 18.4 kg/m2), respectively. He showed depleted insulin secretion (serum C-peptide level was below the limit of detection), such that his blood glucose levels fluctuated severely, and his hemoglobin A1c (HbA1c) level was around 9.0% despite intensive insulin therapy. He had been diagnosed with asymptomatic chronic severe (grade III) aortic regurgitation (AR) 16 years before the current presentation but had declined follow-up for the AR. He had never undergone surgery nor the implantation of any prosthetic devices.\\n\\nEight days after his regular hospital visit, he visited an emergency clinic complaining of breathing difficulty and had a fever above 38℃. Until that day, he had not noticed any fever, chills, weakness, or any other symptoms. His blood pressure and pulse rate were 192/82 mmHg and 118/min, respectively. He showed orthopnea, and his oxygen saturation (SpO2) was 80%. He was transported to the emergency department of our hospital. A physical examination revealed a Levine 3/6 systolic murmur, although his cardiac murmur had not been checked at regular hospital visits. No physical findings suggesting IE, such as Osler nodes, Janeway lesions, or conjunctival petechiae, were recognized. His white blood cell (WBC) count was markedly increased to 20,800 /μL, and his C-reactive protein (CRP) was elevated to 6.06 mg/dL. Serum creatine phosphokinase MB was within the normal range, at 6.0 IU/L, and troponin T was negative. 
Chest X-ray showed pulmonary congestion with cardiac enlargement (cardiothoracic ratio: 55%). Electrocardiography revealed ST elevation on V1-V4, but emergency echocardiography showed no dysfunction of cardiac contractility. He was diagnosed with acute heart failure due to valvular disease, and treatment with non-invasive positive pressure ventilation and nitrates was initiated.\\n\\nAfter hospital admission, a detailed examination by transthoracic echocardiography showed severe aortic regurgitation, severe mitral regurgitation, and a mobile vegetation on the mitral valve. Transesophageal echocardiography revealed a 16.5×6-mm mobile vegetation on the anterior leaflet of the mitral valve and an 11.2×5-mm nonmobile vegetation on the noncoronary cusp of the aortic valve. These findings raised strong suspicion of NVE. In this case, head computed tomography (CT) and magnetic resonance imaging revealed no cerebral infarction or hemorrhaging, although a mobile vegetation was detected.\\n\\nOn reviewing the clinical course until hospitalization, we noted that at the visit four months before admission, his WBC count had been slightly elevated. The following month, his albumin (Alb) level decreased to 3.0 g/dL, and his hemoglobin (Hb) level had shown a gradual decline over the 2 months prior to admission. During this period, he had experienced a 4-kg weight loss. Esophagogastroduodenoscopy and whole-body CT were performed, but no abnormalities were detected. One month later, he had regained some weight, and the laboratory findings had nearly normalized, except for a slightly elevated CRP level (0.54 mg/dL). At the last visit (8 days before admission), his WBC count had again risen to 9,300 /μL, while his Hb and Alb levels had again decreased to 13.1 g/dL and 3.0 g/dL, respectively. Furthermore, his CRP level had increased to 4.18 mg/dL. At that time, his diastolic blood pressure has shown an obvious decrease. 
Thus far, he had not experienced a fever or any symptoms other than weight loss. We suspected diseases of infectious and/or malignant origin and initiated comprehensive examinations to identify the source of his clinical findings.\\n\\nAfter heart failure treatment had been started, his clinical symptoms showed rapid improvement, and his hemodynamic stability was maintained during the first six hours. He initially received empirical intravenous antibiotic therapy consisting of 12 g/day of ampicillin sulbactam (ABPC/S) and 120 mg/day of gentamycin (GM). Three blood culture sets were obtained on the admission, and all were positive for S. warneri [minimum inhibitory concentration (MIC) to ABPC/S ≤8 μg/mL; MIC to GM ≤1 μg/mL; MIC to cefazolin (CEZ) ≤2 μg/mL]. Thus, IE caused by this organism was diagnosed.\\n\\nAccording to the clinical guideline established by the Japanese Circulation Society, emergency surgery is generally recommended for heart failure of NYHA III to IV or urgent surgery for NVE mobile vegetation exceeding 10 mm and severe valve dysfunction. In this case, however, his heart failure was successfully improved. Based on the guideline, the risk of embolism was considered to have been reduced by the administration of appropriate antibiotic therapy. In addition, the patient had type 1 diabetes, and his glycemic control was so poor that we were concerned that double-valve surgery would be a high-risk procedure. Therefore, we planned elective surgery after sufficient control of both infection and diabetes.\\n\\nBased on the blood culture results, the antibiotic regimen was switched to 6 g/day of CEZ. A detailed dental examination revealed no abnormalities, such as periodontitis. After four weeks of antibiotic therapy, he underwent surgical therapy. His aortic valve was found to be bicuspid, and the aortic and mitral annuli were intact without abscess formation. 
Large vegetations were exenterated, and the mitral and aortic valves were both replaced with mechanical valves. He experienced no postoperative complications and was discharged on the 22nd day after the operation without apparent embolism. He has not had any recurrence in over two years since the operation.\\n- **GENERATED TEXT (Evaluate this):** A 59-year-old Japanese man with a 28-year history of type 1 diabetes on intensive multiple-dose insulin therapy (BMI 18.4 kg/m2, undetectable C‑peptide, HbA1c ~9.0%) and remote, asymptomatic chronic severe (grade III) aortic regurgitation (diagnosed 16 years earlier without subsequent follow‑up) presented with acute decompensated heart failure. He had never undergone surgery or prosthetic device implantation and had no history of immunosuppressive therapies.\\n\\nEight days after a routine visit, he developed dyspnea and fever >38℃. On arrival: BP 192/82 mmHg, HR 118/min, orthopnea, SpO2 80%. Exam: Levine 3/6 systolic murmur; no Osler nodes, Janeway lesions, or conjunctival petechiae. Labs: WBC 20,800/μL, CRP 6.06 mg/dL, CK‑MB 6.0 IU/L, troponin T negative. CXR showed pulmonary congestion with cardiomegaly (CTR 55%). ECG had ST elevation in V1–V4, but emergent echocardiography showed no systolic dysfunction. He was diagnosed with acute heart failure due to valvular disease and treated with non‑invasive positive pressure ventilation and nitrates.\\n\\nTransthoracic echocardiography demonstrated severe aortic regurgitation and severe mitral regurgitation with a mobile mitral vegetation. Transesophageal echocardiography identified a 16.5×6‑mm mobile vegetation on the anterior leaflet of the mitral valve and an 11.2×5‑mm nonmobile vegetation on the noncoronary cusp of the aortic valve, raising strong suspicion for native valve endocarditis (NVE). 
Head CT and MRI showed no cerebral infarction or hemorrhage.\\n\\nRetrospective review revealed subtle abnormalities starting four months pre‑admission: mildly elevated WBC, albumin decreased to 3.0 g/dL the following month, and gradual hemoglobin decline over two months, with a 4‑kg weight loss. EGD and whole‑body CT were unrevealing. He partially regained weight and labs nearly normalized except for a CRP of 0.54 mg/dL. At the last pre‑admission visit (8 days prior), WBC was 9,300/μL, Hb 13.1 g/dL, Alb 3.0 g/dL, CRP 4.18 mg/dL, and diastolic BP had fallen; he remained afebrile and asymptomatic aside from weight loss.\\n\\nEmpiric antibiotics were initiated with ampicillin–sulbactam 12 g/day plus gentamicin 120 mg/day. Three admission blood culture sets all grew Staphylococcus warneri, a coagulase‑negative staphylococcus (CoNS) and resident skin flora (MICs: ABPC/S ≤8 μg/mL; GM ≤1 μg/mL; CEZ ≤2 μg/mL), confirming S. warneri IE. Per Japanese Circulation Society guidance, emergency surgery is generally recommended for NYHA III–IV heart failure or urgent surgery for NVE with mobile vegetation >10 mm and severe valve dysfunction. Because heart failure improved rapidly and appropriate antibiotics were started (reducing embolic risk), and given poorly controlled type 1 diabetes increasing operative risk, elective surgery was planned after stabilization of infection and glycemia. Antibiotics were narrowed to cefazolin 6 g/day; dental evaluation showed no periodontitis.\\n\\nAfter four weeks of antibiotics, surgery revealed a bicuspid aortic valve with intact aortic and mitral annuli and no abscess. Large vegetations were exenterated, and both valves were replaced with mechanical prostheses. The postoperative course was uneventful; he was discharged on postoperative day 22 without apparent embolism and has remained recurrence‑free for over two years. This case represents NVE due to the resident CoNS S. 
warneri in a patient without prosthetic material or immunosuppression, with prodromal laboratory abnormalities and weight loss evident up to four months before presentation.\\n\\n### Readability Scale\\n1: Very Easy - Minimal medical language, uses simple terms.\\n2: Easy - Accessible to most, minor jargon explained.\\n3: Medium - Some technical terms, moderate complexity.\\n4: Hard - Clinical tone, assumes some prior knowledge.\\n5: Very Hard - Extremely technical, requires medical expertise.\\n\\n### Constraints\\n- Evaluate ONLY the \"GENERATED TEXT\".\\n- Use \"FULL TEXT\" only for context of the subject matter.\\n- Do NOT assess factual accuracy.\\n\\n### Output Format\\nReturn ONLY the following JSON object:\\n{\\n  \"readability_score\": 5\\n}'},\n",
       "  {'role': 'assistant', 'content': '{\"readability_score\": 5}'}]}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "full_data[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6dfc340b",
   "metadata": {},
   "outputs": [],
   "source": [
    "dict_keys(['queue_position', 'doc_id', 'health_literacy_label', 'wiki_id', 'doc_snippet', 'wiki_snippet', 'doc_rating', 'wiki_rating', 'is_duplicate', 'timestamp'])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "un",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}