monajm36 committed · Commit 1c65c08 · verified · Parent(s): bb69800

Update README.md

```python
probs = torch.softmax(logits, dim=-1).squeeze()
p_ohca = float(probs[1])  # index 1 = OHCA, index 0 = Non-OHCA

print({"p_ohca": p_ohca})
```
## Decision threshold

The threshold is a probability cutoff for calling a note “OHCA.” You can tune it to your setting:

- High sensitivity (screening): 0.28–0.32
- Balanced: 0.36 (the v8 validation-optimized neighborhood)
- Higher precision: 0.50+

```python
def predict_ohca(text, threshold=0.32):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    p = torch.softmax(logits, dim=-1)[0, 1].item()
    label = "OHCA" if p >= threshold else "Non-OHCA"
    return {"label": label, "p_ohca": p, "threshold": threshold}

print(predict_ohca(text, threshold=0.32))
```
## Data and preprocessing

- Source: MIMIC-derived discharge notes (internal processing).
- Sections used:
  - Chief Complaint
  - History of Present Illness (also recognized as “History of Present Illness:” / “HPI”)
- Class distribution (330 total):
  - Non-OHCA: 242 (73.3%)
  - OHCA: 47 (14.2%)
  - Inter-facility transfers: 23 (7.0%)
  - In-hospital arrests: 18 (5.5%)
- Only the binary OHCA vs. Non-OHCA head is used at inference here; the multi-class labels strengthened the training signal.
- Splits (patient-level): 210 train / 54 validation / 66 test unique admissions.
- Max length: 512 tokens.
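The section parsing itself is not shipped with this card. A minimal sketch of pulling the Chief Complaint and HPI sections out of a note, assuming the header spellings listed above (`extract_sections` is a hypothetical helper, not the model's actual preprocessing code), could look like:

```python
import re

# Hypothetical helper: grab the sections named above and stop at the next
# "Capitalized Header:" line or the end of the note.
SECTION_PATTERN = re.compile(
    r"(Chief Complaint|History of Present Illness|HPI)\s*:\s*(.*?)(?=\n[A-Z][A-Za-z ]+:|\Z)",
    re.DOTALL,
)

def extract_sections(note: str) -> str:
    """Concatenate the matched section bodies into one model input string."""
    parts = [m.group(2).strip() for m in SECTION_PATTERN.finditer(note)]
    return "\n".join(parts)

note = (
    "Chief Complaint: cardiac arrest\n"
    "History of Present Illness: 62M found down at home...\n"
    "Past Medical History: HTN"
)
print(extract_sections(note))  # Past Medical History is excluded
```

Notes from other systems may use different header spellings, so the pattern would need adjusting per site.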
## Training

- Base model: PubMedBERT (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`)
- Epochs: 5–6, with continued improvement through epoch 4
- Batching: small batches with gradient accumulation
- Sampler: class-balanced mini-batches
- Loss: weighted cross-entropy (to counter class imbalance)
- Optimizer/schedule: AdamW with linear decay
- Hardware: trained on CPU (inference also works on CPU)
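The weighted cross-entropy above needs per-class weights. A minimal sketch using inverse-frequency weights (a common choice; the exact weights used for v8 are not published) is:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class frequency,
    normalized so the weighted sample count equals the dataset size."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# Toy distribution mirroring the card: 242 Non-OHCA (0) vs. 47 OHCA (1)
labels = [0] * 242 + [1] * 47
w = inverse_frequency_weights(labels)
print(w)  # the minority OHCA class receives the larger weight
```

A dict like this can be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.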
## Evaluation (test set)

Confusion matrix at a recall-oriented operating point:

|                 | Pred Non-OHCA | Pred OHCA |
|-----------------|---------------|-----------|
| Actual Non-OHCA | 51            | 7         |
| Actual OHCA     | 0             | 9         |

Metrics:

- Sensitivity (recall): 1.000
- Specificity: 0.879
- Precision (PPV): 0.562
- NPV: 1.000
- F1-score: 0.720
- ROC-AUC: 0.971

Interpretation: at this threshold, the model missed no OHCA cases in the test set, at the cost of 7 false positives.
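The reported metrics follow directly from the 2×2 confusion matrix; a small sanity-check helper (for illustration only) reproduces them:

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive sensitivity, specificity, PPV, NPV, and F1 from matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "npv": npv, "f1": f1}

# Counts from the table above
m = metrics_from_confusion(tn=51, fp=7, fn=0, tp=9)
print(m)
```

(ROC-AUC is threshold-free, so it cannot be recovered from a single confusion matrix.)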
## Threshold selection guide

Pick a threshold that matches your use case:

- Screening (don’t miss OHCA): 0.28–0.32
- Balanced review load: around 0.36
- Fewer false positives: ≥ 0.50

If you need to optimize explicitly for recall, compute the best Fβ with β > 1 (e.g., F2) on your validation set and use that threshold in production.
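The Fβ sweep can be sketched as follows. This is a minimal illustration; `probs` and `labels` are toy placeholders, not the model's validation data:

```python
def fbeta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

def best_threshold(probs, labels, beta=2.0):
    """Sweep thresholds 0.01..0.99 and return the one maximizing F-beta."""
    best_t, best_f = 0.5, -1.0
    for t in (i / 100 for i in range(1, 100)):
        preds = [int(p >= t) for p in probs]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f = fbeta(prec, rec, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy scores: OHCA notes (label 1) tend to score higher than Non-OHCA (label 0)
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1]
t, f = best_threshold(probs, labels, beta=2.0)
print({"threshold": t, "f2": f})
```

In practice you would run this once on validation-set probabilities and hard-code the resulting threshold for production.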
## Limitations

- Trained on a specific documentation style; performance may vary on notes from other systems.
- English only; text quality and section headers matter.
- Always keep a human in the loop for high-stakes decisions.
## Citation

If you use this model, please cite:

> M. Moukaddem. *OHCA Classifier v8: PubMedBERT fine-tuned for Out-of-Hospital Cardiac Arrest detection in discharge notes.* 2025. https://huggingface.co/monajm36/ohca-classifier-v8
## License

MIT

---
## Optional polish (recommended)

- On the model page, click **Edit model card** and fill the left-hand **Metadata UI**:
  - `license`: mit
  - `language`: en
  - `base_model`: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
  - `pipeline_tag`: text-classification
- Turn on the **Inference widget** so people can paste text and see probabilities.
- Add a **README image** later (e.g., a small confusion matrix figure).

If you want, I can also generate a small Python snippet that reads your validation set, computes the **F2-optimal** threshold, and prints a compact threshold table to include in the card.
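The same metadata fields can also be set directly in the README's YAML front matter, which the Hub reads from the top of the file. A sketch using only the keys listed above:

```yaml
---
license: mit
language:
  - en
base_model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
pipeline_tag: text-classification
---
```

Either route (Metadata UI or front matter) produces the same card metadata.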