monajm36 committed on
Commit
ba11db9
·
verified ·
1 Parent(s): a3b9749

Update README.md

Files changed (1)
  1. README.md +239 -292
README.md CHANGED
@@ -1,77 +1,80 @@
1
- ---
2
- license: mit
3
- base_model: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
4
- tags:
5
- - text-classification
6
- - medical
7
- - cardiac-arrest
8
- - clinical-nlp
9
- - bert
10
- - healthcare
11
- - pubmedbert
12
- library_name: transformers
13
- pipeline_tag: text-classification
14
- widget:
15
- - text: "HISTORY OF PRESENT ILLNESS: This is a 67-year-old male with a history of coronary artery disease who presented after out-of-hospital cardiac arrest. The patient was at home when he suddenly collapsed. His wife witnessed the event and called 911. EMS arrived and found the patient in ventricular fibrillation."
16
- example_title: "Clear OHCA Case"
17
- - text: "HISTORY OF PRESENT ILLNESS: This is a 45-year-old female presenting with acute onset chest pain. The patient was at work when she developed sudden onset substernal chest pain, described as pressure-like, 8/10 in intensity. No loss of consciousness. Vital signs stable on arrival."
18
- example_title: "Non-OHCA Case"
19
- metrics:
20
- - name: F1-Score
21
- type: f1
22
- value: 0.632
23
- - name: Sensitivity
24
- type: recall
25
- value: 1.000
26
- - name: Specificity
27
- type: specificity
28
- value: 0.741
29
- model-index:
30
- - name: ohca-classifier-v3-trained
31
- results:
32
- - task:
33
- type: text-classification
34
- name: Medical Text Classification
35
- dataset:
36
- type: medical-discharge-notes
37
- name: MIMIC-Based OHCA Dataset
38
- metrics:
39
- - name: F1-Score
40
- type: f1
41
- value: 0.632
42
- - name: Sensitivity
43
- type: recall
44
- value: 1.000
45
- - name: Specificity
46
- type: specificity
47
- value: 0.741
48
- ---
49
-
50
- # OHCA Classifier v3.0 - Clinical Ready Model
51
-
52
- 🏥 **Ready-to-use BERT classifier for detecting Out-of-Hospital Cardiac Arrest (OHCA) in medical discharge notes**
53
-
54
- ## 🚀 Quick Start (5 Minutes)
55
-
56
- **Want to test immediately?** Install and run:
57
 
58
- ```bash
59
- pip install transformers torch pandas
60
- ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
61
 
62
- Then copy-paste this working example:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
63
 
64
  ```python
65
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
66
  import torch
67
 
68
- # Load model
69
  model_name = "monajm36/ohca-classifier-v3-trained"
70
  tokenizer = AutoTokenizer.from_pretrained(model_name)
71
  model = AutoModelForSequenceClassification.from_pretrained(model_name)
72
 
73
- def predict_ohca(text, threshold=0.90): # Using practical 90% threshold
74
- inputs = tokenizer(text, truncation=True, padding=True,
 
 
 
 
 
75
  max_length=512, return_tensors="pt")
76
 
77
  with torch.no_grad():
@@ -79,306 +82,250 @@ def predict_ohca(text, threshold=0.90): # Using practical 90% threshold
79
  probs = torch.softmax(outputs.logits, dim=-1)
80
  ohca_prob = probs[0][1].item()
81
 
82
- prediction = "OHCA" if ohca_prob >= threshold else "Non-OHCA"
83
 
 
84
  if ohca_prob >= 0.996:
85
- priority = "🔴 Immediate Review"
86
- elif ohca_prob >= 0.95:
87
- priority = "🔴 High Priority"
88
  elif ohca_prob >= 0.90:
89
- priority = "🟡 Priority Review"
90
- elif ohca_prob >= 0.80:
91
- priority = "🟠 Consider Review"
92
  else:
93
- priority = "🟢 Routine"
 
 
94
 
95
  return {
96
- "prediction": prediction,
97
- "probability": round(ohca_prob, 4),
98
- "confidence": f"{ohca_prob*100:.1f}%",
99
- "clinical_priority": priority
 
100
  }
101
 
102
- # Test with realistic case
103
- ohca_text = """HISTORY OF PRESENT ILLNESS: This is a 67-year-old male with a history of coronary artery disease who presented after out-of-hospital cardiac arrest. The patient was at home when he suddenly collapsed. His wife witnessed the event and called 911. EMS arrived and found the patient in ventricular fibrillation. CPR was initiated immediately with defibrillation. Return of spontaneous circulation was achieved after 15 minutes."""
104
-
105
- result = predict_ohca(ohca_text)
106
  print(f"Prediction: {result['prediction']}")
107
- print(f"Confidence: {result['confidence']}")
108
  print(f"Clinical Priority: {result['clinical_priority']}")
109
- # Expected Output: OHCA, ~98% confidence, Priority Review
110
  ```
111
 
112
- ---
113
-
114
- ## ⚠️ Critical: Understanding Thresholds
115
-
116
- **Important:** The model's training used a 99.6% threshold, but this may be **too conservative for clinical practice**.
117
-
118
- Here's what different thresholds mean:
119
-
120
- | Threshold | Use Case | Trade-off |
121
- |-----------|----------|-----------|
122
- | **99.6%** | Research, ultra-conservative | May miss obvious OHCA cases |
123
- | **95%** | High-confidence clinical screening | Good balance, still conservative |
124
- | **90%** | **Recommended for most clinical use** | Practical screening threshold |
125
- | **85%** | Sensitive screening | Catches more cases, more false positives |
126
-
127
- ### Test Different Thresholds
128
 
129
  ```python
130
- # Test the same case with different thresholds
131
- text = "Your discharge note text here..."
132
- thresholds = [0.996, 0.95, 0.90, 0.85]
133
-
134
- for threshold in thresholds:
135
- result = predict_ohca(text, threshold)
136
- print(f"Threshold {threshold*100:.1f}%: {result['prediction']} ({result['confidence']})")
 
 
 
 
 
 
 
 
 
137
  ```
138
 
139
- ---
140
-
141
- ## 📊 Analyze Your Data
142
-
143
- ### Single CSV File Analysis
144
 
145
  ```python
146
  import pandas as pd
147
 
148
- def analyze_discharge_notes(csv_file, text_column='clean_text', threshold=0.90):
149
- """Analyze your discharge notes - works with any CSV format"""
150
-
151
- # Load data
152
- df = pd.read_csv(csv_file)
153
- print(f"📋 Loaded {len(df)} records")
154
-
155
- # Analyze each note
156
  results = []
157
- for idx, text in enumerate(df[text_column]):
158
- if idx % 100 == 0: # Progress update
159
- print(f" Processed {idx}/{len(df)}...")
160
-
161
- result = predict_ohca(str(text), threshold)
162
  results.append(result)
163
 
164
- # Add results to your data
165
  df['ohca_prediction'] = [r['prediction'] for r in results]
166
- df['ohca_probability'] = [r['probability'] for r in results]
167
- df['ohca_confidence'] = [r['confidence'] for r in results]
168
  df['clinical_priority'] = [r['clinical_priority'] for r in results]
169
 
170
- # Save results with timestamp
171
- from datetime import datetime
172
- timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
173
- output_file = f"ohca_analysis_{timestamp}.csv"
174
- df.to_csv(output_file, index=False)
175
-
176
- # Clinical summary
177
- total = len(df)
178
- ohca_cases = len(df[df['ohca_prediction'] == 'OHCA'])
179
- immediate = len(df[df['clinical_priority'].str.contains('Immediate')])
180
- high_priority = len(df[df['clinical_priority'].str.contains('High Priority|Priority Review')])
181
-
182
- print(f"\n🏥 CLINICAL SUMMARY:")
183
- print(f" Total cases analyzed: {total:,}")
184
- print(f" Predicted OHCA: {ohca_cases:,} ({ohca_cases/total*100:.1f}%)")
185
- print(f" 🔴 Immediate review needed: {immediate:,}")
186
- print(f" 🟡 High priority cases: {high_priority:,}")
187
- print(f" 📁 Results saved: {output_file}")
188
-
189
  return df
190
 
191
- # Usage
192
- results = analyze_discharge_notes('your_discharge_notes.csv', threshold=0.90)
193
-
194
- # Filter high-priority cases
195
- high_priority = results[results['clinical_priority'].str.contains('Immediate|High Priority')]
196
- high_priority.to_csv('high_priority_ohca_cases.csv', index=False)
 
 
 
 
 
 
197
  ```
198
 
199
- ### Your Data Format
200
 
201
- The CSV should have at minimum:
202
- - **Text column**: Discharge note content (any column name works)
203
- - **ID column**: Case identifier (optional but recommended)
 
 
 
 
 
 
204
 
205
- Example:
206
- ```csv
207
- case_id,discharge_text
208
- 12345,"HISTORY OF PRESENT ILLNESS: 67-year-old male with cardiac arrest at home..."
209
- 12346,"HISTORY OF PRESENT ILLNESS: 45-year-old female with chest pain..."
210
  ```
211
 
212
- ---
213
 
214
- ## 🔬 Model Details
215
 
216
- ### Architecture
217
- - **Base Model**: PubMedBERT (specialized for medical text)
218
- - **Task**: Binary classification (OHCA vs Non-OHCA)
219
- - **Parameters**: 109M
220
- - **Max Length**: 512 tokens
221
- - **Language**: English medical text
222
 
223
- ### Training Data
224
- - **Total Cases**: 330 medical discharge notes
225
- - **OHCA Cases**: 59 (17.9%)
226
- - **Data Source**: MIMIC-III derived
227
- - **Validation**: Patient-level splits (prevents data leakage)
 
228
 
229
- ### Performance (at 99.6% threshold)
230
- | Metric | Value | Clinical Meaning |
231
- |--------|--------|------------------|
232
- | **Sensitivity** | 100% | Catches ALL true OHCA cases |
233
- | **Specificity** | 74.1% | Correctly identifies non-OHCA cases |
234
- | **F1-Score** | 0.632 | Balanced precision and recall |
235
 
236
- **Note**: These metrics are at the ultra-conservative 99.6% threshold. At 90% threshold, you'll have different (likely more practical) performance characteristics.
 
 
 
237
 
238
- ---
 
 
 
239
 
240
- ## 🏥 Clinical Workflow Integration
241
 
242
- ### Recommended Clinical Process
 
 
 
 
243
 
244
- 1. **Batch Analysis**: Run model on all discharge notes
245
- 2. **Priority Triage**:
246
- - 🔴 **Immediate Review** (≥99.6%): Urgent medical review
247
- - 🔴 **High Priority** (≥95%): Clinical team review within 24h
248
- - 🟡 **Priority Review** (≥90%): Review within 48h
249
- - 🟠 **Consider Review** (≥80%): Weekly review process
250
- - 🟢 **Routine** (<80%): Standard processing
251
 
252
- 3. **Quality Assurance**: Sample manual review to validate model performance on your specific data
 
 
 
 
 
 
 
253
 
254
- ### Large Dataset Processing
255
 
256
- ```python
257
- def process_large_dataset(csv_file, chunk_size=1000):
258
- """Process very large datasets in chunks"""
259
- import pandas as pd
260
-
261
- # Process in chunks to manage memory
262
- chunk_results = []
263
-
264
- for chunk_num, chunk in enumerate(pd.read_csv(csv_file, chunksize=chunk_size)):
265
- print(f"Processing chunk {chunk_num + 1}...")
266
-
267
- results = []
268
- for text in chunk['clean_text']: # Adjust column name
269
- result = predict_ohca(text)
270
- results.append(result)
271
-
272
- chunk['ohca_prediction'] = [r['prediction'] for r in results]
273
- chunk['ohca_probability'] = [r['probability'] for r in results]
274
- chunk['clinical_priority'] = [r['clinical_priority'] for r in results]
275
-
276
- chunk_results.append(chunk)
277
-
278
- # Combine all chunks
279
- final_results = pd.concat(chunk_results, ignore_index=True)
280
- final_results.to_csv('large_dataset_results.csv', index=False)
281
-
282
- return final_results
283
- ```
284
 
285
- ---
286
 
287
- ## 🚨 Limitations & Important Considerations
 
 
 
 
288
 
289
- ### Clinical Limitations
290
- - **Intended for screening**: Assists, does not replace clinical judgment
291
- - **Text-only**: Based solely on discharge note text
292
- - **English medical text**: Designed for US healthcare documentation
293
- - **Hospital variation**: May need validation on your specific system
294
 
295
- ### Ethical Use
296
- - **Human oversight required**: All predictions should be clinically reviewed
297
- - **Bias monitoring**: Evaluate performance across patient demographics
298
- - **HIPAA compliance**: Ensure proper data handling in your environment
299
- - **Documentation**: Maintain audit trail of model-assisted decisions
300
 
301
  ### Performance Variations
302
- Model accuracy may vary based on:
303
- - Documentation styles and quality
304
- - Patient populations and demographics
305
- - Types of cardiac arrest presentations
306
- - Clinical terminology variations
307
 
308
- ---
 
 
 
 
309
 
310
- ## 📚 Related Resources
311
 
312
- - **Source Code**: [GitHub - OHCA Classifier v3.0](https://github.com/monajm36/ohca-classifier-3.0)
313
- - **Training Pipeline**: Full methodology for custom model development
314
- - **Research Paper**: Enhanced methodology with patient-level splits
315
- - **Community**: Issues and discussions on GitHub
316
 
317
- ## 🏆 Advanced Features
 
 
 
318
 
319
- ### Custom Threshold Optimization
320
 
321
- ```python
322
- def find_optimal_threshold(labeled_data_csv):
323
- """Find best threshold for your specific dataset"""
324
- import pandas as pd
325
- from sklearn.metrics import classification_report
326
-
327
- # Load your labeled validation data
328
- df = pd.read_csv(labeled_data_csv) # Should have 'text' and 'true_label' columns
329
-
330
- # Test different thresholds
331
- thresholds = [0.99, 0.95, 0.90, 0.85, 0.80, 0.75]
332
- best_threshold = 0.90
333
- best_f1 = 0
334
-
335
- for threshold in thresholds:
336
- predictions = []
337
- for text in df['text']:
338
- result = predict_ohca(text, threshold)
339
- pred = 1 if result['prediction'] == 'OHCA' else 0
340
- predictions.append(pred)
341
-
342
- # Calculate metrics
343
- report = classification_report(df['true_label'], predictions, output_dict=True)
344
- f1 = report['1']['f1-score'] # F1 for OHCA class
345
-
346
- print(f"Threshold {threshold}: F1 = {f1:.3f}")
347
-
348
- if f1 > best_f1:
349
- best_f1 = f1
350
- best_threshold = threshold
351
-
352
- print(f"\nRecommended threshold for your data: {best_threshold}")
353
- return best_threshold
354
  ```
355
 
356
- ---
 
 
 
 
 
357
 
358
- ## 📞 Support & Citation
359
 
360
- ### Getting Help
361
- - **Issues**: Report problems on [GitHub](https://github.com/monajm36/ohca-classifier-3.0/issues)
362
- - **Questions**: Use GitHub discussions for clinical workflow questions
363
- - **Updates**: Watch the repository for model improvements
364
-
365
- ### Citation
366
 
367
  ```bibtex
368
  @software{ohca_classifier_v3_trained,
369
- title={OHCA Classifier v3.0: Clinical-Ready BERT Model for Cardiac Arrest Detection},
370
  author={Mona Moukaddem},
371
  year={2025},
372
  url={https://huggingface.co/monajm36/ohca-classifier-v3-trained},
373
- note={Production-ready classifier with flexible thresholds for clinical deployment}
374
  }
375
  ```
376
 
377
- ### License
378
- MIT License - Free for clinical and research use
 
 
 
 
 
 
 
379
 
380
- ---
381
 
382
- **🏥 Ready to get started? Copy the Quick Start code above and test it on your data!**
 
 
 
 
383
 
384
- *This model is designed for clinical decision support. Always validate performance on your specific data and maintain appropriate clinical oversight.*
 
1
+ # OHCA Classifier v3.0 - Trained Model
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
+ ## Model Description
4
+
5
+ This is a BERT-based classifier for detecting Out-of-Hospital Cardiac Arrest (OHCA) cases in medical discharge notes. It is fine-tuned from PubMedBERT, reached 100% sensitivity on its validation split, and outputs a probability that can be thresholded to suit different clinical needs.
6
+
7
+ ## Model Details
8
+
9
+ - **Model Name**: OHCA Classifier v3.0 - Trained
10
+ - **Base Model**: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
11
+ - **Task**: Binary text classification (OHCA vs Non-OHCA)
12
+ - **Language**: English
13
+ - **Domain**: Medical/Clinical text
14
+ - **Model Version**: 3.0
15
+ - **Author**: Mona Moukaddem
16
+ - **Model Size**: 109M parameters
17
+ - **License**: MIT
18
+
19
+ ## Performance Metrics
20
+
21
+ | Metric | Value | Description |
22
+ |---|---|---|
23
+ | Optimal Threshold | 0.996 | Found via validation set optimization |
24
+ | F1-Score | 0.632 | Harmonic mean of precision and recall |
25
+ | Sensitivity (Recall) | 1.000 | 100% - All OHCA cases in the validation split were flagged at the 0.996 threshold |
26
+ | Specificity | 0.741 | 74.1% - Correctly identifies non-OHCA cases |
27
+ | AUC-ROC | Not reported | Described qualitatively as high; no numeric value is given |
28
+
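The reported figures can be cross-checked from confusion-matrix counts. The sketch below uses one set of counts that reproduces the table on a 66-case validation split (12 OHCA, 54 non-OHCA). These counts are inferred for illustration and are not stated in the model card; `screening_metrics` is a hypothetical helper.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, precision and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)          # share of true OHCA cases that were flagged
    specificity = tn / (tn + fp)          # share of non-OHCA cases correctly left unflagged
    precision = tp / (tp + fp)            # share of flagged notes that are true OHCA
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Assumed counts (not from the source): 12 OHCA and 54 non-OHCA validation notes.
metrics = screening_metrics(tp=12, fp=14, tn=40, fn=0)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these assumed counts precision is about 0.46, so a little over half of the flagged notes would be false positives even though no OHCA case is missed.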
29
+ ## Threshold Selection Guide
30
+
31
+ **For Clinical Screening (Recommended): 0.90**
32
+ - Good balance of sensitivity and specificity
33
+ - Reduces false positives while maintaining high sensitivity
34
+ - Suitable for most clinical workflows and screening applications
35
 
36
+ **For Ultra-Conservative Screening: 0.996**
37
+ - Optimal threshold from validation set optimization
38
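+ - Reached 100% sensitivity on the validation split
39
+ - Flags fewer notes than 0.90, so false positives drop but OHCA cases scored below 0.996 are missed
40
+ - Use when reviewer capacity is limited and only the highest-confidence cases should be surfaced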
41
+
42
+ **For Research/Validation: Variable**
43
+ - Adjust based on your specific requirements
44
+ - Consider your population's OHCA prevalence
45
+ - Validate performance on your own dataset
46
+
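When choosing a threshold, it helps to tabulate sensitivity and specificity on labeled data from your own site. Raising the threshold can only shrink the set of flagged notes, so sensitivity cannot rise and specificity cannot fall. A minimal sketch, using made-up probabilities and labels (the numbers and the `sweep_thresholds` helper are hypothetical, not model outputs):

```python
def sweep_thresholds(probs, labels, thresholds):
    """Compute sensitivity, specificity and flagged count at each threshold.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    rows = []
    for t in thresholds:
        preds = [p >= t for p in probs]
        tp = sum(pred and y == 1 for pred, y in zip(preds, labels))
        fn = sum((not pred) and y == 1 for pred, y in zip(preds, labels))
        fp = sum(pred and y == 0 for pred, y in zip(preds, labels))
        tn = sum((not pred) and y == 0 for pred, y in zip(preds, labels))
        rows.append({
            "threshold": t,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "flagged": tp + fp,
        })
    return rows

# Hypothetical OHCA probabilities and true labels (1 = OHCA, 0 = Non-OHCA).
probs = [0.999, 0.998, 0.97, 0.93, 0.85, 0.40, 0.10, 0.02]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

rows = sweep_thresholds(probs, labels, [0.50, 0.70, 0.90, 0.996])
for row in rows:
    print(f"t={row['threshold']}: sens={row['sensitivity']:.2f}, "
          f"spec={row['specificity']:.2f}, flagged={row['flagged']}")
```

In practice `probs` would come from running the classifier on your labeled notes, and the threshold is then picked from the row whose trade-off matches your review capacity.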
47
+ ## Training Data
48
+
49
+ | Dataset Characteristic | Value |
50
+ |---|---|
51
+ | Total Cases | 330 |
52
+ | OHCA Cases | 59 (17.9%) |
53
+ | Non-OHCA Cases | 271 (82.1%) |
54
+ | Training Split | 264 cases |
55
+ | Validation Split | 66 cases |
56
+ | Data Source | MIMIC-III derived discharge notes |
57
+
58
+ ## Usage
59
+
60
+ ### Quick Start
61
 
62
  ```python
63
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
64
  import torch
65
 
66
+ # Load the model
67
  model_name = "monajm36/ohca-classifier-v3-trained"
68
  tokenizer = AutoTokenizer.from_pretrained(model_name)
69
  model = AutoModelForSequenceClassification.from_pretrained(model_name)
70
 
71
+ # Threshold options
72
+ recommended_threshold = 0.90 # Recommended for clinical screening
73
+ optimal_threshold = 0.996 # From validation set optimization
74
+
75
+ def predict_ohca(text, threshold=0.90):
76
+ """Predict OHCA from medical text"""
77
+ inputs = tokenizer(text, truncation=True, padding=True,
78
  max_length=512, return_tensors="pt")
79
 
80
  with torch.no_grad():
 
82
  probs = torch.softmax(outputs.logits, dim=-1)
83
  ohca_prob = probs[0][1].item()
84
 
85
+ prediction = 1 if ohca_prob >= threshold else 0
86
 
87
+ # Clinical priority based on probability
88
  if ohca_prob >= 0.996:
89
+ priority = "Immediate Review"
 
 
90
  elif ohca_prob >= 0.90:
91
+ priority = "Priority Review"
92
+ elif ohca_prob >= 0.70:
93
+ priority = "Consider Review"
94
  else:
95
+ priority = "Routine"
96
+
97
+ confidence = "High" if ohca_prob >= 0.90 else "Medium" if ohca_prob >= 0.50 else "Low"
98
 
99
  return {
100
+ "prediction": "OHCA" if prediction == 1 else "Non-OHCA",
101
+ "probability": ohca_prob,
102
+ "confidence": confidence,
103
+ "clinical_priority": priority,
104
+ "threshold_used": threshold
105
  }
106
 
107
+ # Example usage
108
+ text = "Patient presents with cardiac arrest at home, found down by family"
109
+ result = predict_ohca(text) # Uses recommended 0.90 threshold
 
110
  print(f"Prediction: {result['prediction']}")
111
+ print(f"Probability: {result['probability']:.3f}")
112
  print(f"Clinical Priority: {result['clinical_priority']}")
 
113
  ```
114
 
115
+ ### Pipeline Usage
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
116
 
117
  ```python
118
+ from transformers import pipeline
119
+
120
+ # Create classification pipeline
121
+ classifier = pipeline("text-classification", model="monajm36/ohca-classifier-v3-trained")
122
+
123
+ # Classify medical text
124
+ text = "Patient presents with cardiac arrest at home"
125
+ result = classifier(text)
126
+ print(result)
127
+ # Example output (the score will vary): [{'label': 'LABEL_1', 'score': 0.998}]
128
+ # LABEL_0 = Non-OHCA, LABEL_1 = OHCA
129
+
130
+ # For clinical use, apply appropriate threshold:
131
+ probability = result[0]['score'] if result[0]['label'] == 'LABEL_1' else 1 - result[0]['score']
132
+ is_ohca_90 = probability >= 0.90 # Recommended threshold
133
+ is_ohca_996 = probability >= 0.996 # Optimal threshold
134
  ```
135
 
136
+ ### Batch Processing
 
 
 
 
137
 
138
  ```python
139
  import pandas as pd
140
 
141
+ def process_medical_notes(df, text_column='clean_text', threshold=0.90):
142
+ """Process multiple medical notes"""
 
 
 
 
 
 
143
  results = []
144
+
145
+ for text in df[text_column]:
146
+ result = predict_ohca(text, threshold=threshold)
 
 
147
  results.append(result)
148
 
149
+ # Add results to dataframe
150
  df['ohca_prediction'] = [r['prediction'] for r in results]
151
+ df['ohca_probability'] = [r['probability'] for r in results]
 
152
  df['clinical_priority'] = [r['clinical_priority'] for r in results]
153
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
154
  return df
155
 
156
+ # Example with DataFrame
157
+ medical_notes = pd.DataFrame({
158
+ 'patient_id': [1, 2, 3],
159
+ 'clean_text': [
160
+ "Patient found in cardiac arrest at home by spouse",
161
+ "Patient complains of chest pain, vital signs stable",
162
+ "Witnessed cardiac arrest in emergency department"
163
+ ]
164
+ })
165
+
166
+ results = process_medical_notes(medical_notes)
167
+ print(results[['patient_id', 'ohca_prediction', 'ohca_probability']])
168
  ```
169
 
170
+ ### Compare Different Thresholds
171
 
172
+ ```python
173
+ def compare_thresholds(text):
174
+ """Compare predictions at different thresholds"""
175
+ thresholds = [0.50, 0.70, 0.90, 0.996]
176
+
177
+ for threshold in thresholds:
178
+ result = predict_ohca(text, threshold=threshold)
179
+ print(f"Threshold {threshold}: {result['prediction']} "
180
+ f"(p={result['probability']:.3f}, priority={result['clinical_priority']})")
181
 
182
+ # Example comparison
183
+ text = "Patient found down at home, family performed CPR"
184
+ compare_thresholds(text)
 
 
185
  ```
186
 
187
+ ## Clinical Decision Support
188
 
189
+ The model provides configurable sensitivity for OHCA detection, making it suitable for clinical screening where different thresholds may be appropriate based on clinical context and cost of missed cases.
190
 
191
+ ### Clinical Workflow Integration
 
 
 
 
 
192
 
193
+ | Probability Range | Clinical Priority | Recommended Action |
194
+ |---|---|---|
195
+ | ≥ 0.996 | 🔴 Immediate Review | Very high confidence - Urgent review required |
196
+ | 0.90 to < 0.996 | 🟡 Priority Review | High confidence - Clinical team review |
197
+ | 0.70 to < 0.90 | 🟠 Consider Review | Moderate confidence - Consider for review |
198
+ | < 0.70 | 🟢 Routine | Low probability - Standard processing |
199
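The tier boundaries in this table can also be expressed as a sorted list of cutoffs, which makes the boundary behaviour explicit (a probability exactly equal to a cutoff falls into the higher tier). This is a sketch using the standard-library `bisect` module; `TIER_CUTOFFS`, `TIER_NAMES` and `clinical_priority` are illustrative names, not part of the model repository.

```python
from bisect import bisect_right

# Lower bounds of the three upper tiers, in ascending order.
TIER_CUTOFFS = [0.70, 0.90, 0.996]
TIER_NAMES = ["Routine", "Consider Review", "Priority Review", "Immediate Review"]

def clinical_priority(ohca_prob):
    """Map an OHCA probability to its review tier."""
    # bisect_right counts how many cutoffs are <= ohca_prob,
    # which is exactly the index of the matching tier.
    return TIER_NAMES[bisect_right(TIER_CUTOFFS, ohca_prob)]

for p in (0.50, 0.70, 0.90, 0.995, 0.996):
    print(p, clinical_priority(p))
```

Keeping the cutoffs in one list means a site that tunes its own thresholds only has to change `TIER_CUTOFFS`.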
 
200
+ ### Threshold Selection for Clinical Use
 
 
 
 
 
201
 
202
+ **Use 0.90 threshold when:**
203
+ - Screening large volumes of discharge notes
204
+ - Balancing sensitivity with manageable false positive rates
205
+ - Implementing in routine clinical workflows
206
 
207
+ **Use 0.996 threshold when:**
208
+ - Only the highest-confidence cases should be surfaced
209
+ - Reviewer capacity is limited and false positives must be kept low
210
+ - You can tolerate missing OHCA cases that score below 0.996
211
 
212
+ ## Quality Assurance
213
 
214
+ - **High Sensitivity**: No OHCA cases were missed on the validation split at the 0.996 threshold; confirm this on your own data before relying on it
215
+ - **Optimal Threshold**: 0.996 was selected on validation data, where it reached 100% sensitivity
216
+ - **Clinical Threshold**: 0.90 provides practical balance for screening
217
+ - **Patient-Level Splits**: All notes from one patient stay in the same split, which prevents leakage between training and validation
218
+ - **Clinical Validation**: Designed for real-world medical text processing
219
 
220
+ ## Model Architecture
 
 
 
 
 
 
221
 
222
+ ```
223
+ PubMedBERT (microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract)
224
+ ├── 12 Transformer layers
225
+ ├── 768 hidden dimensions
226
+ ├── 12 attention heads
227
+ ├── 109M parameters
228
+ └── Classification head (2 classes: OHCA vs Non-OHCA)
229
+ ```
230
 
231
+ ## Training Details
232
 
233
+ | Training Parameter | Value |
234
+ |---|---|
235
+ | Framework | PyTorch + Transformers |
236
+ | Optimizer | AdamW |
237
+ | Learning Rate | Default (with linear scheduling) |
238
+ | Epochs | 3 |
239
+ | Batch Size | 8 (with gradient accumulation) |
240
+ | Max Sequence Length | 512 tokens |
241
+ | Class Balancing | Weighted loss + minority oversampling |
242
+ | Validation Strategy | Patient-level splits (prevents data leakage) |
243
+ | Hardware | CPU training |
244
+
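The table lists a weighted loss for class balancing but does not say how the weights were derived. One common choice is inverse-frequency ("balanced") weighting, sketched here with the class counts from the Training Data table. The weighting scheme and the `balanced_class_weights` helper are assumptions, not the author's confirmed method.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: total / (n_classes * count) for each class."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}

# Class counts taken from the Training Data table (330 notes in total).
counts = {"Non-OHCA": 271, "OHCA": 59}
weights = balanced_class_weights(counts)
print({label: round(w, 3) for label, w in weights.items()})
```

With PyTorch, weights like these are typically passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, so that errors on the minority OHCA class cost more during training.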
245
+ ## Evaluation Strategy
246
+
247
+ - **Patient-Level Data Splits**: Ensures all notes from the same patient stay in one split
248
+ - **Optimal Threshold Finding**: Uses validation set to find best decision threshold
249
+ - **Independent Test Set**: Unbiased evaluation on held-out data
250
+ - **Clinical Metrics**: Focus on sensitivity for medical screening applications
251
+
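A patient-level split assigns patients, not individual notes, to each partition. The sketch below shows the idea with the standard library only; the record layout (`patient_id`, `note_id`), the 20% validation fraction and the seed are illustrative assumptions rather than the settings used for this model.

```python
import random

def patient_level_split(records, val_fraction=0.2, seed=42):
    """Split records so that every patient appears in exactly one partition."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)            # local RNG keeps the split reproducible
    rng.shuffle(patient_ids)
    n_val = max(1, round(len(patient_ids) * val_fraction))
    val_patients = set(patient_ids[:n_val])
    train = [r for r in records if r["patient_id"] not in val_patients]
    val = [r for r in records if r["patient_id"] in val_patients]
    return train, val

# Hypothetical data: 10 patients with 3 notes each.
records = [{"patient_id": pid, "note_id": f"{pid}-{i}"}
           for pid in range(10) for i in range(3)]
train, val = patient_level_split(records)
print(len(train), len(val))
```

If scikit-learn is available, `GroupShuffleSplit` with the patient ID as the group key gives the same guarantee.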
252
+ ## Limitations and Considerations
 
 
 
 
 
 
 
 
253
 
254
+ ### Limitations
255
 
256
+ - Trained on specific medical text format (discharge notes)
257
+ - May not generalize to different hospital systems without fine-tuning
258
+ - Performance may vary with different patient populations
259
+ - Designed specifically for English medical text
260
+ - Limited to text-based OHCA detection (no multimodal inputs)
261
 
262
+ ### Ethical Considerations
 
 
 
 
263
 
264
+ - **Clinical Use**: This model is intended to assist, not replace, clinical judgment
265
+ - **Bias Monitoring**: Regular evaluation across different patient demographics recommended
266
+ - **Human Oversight**: All high-probability predictions should be reviewed by medical professionals
267
+ - **Privacy**: Ensure compliance with healthcare data regulations (HIPAA, etc.)
 
268
 
269
  ### Performance Variations
 
 
 
 
 
270
 
271
+ Model performance may vary across different:
272
+ - Hospital systems and documentation styles
273
+ - Patient demographics and populations
274
+ - Types of cardiac arrest presentations
275
+ - Clinical documentation quality and completeness
276
 
277
+ ## Related Work
278
 
279
+ This model is based on the OHCA Classifier v3.0 methodology with significant improvements over previous versions:
 
 
 
280
 
281
+ - **Enhanced Methodology**: Patient-level splits, optimal threshold finding
282
+ - **Source Code**: Available at [monajm36/ohca-classifier-3.0](https://github.com/monajm36/ohca-classifier-3.0)
283
+ - **Training Pipeline**: Complete v3.0 training workflow for custom model development
284
+ - **Research Foundation**: Built on established medical NLP and machine learning best practices
285
 
286
+ ## Installation and Dependencies
287
 
288
+ ```bash
289
+ pip install transformers torch pandas numpy
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
290
  ```
291
 
292
+ **Minimum Requirements:**
293
+ - Python 3.8+
294
+ - PyTorch 1.9+
295
+ - Transformers 4.20+
296
+ - 4GB RAM for inference
297
+ - GPU optional (model works on CPU)
298
 
299
+ ## Citation
300
 
301
+ If you use this model in your research or clinical work, please cite:
 
 
 
 
 
302
 
303
  ```bibtex
304
  @software{ohca_classifier_v3_trained,
305
+ title={OHCA Classifier v3.0: Trained BERT Model for Cardiac Arrest Detection in Medical Text},
306
  author={Mona Moukaddem},
307
  year={2025},
308
  url={https://huggingface.co/monajm36/ohca-classifier-v3-trained},
309
+ note={High-sensitivity BERT classifier for out-of-hospital cardiac arrest detection in discharge notes}
310
  }
311
  ```
312
 
313
+ ## License
314
+
315
+ This model is released under the MIT License. See LICENSE file for details.
316
+
317
+ ## Contact and Support
318
+
319
+ - **Repository**: [GitHub - OHCA Classifier v3.0](https://github.com/monajm36/ohca-classifier-3.0)
320
+ - **Issues**: Please report issues on the GitHub repository
321
+ - **Model Card**: This model card follows the framework proposed by Mitchell et al. (2019)
322
 
323
+ ## Acknowledgments
324
 
325
+ - **Base Model**: Microsoft Research for PubMedBERT
326
+ - **Dataset**: MIMIC-III for training data foundation
327
+ - **Framework**: Hugging Face Transformers library
328
+ - **Medical Domain**: Clinical expertise in cardiac arrest detection
329
+ - **Methodology**: Data science community for best practices in medical ML
330
 
331
+ This model is intended for research and clinical decision support. Always consult with medical professionals for patient care decisions.