# OHCA Classifier v3.0 - Trained Model

## Model Description

This is a trained BERT-based classifier for detecting Out-of-Hospital Cardiac Arrest (OHCA) cases in medical discharge notes. The model is fine-tuned from PubMedBERT and achieves high sensitivity for OHCA detection with configurable thresholds for different clinical needs.

## Model Details

- **Model Name**: OHCA Classifier v3.0 - Trained
- **Base Model**: microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
- **Task**: Binary text classification (OHCA vs Non-OHCA)
- **Language**: English
- **Domain**: Medical/Clinical text
- **Model Version**: 3.0
- **Author**: Mona Moukaddem
- **Model Size**: 109M parameters
- **License**: MIT

## Performance Metrics

| Metric | Value | Description |
|---|---|---|
| Optimal Threshold | 0.996 | Found via validation set optimization |
| F1-Score | 0.632 | Harmonic mean of precision and recall |
| Sensitivity (Recall) | 1.000 | 100% - All OHCA cases in the validation split detected at the optimal threshold |
| Specificity | 0.741 | 74.1% - Correctly identifies non-OHCA cases |
| AUC-ROC | Not reported | Described as high; no numeric value is given |
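
The F1-score, sensitivity, and specificity in the table are linked through the confusion matrix. As a rough consistency check, the sketch below assumes the 66 validation cases contain 12 OHCA and 54 non-OHCA notes, with 14 false positives and no false negatives. These counts are inferred for illustration and are not stated in this card, but they reproduce the reported values.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision and F1 from confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical validation counts (an assumption, not taken from the card)
metrics = screening_metrics(tp=12, fp=14, tn=40, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, precision is about 0.46, which is why the F1-score is modest even though sensitivity is 1.000.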

## Threshold Selection Guide

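**For Clinical Screening (Recommended): 0.90**
- Good balance of sensitivity and specificity
- Flags every note that 0.996 flags plus additional ones, so sensitivity is at least as high, at the cost of more false positives
- Suitable for most clinical workflows and screening applications

**For Ultra-Conservative Screening: 0.996**
- Optimal threshold from validation set optimization
- Reached 100% sensitivity on the validation split
- Flags only very high-probability notes, so it produces no more false positives than 0.90 but may miss OHCA cases in other populations
- Use when review capacity is limited; if missing OHCA cases is extremely costly, prefer a lower threshold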

**For Research/Validation: Variable**
- Adjust based on your specific requirements
- Consider your population's OHCA prevalence
- Validate performance on your own dataset
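
Validating a threshold on your own data amounts to sweeping candidate cutoffs over labeled predictions. The following is a minimal sketch, assuming you already have one OHCA probability and one ground-truth label per note; the function name `sweep_thresholds` and the sample values are made up for illustration.

```python
def sweep_thresholds(probabilities, labels, thresholds=(0.50, 0.70, 0.90, 0.996)):
    """Report flagged count, sensitivity and specificity at each threshold."""
    rows = []
    for threshold in thresholds:
        tp = fp = tn = fn = 0
        for prob, label in zip(probabilities, labels):
            flagged = prob >= threshold
            if flagged and label == 1:
                tp += 1
            elif flagged and label == 0:
                fp += 1
            elif not flagged and label == 1:
                fn += 1
            else:
                tn += 1
        rows.append({
            "threshold": threshold,
            "flagged": tp + fp,
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        })
    return rows

# Made-up probabilities and labels, for illustration only
probs = [0.999, 0.997, 0.95, 0.92, 0.85, 0.40, 0.998, 0.10]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

rows = sweep_thresholds(probs, labels)
for row in rows:
    print(row)
```

Because a note is flagged when its probability is at or above the threshold, raising the threshold can only lower or preserve sensitivity and can only raise or preserve specificity.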

## Training Data

| Dataset Characteristic | Value |
|---|---|
| Total Cases | 330 |
| OHCA Cases | 59 (17.9%) |
| Non-OHCA Cases | 271 (82.1%) |
| Training Split | 264 cases |
| Validation Split | 66 cases |
| Data Source | MIMIC-III derived discharge notes |

## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model
model_name = "monajm36/ohca-classifier-v3-trained"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Threshold options
recommended_threshold = 0.90  # Recommended for clinical screening
optimal_threshold = 0.996     # From validation set optimization

def predict_ohca(text, threshold=0.90):
    """Predict OHCA from medical text"""
    inputs = tokenizer(text, truncation=True, padding=True,
                      max_length=512, return_tensors="pt")
    
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)
        ohca_prob = probs[0][1].item()
    
    prediction = 1 if ohca_prob >= threshold else 0
    
    # Clinical priority based on probability
    if ohca_prob >= 0.996:
        priority = "Immediate Review"
    elif ohca_prob >= 0.90:
        priority = "Priority Review"  
    elif ohca_prob >= 0.70:
        priority = "Consider Review"
    else:
        priority = "Routine"
        
    confidence = "High" if ohca_prob >= 0.90 else "Medium" if ohca_prob >= 0.50 else "Low"
    
    return {
        "prediction": "OHCA" if prediction == 1 else "Non-OHCA",
        "probability": ohca_prob,
        "confidence": confidence,
        "clinical_priority": priority,
        "threshold_used": threshold
    }

# Example usage
text = "Patient presents with cardiac arrest at home, found down by family"
result = predict_ohca(text)  # Uses recommended 0.90 threshold
print(f"Prediction: {result['prediction']}")
print(f"Probability: {result['probability']:.3f}")
print(f"Clinical Priority: {result['clinical_priority']}")
```

### Pipeline Usage

```python
from transformers import pipeline

# Create classification pipeline
classifier = pipeline("text-classification", model="monajm36/ohca-classifier-v3-trained")

# Classify medical text
text = "Patient presents with cardiac arrest at home"
result = classifier(text)
print(result)
# Example output (score shown is illustrative): [{'label': 'LABEL_1', 'score': 0.998}]
# LABEL_0 = Non-OHCA, LABEL_1 = OHCA

# For clinical use, apply appropriate threshold:
probability = result[0]['score'] if result[0]['label'] == 'LABEL_1' else 1 - result[0]['score']
is_ohca_90 = probability >= 0.90    # Recommended threshold
is_ohca_996 = probability >= 0.996  # Optimal threshold
```

### Batch Processing

```python
import pandas as pd

# Relies on predict_ohca() defined in the Quick Start example

def process_medical_notes(df, text_column='clean_text', threshold=0.90):
    """Process multiple medical notes and return a copy with prediction columns"""
    # Score each note individually
    results = [predict_ohca(text, threshold=threshold) for text in df[text_column]]

    # Add results to a copy so the caller's dataframe is not modified
    df = df.copy()
    df['ohca_prediction'] = [r['prediction'] for r in results]
    df['ohca_probability'] = [r['probability'] for r in results]
    df['clinical_priority'] = [r['clinical_priority'] for r in results]

    return df

# Example with DataFrame
medical_notes = pd.DataFrame({
    'patient_id': [1, 2, 3],
    'clean_text': [
        "Patient found in cardiac arrest at home by spouse",
        "Patient complains of chest pain, vital signs stable",
        "Witnessed cardiac arrest in emergency department"
    ]
})

results = process_medical_notes(medical_notes)
print(results[['patient_id', 'ohca_prediction', 'ohca_probability']])
```

### Compare Different Thresholds

```python
def compare_thresholds(text):
    """Compare predictions at different thresholds"""
    thresholds = [0.50, 0.70, 0.90, 0.996]
    
    for threshold in thresholds:
        result = predict_ohca(text, threshold=threshold)
        print(f"Threshold {threshold}: {result['prediction']} "
              f"(p={result['probability']:.3f}, priority={result['clinical_priority']})")

# Example comparison
text = "Patient found down at home, family performed CPR"
compare_thresholds(text)
```

## Clinical Decision Support

The model provides configurable sensitivity for OHCA detection, making it suitable for clinical screening where different thresholds may be appropriate based on clinical context and cost of missed cases.

### Clinical Workflow Integration

| Probability Range | Clinical Priority | Recommended Action |
|---|---|---|
| ≥ 0.996 | 🔴 Immediate Review | Very high confidence - Urgent review required |
| 0.90 to < 0.996 | 🟡 Priority Review | High confidence - Clinical team review |
| 0.70 to < 0.90 | 🟠 Consider Review | Moderate confidence - Consider for review |
| < 0.70 | 🟢 Routine | Low probability - Standard processing |

### Threshold Selection for Clinical Use

**Use 0.90 threshold when:**
- Screening large volumes of discharge notes
- Balancing sensitivity with manageable false positive rates
- Implementing in routine clinical workflows

**Use 0.996 threshold when:**
- Only the highest-confidence predictions should be flagged
- Review capacity is limited and false positives must be kept low
- Validation on your own data confirms that sensitivity stays acceptable at this cutoff

## Quality Assurance

- **High Sensitivity**: 100% sensitivity on the validation split at the optimal threshold; no threshold guarantees that every OHCA case is caught on new data
- **Optimal Threshold**: 0.996, selected by validation set optimization
- **Clinical Threshold**: 0.90 provides practical balance for screening
- **Patient-Level Splits**: Keep all notes from one patient in the same split to prevent data leakage
- **Clinical Validation**: Designed for real-world medical text processing

## Model Architecture

```
PubMedBERT (microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract)
├── 12 Transformer layers
├── 768 hidden dimensions
├── 12 attention heads
├── 109M parameters
└── Classification head (2 classes: OHCA vs Non-OHCA)
```

## Training Details

| Training Parameter | Value |
|---|---|
| Framework | PyTorch + Transformers |
| Optimizer | AdamW |
| Learning Rate | Default (with linear scheduling) |
| Epochs | 3 |
| Batch Size | 8 (with gradient accumulation) |
| Max Sequence Length | 512 tokens |
| Class Balancing | Weighted loss + minority oversampling |
| Validation Strategy | Patient-level splits (prevents data leakage) |
| Hardware | CPU training |
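
The table lists a weighted loss for class balancing but does not give the weights. As an illustration only, the sketch below applies the common inverse-frequency ("balanced") heuristic to the class counts from the Training Data section; whether the author used this exact formula is an assumption.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: total / (n_classes * count) for each class."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}

# Class counts from the Training Data table (full dataset, before splitting)
counts = {"Non-OHCA": 271, "OHCA": 59}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Weights of this kind are typically passed to the loss function, for example as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that errors on the minority OHCA class count more.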

## Evaluation Strategy

- **Patient-Level Data Splits**: Ensures all notes from the same patient stay in one split
- **Optimal Threshold Finding**: Uses validation set to find best decision threshold
- **Independent Test Set**: Unbiased evaluation on held-out data
- **Clinical Metrics**: Focus on sensitivity for medical screening applications
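
A patient-level split, as described above, assigns patients rather than individual notes to each split. The following is a minimal sketch under stated assumptions: the function `patient_level_split`, the `patient_id` field, and the sample records are hypothetical and do not come from the v3.0 training pipeline.

```python
import random

def patient_level_split(records, patient_key="patient_id", val_fraction=0.2, seed=42):
    """Split records so that every note from one patient lands in a single split."""
    patient_ids = sorted({record[patient_key] for record in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Choose validation patients first, then route each note by its patient
    n_val = max(1, round(len(patient_ids) * val_fraction))
    val_ids = set(patient_ids[:n_val])

    train = [r for r in records if r[patient_key] not in val_ids]
    val = [r for r in records if r[patient_key] in val_ids]
    return train, val

# Hypothetical notes: patients 1 and 3 each have two notes
notes = [
    {"patient_id": 1, "text": "note A"},
    {"patient_id": 1, "text": "note B"},
    {"patient_id": 2, "text": "note C"},
    {"patient_id": 3, "text": "note D"},
    {"patient_id": 3, "text": "note E"},
    {"patient_id": 4, "text": "note F"},
    {"patient_id": 5, "text": "note G"},
]

train_notes, val_notes = patient_level_split(notes)
train_patients = {n["patient_id"] for n in train_notes}
val_patients = {n["patient_id"] for n in val_notes}
print(f"Shared patients: {train_patients & val_patients}")
```

Splitting at the note level instead could place two notes from the same admission history on both sides of the split, which would inflate validation metrics.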

## Limitations and Considerations

### Limitations

- Trained on specific medical text format (discharge notes)
- May not generalize to different hospital systems without fine-tuning
- Performance may vary with different patient populations
- Designed specifically for English medical text
- Limited to text-based OHCA detection (no multimodal inputs)

### Ethical Considerations

- **Clinical Use**: This model is intended to assist, not replace, clinical judgment
- **Bias Monitoring**: Regular evaluation across different patient demographics recommended
- **Human Oversight**: All high-probability predictions should be reviewed by medical professionals
- **Privacy**: Ensure compliance with healthcare data regulations (HIPAA, etc.)

### Performance Variations

Model performance may vary across different:
- Hospital systems and documentation styles
- Patient demographics and populations
- Types of cardiac arrest presentations
- Clinical documentation quality and completeness
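
One way to monitor such variation is to compute sensitivity separately for each subgroup at the chosen threshold. The sketch below assumes each record carries a predicted OHCA probability, a ground-truth label, and a grouping field; the field names and sample values are hypothetical.

```python
from collections import defaultdict

def sensitivity_by_group(records, threshold=0.90):
    """Compute sensitivity per group, using only true OHCA cases (label == 1)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0})
    for record in records:
        if record["label"] != 1:
            continue  # sensitivity ignores non-OHCA notes
        outcome = "tp" if record["probability"] >= threshold else "fn"
        counts[record["group"]][outcome] += 1
    return {group: c["tp"] / (c["tp"] + c["fn"]) for group, c in counts.items()}

# Made-up predictions from two hypothetical hospital sites
predictions = [
    {"group": "site_A", "probability": 0.99, "label": 1},
    {"group": "site_A", "probability": 0.95, "label": 1},
    {"group": "site_A", "probability": 0.20, "label": 0},
    {"group": "site_B", "probability": 0.97, "label": 1},
    {"group": "site_B", "probability": 0.60, "label": 1},
    {"group": "site_B", "probability": 0.91, "label": 0},
]

per_site = sensitivity_by_group(predictions)
for site, value in per_site.items():
    print(f"{site}: sensitivity {value:.2f}")
```

A large gap between groups suggests that the threshold, or the model itself, needs re-validation for the weaker group before deployment there.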

## Related Work

This model is based on the OHCA Classifier v3.0 methodology with significant improvements over previous versions:

- **Enhanced Methodology**: Patient-level splits, optimal threshold finding
- **Source Code**: Available at [monajm36/ohca-classifier-3.0](https://github.com/monajm36/ohca-classifier-3.0)
- **Training Pipeline**: Complete v3.0 training workflow for custom model development
- **Research Foundation**: Built on established medical NLP and machine learning best practices

## Installation and Dependencies

```bash
pip install transformers torch pandas numpy
```

**Minimum Requirements:**
- Python 3.8+
- PyTorch 1.9+
- Transformers 4.20+
- 4GB RAM for inference
- GPU optional (model works on CPU)

## Citation

If you use this model in your research or clinical work, please cite:

```bibtex
@software{ohca_classifier_v3_trained,
  title={OHCA Classifier v3.0: Trained BERT Model for Cardiac Arrest Detection in Medical Text},
  author={Mona Moukaddem},
  year={2025},
  url={https://huggingface.co/monajm36/ohca-classifier-v3-trained},
  note={High-sensitivity BERT classifier for out-of-hospital cardiac arrest detection in discharge notes}
}
```

## License

This model is released under the MIT License. See LICENSE file for details.

## Contact and Support

- **Repository**: [GitHub - OHCA Classifier v3.0](https://github.com/monajm36/ohca-classifier-3.0)
- **Issues**: Please report issues on the GitHub repository
- **Model Card**: This model card follows the framework proposed by Mitchell et al. (2019)

## Acknowledgments

- **Base Model**: Microsoft Research for PubMedBERT
- **Dataset**: MIMIC-III for training data foundation
- **Framework**: Hugging Face Transformers library
- **Medical Domain**: Clinical expertise in cardiac arrest detection
- **Methodology**: Data science community for best practices in medical ML

This model is intended for research and clinical decision support. Always consult with medical professionals for patient care decisions.