---
language: en
license: mit
tags:
- medical
- clinical-notes
- cardiac-arrest
- ohca
- biomedical-nlp
- transformers
- pubmedbert
library_name: transformers
pipeline_tag: text-classification
---

# OHCA Classifier V11: Temporal + Location-Aware Model

## Model Description

A transformer-based deep learning model for automatically identifying Out-of-Hospital Cardiac Arrest (OHCA) cases from clinical notes.

**Key Innovation:** Combines semantic understanding (PubMedBERT) with explicit location and temporal features to distinguish OHCA from in-hospital cardiac arrest (IHCA).

## Training Data

- **Dataset**: MIMIC-III clinical notes
- **Size**: 330 notes (47 OHCA, 283 Non-OHCA)
- **Split**: 70% train / 15% validation / 15% test
- **Average note length**: 13,042 characters

## Performance (C19 Validation, 647 Notes)

Results on the UChicago C19 validation set at the screening threshold of 0.14:

| Metric | Score |
|--------|-------|
| **Sensitivity** | 92.1% |
| **Specificity** | 89.4% |
| **Precision** | 79.9% |
| **F1-Score** | 0.856 |
| **AUC-ROC** | 0.956 |
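
As a consistency check, the F1-score is the harmonic mean of precision and sensitivity (recall), so the reported precision and sensitivity reproduce the F1 value in the table:

$$
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} = \frac{2 \times 0.799 \times 0.921}{0.799 + 0.921} \approx \frac{1.4718}{1.720} \approx 0.856
$$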

## Model Architecture

**Base Model**: `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`

**Input Features (775 dimensions):**
- BERT embeddings: 768
- Location features: 2
  - OHCA location indicator count (22 phrases)
  - IHCA location indicator count (25 phrases)
- Temporal features: 5
  - Arrest timing score (when the arrest occurred)
  - First location outside hospital (binary)
  - First location inside hospital (binary)
  - Movement outside→inside count
  - Movement inside→inside count

**Classifier**: 3-layer MLP (775 → 512 → 256 → 2)
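
The sketch below illustrates how the 775-dimensional input could be assembled and passed through a 775 → 512 → 256 → 2 MLP. It is a dependency-free illustration under stated assumptions, not the released implementation: the ReLU activations, the 1/sqrt(fan-in) uniform initialization, the softmax output, the `[non-OHCA, OHCA]` label order, and the helper names `build_input_vector`, `init_mlp`, `count_parameters`, and `mlp_forward` are all hypothetical. The actual model may use dropout, other activations, or a different pooling of the BERT output.

```python
import math
import random

# Dimensions from the model card: 768 (BERT) + 2 (location) + 5 (temporal) = 775 inputs
BERT_DIM, LOCATION_DIM, TEMPORAL_DIM = 768, 2, 5
LAYER_SIZES = [BERT_DIM + LOCATION_DIM + TEMPORAL_DIM, 512, 256, 2]  # 775 -> 512 -> 256 -> 2


def build_input_vector(bert_embedding, location_features, temporal_features):
    """Concatenate the three feature groups into one 775-dim vector (hypothetical helper)."""
    dims = (len(bert_embedding), len(location_features), len(temporal_features))
    if dims != (BERT_DIM, LOCATION_DIM, TEMPORAL_DIM):
        raise ValueError(f"unexpected feature dimensions: {dims}")
    return [float(v) for v in list(bert_embedding) + list(location_features) + list(temporal_features)]


def init_mlp(sizes, seed=0):
    """Create (weights, biases) per layer: uniform weights scaled by 1/sqrt(fan_in), zero biases (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def count_parameters(sizes):
    """Total weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def mlp_forward(layers, x):
    """Linear layers with ReLU between them (assumed), then a softmax over the 2 output logits."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]  # assumed order: [P(non-OHCA), P(OHCA)]


layers = init_mlp(LAYER_SIZES)
x = build_input_vector([0.0] * BERT_DIM, [3, 0], [1, 1, 0, 1, 0])
probs = mlp_forward(layers, x)
print(len(x), count_parameters(LAYER_SIZES))  # 775 529154
print(probs)  # two probabilities summing to 1; meaningless here because the weights are random
```

Under these assumptions the classifier head has 529,154 trainable parameters, small compared with the roughly 110M parameters of the BERT-base encoder.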

## Key Features

### Location Features
**OHCA indicators**: home, EMS, scene, field, bystander, ambulance, paramedics, etc.

**IHCA indicators**: floor, ICU, ward, room, bed, code blue, admitted, telemetry, etc.
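
A minimal sketch of how the two location counts could be computed, assuming case-insensitive whole-phrase matching. The phrase lists contain only the examples named above (the full lists have 22 and 25 phrases and are not reproduced in this card), and `extract_location_features` is a hypothetical name, not necessarily the function shipped with the model.

```python
import re

# Partial phrase lists: only the examples given in this model card (not the full 22/25 lists)
OHCA_PHRASES = ["home", "ems", "scene", "field", "bystander", "ambulance", "paramedics"]
IHCA_PHRASES = ["floor", "icu", "ward", "room", "bed", "code blue", "admitted", "telemetry"]


def count_phrases(text, phrases):
    """Count whole-word, case-insensitive occurrences of every phrase in text."""
    total = 0
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


def extract_location_features(note):
    """Return [ohca_indicator_count, ihca_indicator_count] (hypothetical helper)."""
    return [count_phrases(note, OHCA_PHRASES), count_phrases(note, IHCA_PHRASES)]


note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived, initiated CPR. ROSC achieved in field.
Transported to ED.
"""
print(extract_location_features(note))  # [3, 0]: "home", "EMS", "field"
```

Whole-word matching keeps short phrases such as "bed" from firing inside longer words like "bedside"; whether the released extractor does the same is not stated.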

### Temporal Features
Captures the **story** of what happened:
- **When**: Before arrival vs during hospitalization
- **Where it started**: First location mentioned (inside/outside)
- **How patient moved**: Direction of transitions (outside→inside vs inside→inside)
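
The five temporal features can be sketched in the same spirit. This is a rough illustration only: it tags each location mention as outside or inside the hospital, takes the first mention as the starting location, and counts consecutive outside→inside and inside→inside transitions. The keyword lists, the helper names, and the timing heuristic (number of before-arrival cues minus number of in-hospital cues) are hypothetical stand-ins, since this card does not specify how the arrest timing score is computed.

```python
import re

# Hypothetical keyword lists; the model's actual lists are not given in the model card
OUTSIDE = ["home", "scene", "field", "ambulance", "ems"]
INSIDE = ["ed", "emergency department", "icu", "floor", "ward"]
BEFORE_ARRIVAL_CUES = ["prior to arrival", "en route", "found unresponsive"]
IN_HOSPITAL_CUES = ["code blue", "during admission", "on the floor"]


def location_sequence(note):
    """Return 'out'/'in' tags for whole-word location mentions, in order of appearance."""
    hits = []
    for tag, words in (("out", OUTSIDE), ("in", INSIDE)):
        for word in words:
            for m in re.finditer(r"\b" + re.escape(word) + r"\b", note, flags=re.IGNORECASE):
                hits.append((m.start(), tag))
    return [tag for _, tag in sorted(hits)]


def count_cues(note, cues):
    """Count raw substring occurrences of each cue, case-insensitively."""
    lowered = note.lower()
    return sum(lowered.count(cue) for cue in cues)


def extract_temporal_features(note):
    """Return [timing_score, first_outside, first_inside, out_to_in, in_to_in] (hypothetical helper)."""
    seq = location_sequence(note)
    timing = count_cues(note, BEFORE_ARRIVAL_CUES) - count_cues(note, IN_HOSPITAL_CUES)
    first_out = 1 if seq and seq[0] == "out" else 0
    first_in = 1 if seq and seq[0] == "in" else 0
    pairs = list(zip(seq, seq[1:]))
    out_to_in = sum(1 for pair in pairs if pair == ("out", "in"))
    in_to_in = sum(1 for pair in pairs if pair == ("in", "in"))
    return [timing, first_out, first_in, out_to_in, in_to_in]


note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived, initiated CPR. ROSC achieved in field.
Transported to ED.
"""
print(extract_temporal_features(note))  # [1, 1, 0, 1, 0]
```

For the example note the location sequence is home, EMS, field, ED (out, out, out, in), so the story starts outside and moves inside once, the pattern the card associates with OHCA.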

## Usage
```python
# Note: Requires custom model class and feature extraction
# See model files for implementation details

from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("monajm36/ohca-classifier-v11")

# Example clinical note
note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived, initiated CPR. ROSC achieved in field.
Transported to ED.
"""

# Extract features (requires custom code)
# location_features = extract_location_features(note)
# temporal_features = extract_temporal_features(note)

# Tokenize (truncates to 512 tokens, PubMedBERT's maximum input length; the average note is much longer)
inputs = tokenizer(note, return_tensors="pt", max_length=512, truncation=True)

# Predict (requires loading custom model architecture)
# ...
```

## Threshold Selection

Choose a threshold based on your clinical use case:

| Use Case | Threshold | Sensitivity | Specificity | F1 |
|----------|-----------|-------------|-------------|-----|
| **Screening (High Recall)** | 0.14 | 92.1% | 89.4% | 0.856 |
| **Balanced** | 0.74 | 82.3% | 93.2% | 0.831 |
| **Research (High Precision)** | 0.85 | 75.4% | 95.0% | 0.810 |
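
To show how a threshold turns predicted OHCA probabilities into the metrics in this table, here is a small standard-library sketch. The probabilities and labels are made-up toy values, not model outputs; `metrics_at_threshold` is a hypothetical helper, and it counts a note as OHCA when its probability is greater than or equal to the threshold, which is an assumed convention.

```python
def metrics_at_threshold(probs, labels, threshold):
    """Binarize P(OHCA) at threshold; return sensitivity, specificity, precision and F1."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "precision": precision, "f1": f1}


# Toy data (illustrative only): 1 = OHCA, 0 = non-OHCA
probs = [0.95, 0.80, 0.30, 0.10, 0.90, 0.20, 0.05, 0.60]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

for t in (0.14, 0.74, 0.85):
    print(t, metrics_at_threshold(probs, labels, t))
```

On this toy set, raising the threshold from 0.14 to 0.85 lowers sensitivity from 1.0 to 0.25 while specificity rises from 0.5 to 0.75, the same direction of trade-off the table reports for the real model.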

## Limitations

- Trained on data from a single institution (MIMIC-III, Beth Israel Deaconess Medical Center)
- May not generalize to all clinical documentation styles
- False-positive rate on IHCA notes (in-hospital arrests misclassified as OHCA): ~28.5% at the optimal threshold
- Requires feature extraction code (not included in model weights)
- Best performance on notes with clear EMS or location context

## Model Versions

This is **Version 11**, the latest and best-performing version (highest F1-score in the table below).

| Version | Key Features | F1-Score |
|---------|--------------|----------|
| V9 | BERT only | 0.732 |
| V10 | + Location features | 0.814 |
| **V11** | **+ Temporal features** | **0.856** |

## Citation
```bibtex
@misc{moukaddem2025ohca,
  author = {Moukaddem, Mona},
  title = {OHCA Classifier V11: Temporal and Location-Aware Model for Out-of-Hospital Cardiac Arrest Identification},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/monajm36/ohca-classifier-v11}}
}
```

## Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.

## Model Card Authors

Mona Moukaddem

## Acknowledgments

- Training data: MIMIC-III Clinical Database
- Validation data: UChicago C19 dataset
- Base model: Microsoft BiomedNLP-PubMedBERT