monajm36 committed
Commit 34da60e · verified · 1 Parent(s): abec363

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +152 -0
README.md ADDED
@@ -0,0 +1,152 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - medical
+ - clinical-notes
+ - cardiac-arrest
+ - ohca
+ - biomedical-nlp
+ - transformers
+ - pubmedbert
+ library_name: transformers
+ pipeline_tag: text-classification
+ ---
+
+ # OHCA Classifier V11: Temporal + Location-Aware Model
+
+ ## Model Description
+
+ A transformer-based deep learning model that automatically identifies Out-of-Hospital Cardiac Arrest (OHCA) cases in clinical notes.
+
+ **Key Innovation:** Combines semantic understanding (PubMedBERT) with explicit location and temporal features to distinguish OHCA from in-hospital cardiac arrest (IHCA).
+
+ ## Training Data
+
+ - **Dataset**: MIMIC-III clinical notes
+ - **Size**: 330 notes (47 OHCA, 283 Non-OHCA)
+ - **Split**: 70% train / 15% validation / 15% test
+ - **Average note length**: 13,042 characters
+
+ ## Performance (C19 Validation - 647 notes)
+
+ Scores are reported at the screening threshold of 0.14 (see Threshold Selection).
+
+ | Metric | Score |
+ |--------|-------|
+ | **Sensitivity** | 92.1% |
+ | **Specificity** | 89.4% |
+ | **Precision** | 79.9% |
+ | **F1-Score** | 0.856 |
+ | **AUC-ROC** | 0.956 |
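
The F1-score is the harmonic mean of precision and sensitivity (recall), so the reported figures can be cross-checked with a few lines of Python. This is only an arithmetic check on the published numbers; the helper name `f1_from_precision_recall` is illustrative and not part of the model repository.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the performance table.
reported_precision = 0.799
reported_sensitivity = 0.921

f1 = f1_from_precision_recall(reported_precision, reported_sensitivity)
print(round(f1, 3))  # 0.856
```

The result agrees with the F1-score listed in the table.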
+
+ ## Model Architecture
+
+ **Base Model**: `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`
+
+ **Input Features (775 dimensions):**
+ - BERT embeddings: 768
+ - Location features: 2
+   - OHCA location indicator count (22 phrases)
+   - IHCA location indicator count (25 phrases)
+ - Temporal features: 5
+   - Arrest timing score (when arrest occurred)
+   - First location outside hospital (binary)
+   - First location inside hospital (binary)
+   - Movement outside→inside count
+   - Movement inside→inside count
+
+ **Classifier**: 3-layer MLP (775 → 512 → 256 → 2)
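
As a rough illustration of the dimensions above, the sketch below concatenates a 768-value embedding with the 2 location and 5 temporal features and counts the trainable parameters of a 775 → 512 → 256 → 2 MLP. It assumes every layer has a bias term and says nothing about how the embedding is pooled or which activations are used, since the model card does not specify these; all names are hypothetical.

```python
from typing import List, Sequence

EMBEDDING_DIM = 768
N_LOCATION_FEATURES = 2
N_TEMPORAL_FEATURES = 5
LAYER_SIZES = [775, 512, 256, 2]  # input, two hidden layers, two output classes


def build_input_vector(
    embedding: Sequence[float],
    location: Sequence[float],
    temporal: Sequence[float],
) -> List[float]:
    """Concatenate the three feature groups into one classifier input."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} embedding values")
    if len(location) != N_LOCATION_FEATURES:
        raise ValueError(f"expected {N_LOCATION_FEATURES} location features")
    if len(temporal) != N_TEMPORAL_FEATURES:
        raise ValueError(f"expected {N_TEMPORAL_FEATURES} temporal features")
    return [*embedding, *location, *temporal]


def count_mlp_parameters(layer_sizes: Sequence[int]) -> int:
    """Count weights plus biases (assumed present) of a fully connected MLP."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Placeholder values; real inputs would come from PubMedBERT and the extractors.
vector = build_input_vector([0.0] * 768, [3.0, 0.0], [1.0, 1.0, 0.0, 1.0, 0.0])
print(len(vector))                        # 775
print(count_mlp_parameters(LAYER_SIZES))  # 529154
```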
+
+ ## Key Features
+
+ ### Location Features
+ **OHCA indicators**: home, EMS, scene, field, bystander, ambulance, paramedics, etc.
+
+ **IHCA indicators**: floor, ICU, ward, room, bed, code blue, admitted, telemetry, etc.
+
+ ### Temporal Features
+ Captures the **story** of what happened:
+ - **When**: Before arrival vs during hospitalization
+ - **Where it started**: First location mentioned (inside/outside)
+ - **How patient moved**: Direction of transitions (outside→inside vs inside→inside)
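
A minimal sketch of how the indicator counts and the first-location flags could be computed with case-insensitive whole-word matching. This is not the author's extraction code: the phrase lists are only the examples named above (the full lists have 22 and 25 phrases), the function name `count_location_indicators` is hypothetical, and the arrest timing score and movement counts are left out because the card does not define them.

```python
import math
import re
from typing import Dict, List

# Example phrases from the model card only; the real lists are longer.
OHCA_PHRASES = ["home", "EMS", "scene", "field", "bystander", "ambulance", "paramedics"]
IHCA_PHRASES = ["floor", "ICU", "ward", "room", "bed", "code blue", "admitted", "telemetry"]


def _compile(phrases: List[str]) -> "re.Pattern[str]":
    """Build one case-insensitive whole-word pattern from a phrase list."""
    alternatives = "|".join(re.escape(phrase) for phrase in phrases)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


OHCA_PATTERN = _compile(OHCA_PHRASES)
IHCA_PATTERN = _compile(IHCA_PHRASES)


def count_location_indicators(note: str) -> Dict[str, int]:
    """Count indicator phrases and flag which kind of location appears first."""
    ohca_positions = [m.start() for m in OHCA_PATTERN.finditer(note)]
    ihca_positions = [m.start() for m in IHCA_PATTERN.finditer(note)]
    first_ohca = ohca_positions[0] if ohca_positions else math.inf
    first_ihca = ihca_positions[0] if ihca_positions else math.inf
    return {
        "ohca_count": len(ohca_positions),
        "ihca_count": len(ihca_positions),
        "first_outside": int(first_ohca < first_ihca),
        "first_inside": int(first_ihca < first_ohca),
    }


note = (
    "Patient found unresponsive at home by family. 911 called.\n"
    "EMS arrived, initiated CPR. ROSC achieved in field.\n"
    "Transported to ED."
)
print(count_location_indicators(note))
# {'ohca_count': 3, 'ihca_count': 0, 'first_outside': 1, 'first_inside': 0}
```

Simple phrase counting ignores negation and context ("no bystander CPR"), which is presumably why the counts are fed to the classifier alongside the embedding instead of being used as rules.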
+
+ ## Usage
+ ```python
+ # Note: Requires the custom model class and feature extraction code
+ # See model files for implementation details
+
+ from transformers import AutoTokenizer
+ import torch
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("monajm36/ohca-classifier-v11")
+
+ # Example clinical note
+ note = """
+ Patient found unresponsive at home by family. 911 called.
+ EMS arrived, initiated CPR. ROSC achieved in field.
+ Transported to ED.
+ """
+
+ # Extract features (requires custom code)
+ # location_features = extract_location_features(note)
+ # temporal_features = extract_temporal_features(note)
+
+ # Tokenize (notes longer than 512 tokens are truncated)
+ inputs = tokenizer(note, return_tensors="pt", max_length=512, truncation=True)
+
+ # Predict (requires loading custom model architecture)
+ # ...
+ ```
+
+ ## Threshold Selection
+
+ Choose the threshold that fits your clinical use case:
+
+ | Use Case | Threshold | Sensitivity | Specificity | F1 |
+ |----------|-----------|-------------|-------------|-----|
+ | **Screening (High Recall)** | 0.14 | 92.1% | 89.4% | 0.856 |
+ | **Balanced** | 0.74 | 82.3% | 93.2% | 0.831 |
+ | **Research (High Precision)** | 0.85 | 75.4% | 95.0% | 0.810 |
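
One way to apply these thresholds is sketched below. It assumes the classifier returns two logits, that index 1 is the OHCA class, and that the thresholds are compared against the softmax probability with `>=`; none of these details are stated in the model card, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Thresholds copied from the table above.
THRESHOLDS = {"screening": 0.14, "balanced": 0.74, "research": 0.85}


def ohca_probability(logits: Sequence[float]) -> float:
    """Softmax probability of class index 1 (assumed to be OHCA)."""
    non_ohca_logit, ohca_logit = logits
    peak = max(non_ohca_logit, ohca_logit)  # subtract the max for stability
    exp_non_ohca = math.exp(non_ohca_logit - peak)
    exp_ohca = math.exp(ohca_logit - peak)
    return exp_ohca / (exp_non_ohca + exp_ohca)


def is_ohca(logits: Sequence[float], use_case: str = "screening") -> bool:
    """Flag a note as OHCA when its probability meets the chosen threshold."""
    return ohca_probability(logits) >= THRESHOLDS[use_case]


example_logits = [0.5, 0.0]  # made-up values for illustration
print(round(ohca_probability(example_logits), 3))  # 0.378
print(is_ohca(example_logits, "screening"))        # True
print(is_ohca(example_logits, "balanced"))         # False
```

The same note can therefore be flagged during screening yet excluded from a high-precision research cohort.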
+
+ ## Limitations
+
+ - Trained on notes from a single institution (MIMIC-III)
+ - May not generalize to all clinical documentation styles
+ - False-positive rate on IHCA notes: ~28.5% at the optimal threshold
+ - Requires feature extraction code (not included in the model weights)
+ - Performs best on notes with clear EMS or location context
+
+ ## Model Versions
+
+ This is **Version 11**, the latest version and the one with the highest F1-score in the table below.
+
+ | Version | Key Features | F1-Score |
+ |---------|--------------|----------|
+ | V9 | BERT only | 0.732 |
+ | V10 | + Location features | 0.814 |
+ | **V11** | **+ Temporal features** | **0.856** |
+
+ ## Citation
+ ```bibtex
+ @misc{moukaddem2025ohca,
+   author = {Moukaddem, Mona},
+   title = {OHCA Classifier V11: Temporal and Location-Aware Model for Out-of-Hospital Cardiac Arrest Identification},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/monajm36/ohca-classifier-v11}}
+ }
+ ```
+
+ ## Contact
+
+ For questions, issues, or collaboration opportunities, please open an issue on the model repository.
+
+ ## Model Card Authors
+
+ Mona Moukaddem
+
+ ## Acknowledgments
+
+ - Training data: MIMIC-III Clinical Database
+ - Validation data: UChicago C19 dataset
+ - Base model: Microsoft BiomedNLP-PubMedBERT