---
language: en
license: mit
tags:
- medical
- clinical-notes
- cardiac-arrest
- ohca
- biomedical-nlp
- transformers
- pubmedbert
library_name: transformers
pipeline_tag: text-classification
---

# OHCA Classifier V11: Temporal + Location-Aware Model

## Model Description

A transformer-based deep learning model for automatically identifying Out-of-Hospital Cardiac Arrest (OHCA) cases from clinical notes.

**Key Innovation:** Combines semantic understanding (PubMedBERT) with explicit location and temporal features to distinguish OHCA from in-hospital cardiac arrest (IHCA).

## Training Data

- **Dataset**: MIMIC-III clinical notes
- **Size**: 330 notes (47 OHCA, 283 Non-OHCA)
- **Split**: 70% train / 15% validation / 15% test
- **Average note length**: 13,042 characters

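## Performance (C19 Validation, 647 notes)

Threshold-dependent scores below correspond to the screening threshold (0.14) listed under Threshold Selection.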

| Metric | Score |
|--------|-------|
| **Sensitivity** | 92.1% |
| **Specificity** | 89.4% |
| **Precision** | 79.9% |
| **F1-Score** | 0.856 |
| **AUC-ROC** | 0.956 |
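
As a quick consistency check, the F1-score in the table can be recomputed from the reported precision and sensitivity (recall). The helper name `f1_score` below is illustrative and is not part of the model's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

precision = 0.799    # reported precision
sensitivity = 0.921  # reported sensitivity (recall)

f1 = f1_score(precision, sensitivity)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.856"
```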

## Model Architecture

**Base Model**: `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`

**Input Features (775 dimensions):**
- BERT embeddings: 768
- Location features: 2
  - OHCA location indicator count (22 phrases)
  - IHCA location indicator count (25 phrases)
- Temporal features: 5
  - Arrest timing score (when arrest occurred)
  - First location outside hospital (binary)
  - First location inside hospital (binary)
  - Movement outside→inside count
  - Movement inside→inside count

**Classifier**: 3-layer MLP (775 → 512 → 256 → 2)
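
The dimension bookkeeping above can be sketched in plain Python. This is a minimal illustration, not the released implementation: the concatenation order, the helper names `build_input` and `count_parameters`, and the assumption of plain fully connected layers with biases (no normalization layers) are all hypothetical.

```python
BERT_DIM = 768      # BERT embeddings
N_LOCATION = 2      # OHCA and IHCA indicator counts
N_TEMPORAL = 5      # temporal features
LAYER_SIZES = [BERT_DIM + N_LOCATION + N_TEMPORAL, 512, 256, 2]

def build_input(bert_embedding, location_features, temporal_features):
    """Concatenate the three feature groups (order is an assumption)."""
    vector = list(bert_embedding) + list(location_features) + list(temporal_features)
    if len(vector) != LAYER_SIZES[0]:
        raise ValueError(f"expected {LAYER_SIZES[0]} features, got {len(vector)}")
    return vector

def count_parameters(layer_sizes):
    """Weights plus biases, assuming plain fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Placeholder values only; real inputs come from the encoder and feature extractors.
x = build_input([0.0] * 768, [3, 0], [0.0, 1, 0, 1, 0])
print(len(x))                         # 775
print(count_parameters(LAYER_SIZES))  # 529154
```

Under these assumptions the classifier head has roughly 0.53M parameters, which is small next to the BERT encoder.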

## Key Features

### Location Features
**OHCA indicators**: home, EMS, scene, field, bystander, ambulance, paramedics, etc.

**IHCA indicators**: floor, ICU, ward, room, bed, code blue, admitted, telemetry, etc.
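
A minimal sketch of how such indicator counts could be computed is shown below. It uses only the example phrases listed above (the full 22- and 25-phrase lists are not reproduced here), and the function name `count_location_indicators` and the case-insensitive whole-word matching are assumptions, not the model's actual extraction code.

```python
import re

# Subsets taken from the examples above; the full phrase lists are not shown here.
OHCA_TERMS = ["home", "EMS", "scene", "field", "bystander", "ambulance", "paramedics"]
IHCA_TERMS = ["floor", "ICU", "ward", "room", "bed", "code blue", "admitted", "telemetry"]

def _count_terms(note, terms):
    """Count case-insensitive whole-word matches of every term in the note."""
    total = 0
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        total += len(re.findall(pattern, note, flags=re.IGNORECASE))
    return total

def count_location_indicators(note):
    """Return (ohca_count, ihca_count) for a clinical note (hypothetical helper)."""
    return _count_terms(note, OHCA_TERMS), _count_terms(note, IHCA_TERMS)

note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived, initiated CPR. ROSC achieved in field.
Transported to ED.
"""
print(count_location_indicators(note))  # (3, 0)
```

Whole-word matching keeps "ward" from firing inside words such as "toward", but it is still a crude heuristic: "emergency room" would count as an IHCA indicator under this sketch.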

### Temporal Features
Captures the **story** of what happened:
- **When**: Before arrival vs during hospitalization
- **Where it started**: First location mentioned (inside/outside)
- **How patient moved**: Direction of transitions (outside→inside vs inside→inside)
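
The "where it started" and "how patient moved" ideas can be illustrated with a short sketch. It labels each indicator mention as outside or inside, orders mentions by position in the note, and counts transitions between consecutive mentions. The term lists, the helper names, and the definition of a transition are assumptions made for illustration. The arrest timing score is omitted because this card does not define how it is computed.

```python
import re

# Example phrases only; the model's real lists are longer.
OUTSIDE_TERMS = ["home", "EMS", "scene", "field", "bystander", "ambulance", "paramedics"]
INSIDE_TERMS = ["floor", "ICU", "ward", "room", "bed", "code blue", "admitted", "telemetry"]

def location_sequence(note):
    """Return 'out'/'in' labels for indicator mentions, in order of appearance."""
    mentions = []
    for label, terms in (("out", OUTSIDE_TERMS), ("in", INSIDE_TERMS)):
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, note, flags=re.IGNORECASE):
                mentions.append((match.start(), label))
    return [label for _, label in sorted(mentions)]

def sketch_temporal_features(note):
    """Four of the five temporal features, under the assumptions stated above."""
    seq = location_sequence(note)
    pairs = list(zip(seq, seq[1:]))  # consecutive mention pairs
    return {
        "first_outside": int(bool(seq) and seq[0] == "out"),
        "first_inside": int(bool(seq) and seq[0] == "in"),
        "outside_to_inside": pairs.count(("out", "in")),
        "inside_to_inside": pairs.count(("in", "in")),
    }

example = "Found down at home. EMS started CPR. Admitted to ICU, then moved to the floor."
print(sketch_temporal_features(example))
```

For this example the mention sequence is out, out, in, in, in, so the note starts outside the hospital and has one outside→inside transition and two inside→inside transitions.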

## Usage
```python
# Note: Requires custom model class and feature extraction
# See model files for implementation details

from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("monajm36/ohca-classifier-v11")

# Example clinical note
note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived, initiated CPR. ROSC achieved in field.
Transported to ED.
"""

# Extract features (requires custom code)
# location_features = extract_location_features(note)
# temporal_features = extract_temporal_features(note)

# Tokenize (input beyond 512 tokens is truncated)
inputs = tokenizer(note, return_tensors="pt", max_length=512, truncation=True)

# Predict (requires loading custom model architecture)
# ...
```

## Threshold Selection

Choose a threshold based on your clinical use case:

| Use Case | Threshold | Sensitivity | Specificity | F1 |
|----------|-----------|-------------|-------------|-----|
| **Screening (High Recall)** | 0.14 | 92.1% | 89.4% | 0.856 |
| **Balanced** | 0.74 | 82.3% | 93.2% | 0.831 |
| **Research (High Precision)** | 0.85 | 75.4% | 95.0% | 0.810 |
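
Applying one of these thresholds might look like the sketch below. It assumes the model emits two logits, that index 1 is the OHCA class, and that a note is flagged when the OHCA probability is greater than or equal to the threshold. None of these details are confirmed by this card, and the function names are hypothetical.

```python
import math

# Thresholds from the table above.
THRESHOLDS = {"screening": 0.14, "balanced": 0.74, "research": 0.85}

def ohca_probability(logits, ohca_index=1):
    """Softmax probability of the OHCA class (class index is an assumption)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # shift for numerical stability
    return exps[ohca_index] / sum(exps)

def is_ohca(probability, use_case="balanced"):
    """Flag a note as OHCA when its probability meets the chosen threshold."""
    return probability >= THRESHOLDS[use_case]

p = ohca_probability([0.0, 0.0])  # equal logits give a probability of 0.5
print(is_ohca(p, "screening"))    # True
print(is_ohca(p, "balanced"))     # False
print(is_ohca(p, "research"))     # False
```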

## Limitations

- Trained on data from a single institution (MIMIC-III)
- May not generalize to all clinical documentation styles
- About 28.5% of IHCA notes are misclassified as OHCA at the optimal threshold
- Requires feature extraction code (not included in model weights)
- Best performance on notes with clear EMS or location context

## Model Versions

This is **Version 11**, the latest version, with the highest F1-score of the versions listed below.

| Version | Key Features | F1-Score |
|---------|--------------|----------|
| V9 | BERT only | 0.732 |
| V10 | + Location features | 0.814 |
| **V11** | **+ Temporal features** | **0.856** |

## Citation
```bibtex
@misc{moukaddem2025ohca,
  author = {Moukaddem, Mona},
  title = {OHCA Classifier V11: Temporal and Location-Aware Model for Out-of-Hospital Cardiac Arrest Identification},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/monajm36/ohca-classifier-v11}}
}
```

## Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.

## Model Card Authors

Mona Moukaddem

## Acknowledgments

- Training data: MIMIC-III Clinical Database
- Validation data: UChicago C19 dataset
- Base model: Microsoft BiomedNLP-PubMedBERT