---
language: en
license: mit
tags:
- medical
- clinical-notes
- cardiac-arrest
- ohca
- biomedical-nlp
- transformers
- pubmedbert
library_name: transformers
pipeline_tag: text-classification
---

# OHCA Classifier V11: Temporal + Location-Aware Model

## Model Description

A transformer-based deep learning model for automatically identifying Out-of-Hospital Cardiac Arrest (OHCA) cases from clinical notes.

**Key Innovation:** Combines semantic understanding (PubMedBERT) with explicit location and temporal features to distinguish OHCA from in-hospital cardiac arrest (IHCA).

## Training Data

- **Dataset**: MIMIC-III clinical notes
- **Size**: 330 notes (47 OHCA, 283 Non-OHCA)
- **Split**: 70% train / 15% validation / 15% test
- **Average note length**: 13,042 characters

## Performance (C19 Validation - 647 notes)

| Metric | Score |
|--------|-------|
| **Sensitivity** | 92.1% |
| **Specificity** | 89.4% |
| **Precision** | 79.9% |
| **F1-Score** | 0.856 |
| **AUC-ROC** | 0.956 |

## Model Architecture

**Base Model**: `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`

**Input Features (775 dimensions):**

- BERT embeddings: 768
- Location features: 2
  - OHCA location indicator count (22 phrases)
  - IHCA location indicator count (25 phrases)
- Temporal features: 5
  - Arrest timing score (when arrest occurred)
  - First location outside hospital (binary)
  - First location inside hospital (binary)
  - Movement outside→inside count
  - Movement inside→inside count

**Classifier**: 3-layer MLP (775 → 512 → 256 → 2)

## Key Features

### Location Features

**OHCA indicators**: home, EMS, scene, field, bystander, ambulance, paramedics, etc.

**IHCA indicators**: floor, ICU, ward, room, bed, code blue, admitted, telemetry, etc.

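The two location counts can be illustrated with a minimal sketch. It is not the feature extraction code shipped with the model: the helper name `count_location_indicators` is hypothetical, the phrase lists contain only the examples named in this card (the full 22 and 25 phrase lists are not reproduced here), and whole-word, case-insensitive matching is an assumption.

```python
import re

# Partial phrase lists: only the examples named in this card.
# The full 22 OHCA / 25 IHCA phrase lists are NOT reproduced here.
OHCA_PHRASES = ["home", "ems", "scene", "field", "bystander", "ambulance", "paramedics"]
IHCA_PHRASES = ["floor", "icu", "ward", "room", "bed", "code blue", "admitted", "telemetry"]


def _compile(phrases):
    # Assumption: whole-word, case-insensitive matching, so "bed" does not
    # match inside "bedside". Longer phrases are tried first.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


OHCA_RE = _compile(OHCA_PHRASES)
IHCA_RE = _compile(IHCA_PHRASES)


def count_location_indicators(note):
    """Hypothetical helper: return (ohca_count, ihca_count) for one note."""
    return len(OHCA_RE.findall(note)), len(IHCA_RE.findall(note))


note = (
    "Patient found unresponsive at home by family. 911 called. "
    "EMS arrived, initiated CPR. ROSC achieved in field. Transported to ED."
)
ohca_count, ihca_count = count_location_indicators(note)
print(ohca_count, ihca_count)  # prints: 3 0 ("home", "EMS", "field")
```

Under these assumptions, the two counts would fill the 2 location dimensions that sit alongside the 768 BERT dimensions in the 775-dimensional input; any scaling or normalization applied by the actual model is not specified in this card.
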
### Temporal Features

Captures the **story** of what happened:

- **When**: Before arrival vs during hospitalization
- **Where it started**: First location mentioned (inside/outside)
- **How patient moved**: Direction of transitions (outside→inside vs inside→inside)

## Usage

```python
# Note: Requires custom model class and feature extraction
# See model files for implementation details

from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("monajm36/ohca-classifier-v11")

# Example clinical note
note = """
Patient found unresponsive at home by family. 911 called.
EMS arrived, initiated CPR. ROSC achieved in field.
Transported to ED.
"""

# Extract features (requires custom code)
# location_features = extract_location_features(note)
# temporal_features = extract_temporal_features(note)

# Tokenize
inputs = tokenizer(note, return_tensors="pt", max_length=512, truncation=True)

# Predict (requires loading custom model architecture)
# ...
```

## Threshold Selection

Choose threshold based on your clinical use case:

| Use Case | Threshold | Sensitivity | Specificity | F1 |
|----------|-----------|-------------|-------------|-----|
| **Screening (High Recall)** | 0.14 | 92.1% | 89.4% | 0.856 |
| **Balanced** | 0.74 | 82.3% | 93.2% | 0.831 |
| **Research (High Precision)** | 0.85 | 75.4% | 95.0% | 0.810 |

## Limitations

- Trained on single institution (MIMIC-III)
- May not generalize to all clinical documentation styles
- IHCA false positive rate: ~28.5% at optimal threshold
- Requires feature extraction code (not included in model weights)
- Best performance on notes with clear EMS or location context

## Model Versions

This is **Version 11** - the latest and most accurate version.


| Version | Key Features | F1-Score |
|---------|--------------|----------|
| V9 | BERT only | 0.732 |
| V10 | + Location features | 0.814 |
| **V11** | **+ Temporal features** | **0.856** |

## Citation

```bibtex
@misc{moukaddem2025ohca,
  author = {Moukaddem, Mona},
  title = {OHCA Classifier V11: Temporal and Location-Aware Model for Out-of-Hospital Cardiac Arrest Identification},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/monajm36/ohca-classifier-v11}}
}
```

## Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.

## Model Card Authors

Mona Moukaddem

## Acknowledgments

- Training data: MIMIC-III Clinical Database
- Validation data: UChicago C19 dataset
- Base model: Microsoft BiomedNLP-PubMedBERT