# Hospital-in-the-Home (HITH) Eligibility Classifier
A fine-tuned Bio_ClinicalBERT model that predicts whether a patient is suitable for Hospital-in-the-Home care based on clinical features.
## Model Description
This model uses a pseudo-note fusion approach (inspired by the MEME paper) to combine structured and unstructured clinical data into a single text input for classification:
- Presenting complaint → text
- Ward → text
- Length of stay → numeric, serialized to text
- Primary diagnosis → text
- Nurse handover notes → free text
- Allied health notes → free text
All features are serialized into a structured pseudo-note format and classified using Bio_ClinicalBERT with a classification head.
## Input Format
The model expects a single text string in pseudo-note format:

```
PRESENTING COMPLAINT: [complaint]. WARD: [ward]. LENGTH OF STAY: [days] days. PRIMARY DIAGNOSIS: [diagnosis]. NURSE HANDOVER: [nurse notes] ALLIED HEALTH: [allied health notes]
```
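The serialization step can be sketched as follows. The function name `build_pseudo_note` and the record keys are illustrative, not part of the released code:

```python
def build_pseudo_note(record: dict) -> str:
    """Serialize structured and free-text fields into one pseudo-note string.

    The record keys below are assumptions mirroring the feature list above.
    """
    return (
        f"PRESENTING COMPLAINT: {record['presenting_complaint']}. "
        f"WARD: {record['ward']}. "
        f"LENGTH OF STAY: {record['length_of_stay']} days. "
        f"PRIMARY DIAGNOSIS: {record['primary_diagnosis']}. "
        f"NURSE HANDOVER: {record['nurse_handover']} "
        f"ALLIED HEALTH: {record['allied_health']}"
    )

# Made-up example record (illustrative only)
note = build_pseudo_note({
    "presenting_complaint": "Left leg redness and swelling",
    "ward": "General Medicine",
    "length_of_stay": 3,
    "primary_diagnosis": "Cellulitis",
    "nurse_handover": "Patient stable, mobilising independently.",
    "allied_health": "Independent with ADLs.",
})
```

The resulting string is what gets tokenized (truncated to 512 tokens) and passed to the classifier.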
## Performance (Synthetic Data)
| Metric | Score |
|---|---|
| AUROC | 1.000 |
| AUPRC | 1.000 |
| F1 | 1.000 |
⚠️ Note: These results are on synthetic training data with clear class separation. Real-world performance will differ, and the model should be retrained on actual clinical data before any clinical use.
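The three reported metrics can be computed with scikit-learn. The toy labels and scores below are made up to illustrate the calls, not taken from the actual test set:

```python
from sklearn.metrics import roc_auc_score, average_precision_score, f1_score

y_true = [0, 0, 1, 1]                # toy labels (illustrative)
y_score = [0.10, 0.20, 0.80, 0.95]  # predicted probability of HITH eligibility
y_pred = [int(s >= 0.5) for s in y_score]  # threshold at 0.5 for F1

auroc = roc_auc_score(y_true, y_score)
auprc = average_precision_score(y_true, y_score)
f1 = f1_score(y_true, y_pred)
# Perfectly separated toy scores give 1.0 on all three metrics,
# mirroring the clear class separation in the synthetic data.
```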
## Training Details
- Base model: emilyalsentzer/Bio_ClinicalBERT (110M params)
- Training data: 2,000 synthetic clinical records
- Train/Val/Test split: 70/15/15
- Max sequence length: 512 tokens
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 4 (early stopping, patience=3)
- Optimizer: AdamW with weight decay 0.01
- Hardware: NVIDIA T4 GPU
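The hyperparameters above map onto a standard `transformers` fine-tuning setup. This is a sketch, not the released training script; the output directory name is an assumption, and `transformers` is imported lazily so the hyperparameter dict is usable without it installed:

```python
# Hyperparameters as listed in this card.
HYPERPARAMS = {
    "base_model": "emilyalsentzer/Bio_ClinicalBERT",
    "max_length": 512,
    "learning_rate": 2e-5,
    "batch_size": 16,
    "epochs": 4,
    "early_stopping_patience": 3,
    "weight_decay": 0.01,
}

def build_training_args(output_dir: str = "hith-clinicalbert"):
    # Lazy import: only needed when actually training.
    from transformers import TrainingArguments
    return TrainingArguments(
        output_dir=output_dir,
        learning_rate=HYPERPARAMS["learning_rate"],
        per_device_train_batch_size=HYPERPARAMS["batch_size"],
        per_device_eval_batch_size=HYPERPARAMS["batch_size"],
        num_train_epochs=HYPERPARAMS["epochs"],
        weight_decay=HYPERPARAMS["weight_decay"],
        eval_strategy="epoch",
        save_strategy="epoch",
        # Required so EarlyStoppingCallback (patience=3) can restore the best checkpoint.
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
    )
```

Pairing these arguments with `Trainer` and an `EarlyStoppingCallback(early_stopping_patience=3)` reproduces the early-stopping behaviour described above.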
## Intended Use
This is a prototype/research model for exploring HITH eligibility prediction. It is trained on synthetic data and is NOT suitable for clinical decision-making without:
- Retraining on real clinical data
- Extensive clinical validation
- Regulatory approval
- Integration with clinical governance frameworks
## Key Features (SHAP Analysis from XGBoost Baseline)
The most predictive features identified by SHAP analysis:
- Pro-HITH: "stable", "improving", "drinking", "mobilising", "independent", "cellulitis"
- Anti-HITH: "cardiac monitoring", "ICU", "ventilated", "sedated", "high acuity", "bed rest"
## Two-Model System
This repository contains the ClinicalBERT model. A companion XGBoost + TF-IDF baseline model is also available, providing:
- Faster inference
- Better interpretability via SHAP feature importance
- Comparable performance on this task
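A minimal sketch of the baseline pipeline shape. To keep the example dependency-free, scikit-learn's `LogisticRegression` stands in for XGBoost here, and all pseudo-notes are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up pseudo-notes echoing the SHAP terms above (illustrative only).
notes = [
    "NURSE HANDOVER: stable, mobilising, drinking well, independent",
    "NURSE HANDOVER: improving cellulitis, independent with ADLs",
    "NURSE HANDOVER: on cardiac monitoring in ICU, sedated",
    "NURSE HANDOVER: ventilated, high acuity, strict bed rest",
]
labels = [1, 1, 0, 0]  # 1 = HITH-eligible

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigrams + bigrams over the pseudo-note
    LogisticRegression(max_iter=1000),    # stand-in for the XGBoost classifier
)
baseline.fit(notes, labels)
preds = baseline.predict(notes)
```

With XGBoost in place of the logistic regression, per-feature SHAP values over the TF-IDF vocabulary yield the word-level importances listed in the SHAP section above.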
## Citation
If you use this model, please cite:
- Bio_ClinicalBERT: Alsentzer et al. (2019) "Publicly Available Clinical BERT Embeddings"
- MEME approach: Yang et al. (2024) "MEME: Machine Learning Enhanced Medical Embeddings for ED Disposition Prediction"