---
license: gpl-3.0
language:
- en
metrics:
- accuracy
pipeline_tag: feature-extraction
tags:
- medical
---
# 🧾 Model Card: Maternal Morbidity Risk Classifiers (Indiana 2010–2020)
**Model Name:** `maternal-morbidity-classifier-suite-indiana-2010-2020`
**Version:** 1.0
**Developed by:** Dr. Cary Woods, HarnessAI
**License:** [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html)
**Release Date:** May 2025
**Trained on:** 813,837 public birth and infant death records from Indiana (2010–2020)
---
## 🔬 Research Context
These models were developed as part of the research paper:
**"Predictive Modeling of Maternal Morbidity: Insights from a Decade of Regional Birth Data (2010–2020)"**
Published on ResearchGate, May 2025.
📄 DOI: [http://dx.doi.org/10.13140/RG.2.2.26163.13608](http://dx.doi.org/10.13140/RG.2.2.26163.13608)
The study investigates the use of machine learning to predict maternal birth complications using administrative birth record data. The work focuses on model interpretability, sensitivity to rare events, and the challenges posed by missing socioeconomic indicators in public health datasets.
---
## 📌 Overview
This release includes three supervised learning classifiers trained to predict maternal morbidity from Indiana birth records: Logistic Regression, Random Forest, and Gradient Boosting. Each model uses the same engineered feature set and preprocessing pipeline. The classifiers were trained to identify rare maternal complication outcomes and are optimized for high recall.
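This card does not say how the models were tuned toward recall. One common approach, shown here purely as an illustration and not as the authors' method, is to lower the decision threshold applied to predicted probabilities until a target recall is reached, usually at some cost in precision. The sketch below uses invented toy labels and scores, and the function names `recall_at_threshold` and `highest_threshold_for_recall` are hypothetical.

```python
def recall_at_threshold(y_true, scores, threshold):
    """Recall = TP / (TP + FN), predicting positive when score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def highest_threshold_for_recall(y_true, scores, target_recall):
    """Return the largest observed score that, used as a threshold,
    reaches the target recall. Returns None if no threshold does."""
    for candidate in sorted(set(scores), reverse=True):
        if recall_at_threshold(y_true, scores, candidate) >= target_recall:
            return candidate
    return None


# Toy data invented for illustration; not drawn from the Indiana records.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.05, 0.10, 0.35, 0.20, 0.80, 0.15, 0.40, 0.60, 0.30, 0.02]

threshold = highest_threshold_for_recall(y_true, scores, target_recall=1.0)
print("threshold:", threshold)
print("recall at 0.5:", recall_at_threshold(y_true, scores, 0.5))
```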
---
## 📊 Intended Use
- Triage support for public health researchers and analysts
- Educational use in public health informatics curricula
- Demonstration of risk modeling in imbalanced health datasets
- Not for direct clinical deployment without validation
---
## 📈 Performance
| Model | Precision | Recall | F1 Score | ROC-AUC |
|---------------------|-----------|--------|----------|---------|
| Logistic Regression | 0.75 | 0.70 | 0.72 | 0.81 |
| Random Forest | 0.80 | 0.77 | 0.78 | 0.86 |
| Gradient Boosting | 0.83 | 0.80 | 0.81 | 0.89 |
Gradient Boosting scored highest on all four reported metrics (F1 0.81, ROC-AUC 0.89). Logistic Regression scored lowest but offers strong interpretability, and Random Forest provides a reliable non-linear baseline.
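As a quick consistency check, the F1 column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The sketch below uses only the values reported in the table; the helper name `f1_score` is a local definition, not an import from any library.

```python
# (precision, recall, reported F1) copied from the performance table.
reported = {
    "Logistic Regression": (0.75, 0.70, 0.72),
    "Random Forest": (0.80, 0.77, 0.78),
    "Gradient Boosting": (0.83, 0.80, 0.81),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: computed F1 = {f1:.2f}, reported F1 = {f1_reported:.2f}")
```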
---
## 🧪 Limitations
- Omits social determinants like race, education, and income
- Reflects administrative data only (not clinical records)
- Binary outcome may oversimplify severity levels
- Developed on Indiana data; needs regional validation
---
## 🔄 Reuse & Redistribution
This model suite is released under **GPL v3**. You may reuse, modify, and redistribute it, provided derivative works maintain this license and include attribution.
---
## 📦 Files Included
- `model_lr.joblib`: Logistic Regression classifier
- `model_rf.joblib`: Random Forest classifier
- `model_gb.joblib`: Gradient Boosting classifier
- `test_models.py`: Test script
- `README.md`: Usage guide
- `LICENSE.txt`: GPL v3 license
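A minimal sketch of loading the three model files, assuming they are scikit-learn estimators serialized with `joblib` (suggested by the `.joblib` extension, but not confirmed by this card) and that they sit in the directory you pass in. The helper names `find_model_files` and `load_models` are hypothetical. Note that `joblib.load` unpickles arbitrary Python objects, so only load files from a source you trust, and a scikit-learn version compatible with the one used for training is likely required. The expected input columns are not documented here, so this sketch stops at loading; `test_models.py` may show how the inputs are prepared.

```python
from pathlib import Path

# File names as listed in this model card.
MODEL_FILES = {
    "Logistic Regression": "model_lr.joblib",
    "Random Forest": "model_rf.joblib",
    "Gradient Boosting": "model_gb.joblib",
}


def find_model_files(model_dir="."):
    """Return {label: Path} for the model files that exist in model_dir."""
    base = Path(model_dir)
    return {
        label: base / name
        for label, name in MODEL_FILES.items()
        if (base / name).is_file()
    }


def load_models(model_dir="."):
    """Load every model file found in model_dir; returns {} if none exist."""
    found = find_model_files(model_dir)
    if not found:
        return {}
    import joblib  # third-party dependency, imported only when needed

    return {label: joblib.load(path) for label, path in found.items()}


# Example: list which of the expected files are present locally.
for label, path in find_model_files(".").items():
    print(f"{label}: {path}")
```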
---