🧠 Parkinson's Disease Progression Prediction Model
This repository contains machine learning models designed to predict:
- Parkinson’s disease severity (Early / Moderate / Severe)
- UPDRS progression (Regression)
based on biomedical voice features from the UCI Parkinson’s Telemonitoring dataset.
🔍 Project Overview
Parkinson’s Disease (PD) causes progressive degradation of motor control. Voice changes are among the earliest and most accessible biomarkers.
This project includes:
✔️ Regression Models (UPDRS Prediction)
- Random Forest Regressor
- XGBoost Regressor
- RF + XGBoost Ensemble (Best Model)
  - R² ≈ 0.998
  - MAE ≈ 0.17
✔️ Classification Models (Severity Levels)
Predicts:
- Early PD
- Moderate PD
- Severe PD
Models:
- Random Forest Classifier
- XGBoost Classifier (100% accuracy)
📂 Files in This Repository
| File | Description |
|---|---|
| rf_classification_model.pkl | Random Forest Classifier |
| xgb_classification_model.pkl | XGBoost Classifier |
| ensemble_model.pkl | RF + XGBoost Ensemble for UPDRS |
| README.md | Model card |
🎯 How to Use This Model
1️⃣ Install Dependencies
pip install numpy pandas scikit-learn xgboost joblib
2️⃣ Load the Models
import joblib
# Classification models
rf_cls = joblib.load("rf_classification_model.pkl")
xgb_cls = joblib.load("xgb_classification_model.pkl")
# Regression ensemble
ensemble = joblib.load("ensemble_model.pkl")
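Before calling `predict`, it can help to confirm that the input has the number of columns the model was fitted on. The helper below is a minimal sketch, not part of this repository: `check_feature_count` is a hypothetical name, and it relies on the `n_features_in_` attribute that fitted scikit-learn style estimators usually expose (the check is skipped if the attribute is absent).

```python
import numpy as np

def check_feature_count(model, sample):
    """Return `sample` as a 2-D float array, raising ValueError on a column mismatch."""
    sample = np.atleast_2d(np.asarray(sample, dtype=float))
    # Fitted scikit-learn style estimators usually expose n_features_in_;
    # if the attribute is missing, no check is performed.
    expected = getattr(model, "n_features_in_", None)
    if expected is not None and sample.shape[1] != expected:
        raise ValueError(
            f"model expects {expected} features, got {sample.shape[1]}"
        )
    return sample
```

With the models loaded as above, a call such as `check_feature_count(xgb_cls, sample)` would fail early with a readable message instead of an error from inside the estimator.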
3️⃣ Make Predictions
🩺 Severity Classification
import numpy as np
sample = np.array([[
    65,    # age
    1,     # sex (1 = male, 0 = female)
    0.45,  # jitter
    0.32,  # shimmer
    15.4,  # HNR
    0.65,  # DFA
    0.45,  # RPDE
]])
prediction = xgb_cls.predict(sample)
print("Predicted PD severity:", prediction)
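The classifier may return integer class codes rather than names. The sketch below shows one way to turn them into readable labels. The mapping 0 = Early, 1 = Moderate, 2 = Severe is an assumption and should be checked against the label encoding used in training; `decode_severity` is a hypothetical helper.

```python
import numpy as np

# ASSUMPTION: class codes 0/1/2 in this order. Verify against the training labels.
SEVERITY_NAMES = {0: "Early", 1: "Moderate", 2: "Severe"}

def decode_severity(predictions):
    """Map predicted class codes to severity names; anything unrecognised is passed through as text."""
    names = []
    for p in np.ravel(predictions):
        try:
            names.append(SEVERITY_NAMES[int(p)])
        except (ValueError, KeyError):
            # Already a string label, or a code outside the assumed mapping.
            names.append(str(p))
    return names
```

For example, `decode_severity(xgb_cls.predict(sample))` would give a list such as `["Moderate"]` under the assumed encoding.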
📉 UPDRS Regression Prediction
updrs_prediction = ensemble.predict(sample)
print("Predicted UPDRS score:", updrs_prediction)
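This card does not state how the Random Forest and XGBoost regressors are combined inside `ensemble_model.pkl`. A common choice is to average their predictions, and the sketch below illustrates that idea only. `AveragingEnsemble` is a hypothetical class, not the object stored in the pickle.

```python
import numpy as np

class AveragingEnsemble:
    """Hypothetical stand-in: (weighted) mean of predictions from already-fitted regressors."""

    def __init__(self, models, weights=None):
        self.models = list(models)
        self.weights = weights  # None means a plain arithmetic mean

    def predict(self, X):
        # Stack to shape (n_models, n_samples), then average over the model axis.
        preds = np.stack(
            [np.asarray(m.predict(X), dtype=float) for m in self.models]
        )
        return np.average(preds, axis=0, weights=self.weights)
```

scikit-learn's `VotingRegressor` offers the same averaging behaviour as a built-in estimator, with the difference that it fits its member models itself.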
📊 Model Performance
Regression (UPDRS Prediction)
| Model | MAE | RMSE | R² |
| --------------------- | --------- | --------- | --------- |
| Random Forest | 0.177 | 0.363 | 0.9979 |
| XGBoost | 0.854 | 1.209 | 0.977 |
| **RF + XGB Ensemble** | **0.172** | **0.349** | **0.998** |
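For reference, the three metrics in the table follow their standard definitions. The sketch below computes them with NumPy. The function name `regression_metrics` is illustrative, and the held-out split behind the reported numbers is not described in this card.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for two equal-length sequences of scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))              # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))       # root mean squared error
    ss_res = np.sum(err ** 2)               # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```

Calling `regression_metrics(y_test, ensemble.predict(X_test))` on a held-out set (`X_test` and `y_test` are placeholders here) would give values comparable to one row of the table.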
Classification (Severity Prediction)
| Model | Accuracy |
| ------------- | -------- |
| Random Forest | 98.1% |
| XGBoost | 100% |
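Overall accuracy can hide differences between the three severity classes, so a per-class view is often useful alongside it. The following is a small sketch that assumes integer-coded labels 0, 1, 2. The function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[int(t), int(p)] += 1
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal."""
    return np.trace(cm) / cm.sum()

def per_class_recall(cm):
    """Correct predictions per true class; NaN if a class has no samples."""
    return np.diag(cm) / cm.sum(axis=1)
```

A perfect classifier yields a purely diagonal matrix, with accuracy and every per-class recall equal to 1.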
🧬 Biological Relevance of Key Features
| Feature | Meaning | PD Relevance |
| -------------------- | ----------------------- | --------------------------- |
| **Age** | Risk increases with age | PD severity rises |
| **Sex** | Males tend to progress faster | Hormonal + genetic factors |
| **DFA** | Detrended fluctuation analysis: fractal scaling of noise in the voice signal | Higher in advanced PD |
| **RPDE** | Recurrence period density entropy: how far the voice departs from strict periodicity | More chaos → severe PD |
| **Jitter / Shimmer** | Cycle-to-cycle variation in pitch period (jitter) and amplitude (shimmer) | Motor instability |
| **HNR** | Harmonics-to-noise ratio | Lower in moderate/severe PD |
These voice biomarkers reflect PD-related neuromotor degradation.
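To make jitter and shimmer concrete: in their common "local" form, both are the mean absolute difference between consecutive cycles divided by the mean value, applied to glottal periods for jitter and to peak amplitudes for shimmer. The sketch below uses made-up illustrative numbers. The dataset provides several jitter and shimmer variants, and which one the models were trained on is not stated here.

```python
import numpy as np

def relative_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean value."""
    values = np.asarray(values, dtype=float)
    return np.mean(np.abs(np.diff(values))) / np.mean(values)

# Illustrative values only, not taken from the dataset.
periods = np.array([0.0100, 0.0102, 0.0099, 0.0101])  # glottal cycle lengths in seconds
amplitudes = np.array([0.80, 0.78, 0.82, 0.79])       # peak amplitude per cycle

jitter_local = relative_perturbation(periods)      # pitch-period instability
shimmer_local = relative_perturbation(amplitudes)  # amplitude instability
```

A perfectly steady voice would give 0 for both measures. Larger values indicate less stable phonation.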
📚 Dataset
UCI Parkinson’s Telemonitoring Dataset, modified into:
- Multi-class severity labels
- Regression target (UPDRS)
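One way to derive multi-class severity labels from a continuous UPDRS score is to bin it at fixed cut-points. The sketch below shows the mechanics only. The thresholds 20 and 35 are hypothetical placeholders, since the cut-points and the UPDRS column actually used for this model are not documented in this card.

```python
import numpy as np

# HYPOTHETICAL cut-points for illustration; not the values used to train these models.
SEVERITY_EDGES = [20.0, 35.0]
SEVERITY_LABELS = np.array(["Early", "Moderate", "Severe"])

def updrs_to_severity(updrs):
    """Bin UPDRS scores: below the first edge is Early, at or above the last edge is Severe."""
    idx = np.digitize(np.asarray(updrs, dtype=float), SEVERITY_EDGES)
    return SEVERITY_LABELS[idx]
```

Because the labels are a deterministic function of UPDRS under such a scheme, the classification and regression targets carry closely related information.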
📝 Citation
If you use this model:
Kaur, H. (2025). Parkinson’s Disease Progression Prediction Model.
👤 Author
Harjot Kaur
Computational Biology Researcher
This model predicts Parkinson’s disease severity (Early, Moderate, Severe) and UPDRS motor progression using biomedical voice features from the UCI Parkinson’s Telemonitoring dataset.
It contains:
• Random Forest Classification Model
• XGBoost Classification Model
• RF + XGBoost Ensemble Regression Model (best performance)
The models take clinically meaningful vocal biomarkers as input, such as jitter, shimmer, HNR, RPDE, DFA, and pitch instability. These features are associated with motor dysfunction and disease progression.
🩺 Clinical Relevance
Vocal fold instability increases with advancing PD. Measures like RPDE and DFA capture irregular vocal vibration, while jitter/shimmer reflect micro-tremors. These indicators help detect early progression patterns.
📊 Performance
• Classification accuracy: XGBoost = 100%
• Regression R² score (Ensemble): 0.998
• Regression error (Ensemble): MAE = 0.172, RMSE = 0.349
🎯 Use Cases
• Remote PD screening
• Progression monitoring
• Assistive clinical decision support
• Biomedical ML research & benchmarking
This repository includes all trained models (.pkl files) and documentation for loading, inference, and integration into applications.