Model Name: RobustHOG
Version: 1.0
Date: December 2025
Author: Jorge Creiann AC Jarme
- Model Details
Type: Classical machine learning, interpretable computer vision model.
Architecture:
Input: MRI brain images (T1-weighted, grayscale, resized to 128×128).
Feature extraction: Histogram of Oriented Gradients (HOG).
Dimensionality reduction: PCA (95% variance retained) + Diffusion Maps.
Classifier: Logistic Regression (elastic-net regularization, balanced class weights).
Objective: Predict dementia severity (Non-Demented, Very Mild, Mild, Moderate).
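Two of the architecture settings above reduce to simple formulas. The sketch below is illustrative, not the card's code, and the function names are hypothetical: `balanced_class_weights` uses the formula scikit-learn documents for `class_weight="balanced"` (n_samples / (n_classes × class_count)), and `n_components_for_variance` picks the smallest number of leading principal components whose cumulative variance reaches the 95% threshold.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weight n_samples / (n_classes * count), so rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest k such that the top-k eigenvalues explain >= threshold of total variance.

    `eigenvalues` are the PCA component variances, assumed sorted in decreasing order.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical toy inputs, not OASIS statistics.
print(balanced_class_weights([0, 0, 0, 0, 1, 1, 2, 3]))  # {0: 0.5, 1: 1.0, 2: 2.0, 3: 2.0}
print(n_components_for_variance([5.0, 3.0, 1.5, 0.5]))  # 3
```

If the severity classes are imbalanced, these weights scale up the loss on rare classes during training, which pairs naturally with balanced accuracy at evaluation time. The card does not say which library was used; in scikit-learn, comparable settings would be `PCA(n_components=0.95)` and `LogisticRegression` with `penalty="elasticnet"`, the `saga` solver, an `l1_ratio` value, and `class_weight="balanced"`.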
- Intended Use
Primary Use Case: Aid in early detection of Alzheimer’s disease from MRI scans.
Users: Researchers, clinicians (research purposes only, not for clinical decision-making).
Limitations:
Small dataset (OASIS); results may not generalize to other populations.
Performance sensitive to noise in MRI acquisition.
Not validated for diagnostic or treatment decisions.
- Training Data
Dataset: OASIS MRI dataset.
Labels: Dementia severity (0–3).
Preprocessing: Grayscale conversion, resizing to 128×128, HOG feature extraction.
Train/Test Split: Subject-level split to avoid leakage between training and test images.
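A subject-level split assigns whole subjects, not individual images, to either train or test. A minimal stdlib sketch, assuming each image carries a subject identifier (the `subject_level_split` name, the `subject_ids` input, and the test fraction are illustrative assumptions); scikit-learn's `GroupShuffleSplit` provides the same guarantee through its `groups` argument.

```python
import random
from collections import defaultdict


def subject_level_split(subject_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) image indices with no subject in both sets.

    `subject_ids[i]` is the subject that image i belongs to; `test_fraction`
    is a fraction of subjects, not of images.
    """
    images_by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        images_by_subject[subject].append(idx)

    subjects = sorted(images_by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])

    train_idx, test_idx = [], []
    for subject, indices in images_by_subject.items():
        # Every image of a subject goes to the same side of the split.
        (test_idx if subject in test_subjects else train_idx).extend(indices)
    return sorted(train_idx), sorted(test_idx)


# Hypothetical example: several images per subject.
subjects = ["s1", "s1", "s2", "s2", "s3", "s4", "s4", "s5"]
train_idx, test_idx = subject_level_split(subjects, test_fraction=0.4)
print({subjects[i] for i in train_idx} & {subjects[i] for i in test_idx})  # set()
```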
- Evaluation
Metrics: Balanced Accuracy, Macro F1, Classification Error Rate.
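For reference, the three metrics can be computed from predictions as below. This is a stdlib sketch, not the card's evaluation code. It follows the usual definitions, which should agree with scikit-learn's `balanced_accuracy_score` and `f1_score(average="macro")`: balanced accuracy is the mean per-class recall over classes present in `y_true`, macro F1 is the unweighted mean of per-class F1 over classes seen in either `y_true` or `y_pred`, and the error rate is the fraction of misclassified images.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        n_c = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / n_c)
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (1 - plain accuracy)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical 4-class example using 0-3 severity codes.
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]
print(balanced_accuracy(y_true, y_pred))  # 0.5625
print(round(macro_f1(y_true, y_pred), 4))  # 0.4792
print(error_rate(y_true, y_pred))  # 0.375
```

Here plain accuracy is 0.625, higher than both balanced accuracy and macro F1, because the rare class 3 is never predicted correctly; the balanced metrics expose that failure.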
Robustness Testing:
Domain Shift, Noise Injection, Occlusion, Mimicry, Corruption, Blur.
Noise Injection was identified as the most damaging perturbation.
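The card does not specify the noise model used for Noise Injection. Assuming additive zero-mean Gaussian noise on intensities normalized to [0, 1], the perturbation and a relative-drop measure could be sketched as follows (`inject_gaussian_noise`, `relative_drop`, and the sigma values are hypothetical).

```python
import random


def inject_gaussian_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise (std `sigma`) to a [0, 1] grayscale image, then clip to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row] for row in image]


def relative_drop(clean_score, perturbed_score):
    """Fractional loss of a metric under perturbation (0.0 means no degradation)."""
    return (clean_score - perturbed_score) / clean_score


# Hypothetical 128x128 mid-gray image; report mean absolute pixel change per noise level.
image = [[0.5] * 128 for _ in range(128)]
for sigma in (0.0, 0.05, 0.1, 0.2):
    noisy = inject_gaussian_noise(image, sigma, seed=0)
    change = sum(abs(a - b) for r0, r1 in zip(image, noisy) for a, b in zip(r0, r1)) / 128**2
    print(sigma, round(change, 4))
```

In an actual stress test, each noisy test set would pass through the same HOG, PCA, and classifier steps, and `relative_drop` would compare the resulting balanced accuracy with the clean value.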
Statistical Testing:
Bootstrap confidence intervals, Wilcoxon paired tests, Cliff’s Delta, McNemar’s test, Holm-Bonferroni correction.
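Three of the listed procedures are short enough to sketch with the standard library. The code below is illustrative, not the card's implementation: `cliffs_delta` counts pairwise wins minus losses between two samples of scores, `mcnemar_exact` applies the exact (binomial) form of McNemar's test to the discordant predictions of two classifiers on the same test images, and `holm_adjust` returns Holm-Bonferroni adjusted p-values. For the Wilcoxon paired test, SciPy provides `scipy.stats.wilcoxon`.

```python
from math import comb


def cliffs_delta(xs, ys):
    """(#{x > y} - #{x < y}) / (len(xs) * len(ys)), ranging from -1 to 1."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value from the discordant pairs.

    b = images model A gets right and model B gets wrong; c = the reverse.
    Under H0, b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1))
    return min(1.0, 2 * tail / 2**n)


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank) and enforce monotonicity.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical inputs.
print(cliffs_delta([3, 4, 5], [1, 2, 3]))
y_true = [0] * 10
pred_a = [0] + [1] * 9  # correct on 1 image
pred_b = [1] + [0] * 9  # correct on the other 9 images
print(mcnemar_exact(y_true, pred_a, pred_b))  # 0.021484375
print(holm_adjust([0.01, 0.04, 0.03]))
```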
Impossibility Testing: Label permutation, random features.
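A label-permutation check asks whether the observed score is distinguishable from scores obtained once the link between images and labels is destroyed. The sketch below is a simplified, hypothetical variant: it keeps the test-set predictions fixed and shuffles the true labels, giving a permutation p-value of (1 + #{null score ≥ observed}) / (1 + n_permutations). A version that retrains the model on permuted training labels would call the training pipeline inside the loop instead; `score_fn` could equally be balanced accuracy.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_test(y_true, y_pred, score_fn=accuracy, n_permutations=999, seed=0):
    """Return (observed score, permutation p-value) for fixed predictions."""
    rng = random.Random(seed)
    observed = score_fn(y_true, y_pred)
    shuffled = list(y_true)
    n_at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the image-label pairing
        if score_fn(shuffled, y_pred) >= observed:
            n_at_least += 1
    return observed, (n_at_least + 1) / (n_permutations + 1)


labels = [0, 1, 2, 3] * 10
print(permutation_test(labels, labels))  # perfect predictions: small p-value
print(permutation_test(labels, [0] * 40))  # constant predictions: (0.25, 1.0)
```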
Causal Inference & A/B Testing: Negative control, invariance test, confounder sensitivity, bootstrapped delta tests.
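A bootstrapped delta test compares two pipelines (for example HOG + LR versus HOG + PCA + LR) on the same test set by resampling test cases with replacement and recomputing the score difference each time. A hypothetical stdlib sketch using plain accuracy; it resamples individual images, so when a subject contributes several test images, resampling whole subjects would better respect that dependence.

```python
import random


def paired_bootstrap_delta(y_true, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Observed accuracy(B) - accuracy(A) and a percentile bootstrap CI for it."""
    rng = random.Random(seed)
    n = len(y_true)
    # Per-image difference in correctness: +1, 0 or -1.
    diff = [int(b == t) - int(a == t) for t, a, b in zip(y_true, pred_a, pred_b)]
    observed = sum(diff) / n
    deltas = []
    for _ in range(n_boot):
        resample = [diff[rng.randrange(n)] for _ in range(n)]  # same images for A and B
        deltas.append(sum(resample) / n)
    deltas.sort()
    lo = deltas[round(alpha / 2 * (n_boot - 1))]
    hi = deltas[round((1 - alpha / 2) * (n_boot - 1))]
    return observed, (lo, hi)


# Hypothetical predictions from two pipelines on the same 100 test images.
y_true = [0, 1, 2, 3] * 25
pred_a = [0] * 100
pred_b = [0, 1, 2, 0] * 25
print(paired_bootstrap_delta(y_true, pred_a, pred_b))
```

Pairing matters: both models are scored on the identical resample, so the interval reflects the difference between them rather than two independent sampling errors. A confidence interval that excludes 0 indicates a difference at roughly the chosen alpha level.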
Key Results:
Baseline HOG + LR: Balanced Accuracy = 0.299, Macro F1 = 0.201.
HOG + PCA + LR: Balanced Accuracy = 0.350, Macro F1 = 0.269.
Stress testing: Noise caused the largest accuracy drop; other perturbations had smaller effects.
Label permutation confirmed that the model learns meaningful (above-chance) patterns.
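Because balanced accuracy averages per-class recall, its chance level does not depend on class frequencies, which gives a reference point for the results above. For the K = 4 severity classes, a predictor that ignores the image and outputs class k with probability q_k has expected recall q_k on class k, so:

```latex
\begin{aligned}
\mathrm{BA} &= \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k + FN_k}, \\
\mathbb{E}\left[\mathrm{BA}_{\text{chance}}\right] &= \frac{1}{K}\sum_{k=1}^{K} q_k = \frac{1}{K} = 0.25 \quad (K = 4), \\
\mathrm{BA}_{\text{HOG+LR}} - 0.25 &= 0.299 - 0.25 = 0.049, \\
\mathrm{BA}_{\text{HOG+PCA+LR}} - 0.25 &= 0.350 - 0.25 = 0.100.
\end{aligned}
```

Both reported pipelines sit above this chance level, and the PCA variant roughly doubles the margin over chance.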
- Ethical Considerations
Fairness: Model trained on the OASIS dataset; may not generalize across demographic groups.
Transparency: Fully interpretable using HOG features, linear coefficients, and local linear explanations.
Responsible Use: Intended for research purposes; clinical use requires thorough validation.
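For the linear parts of the pipeline, explanations reduce to arithmetic on coefficients. In this hypothetical sketch, `logit_contributions` splits one class's logit into per-feature terms w_j * x_j (as in the HOG + LR baseline), and `backproject_pca_weights` maps coefficients learned on PCA scores back to HOG space as W^T c, because z = W(x - mu) keeps the logit linear in x. This back-projection holds for HOG + PCA + LR; it does not pass through the Diffusion Maps embedding, which is nonlinear.

```python
def logit_contributions(coef, intercept, x):
    """Return (logit, per-feature terms coef[j] * x[j]) for one class of a linear model."""
    terms = [w * v for w, v in zip(coef, x)]
    return intercept + sum(terms), terms


def top_features(terms, k=3):
    """Indices of the k terms with the largest absolute contribution."""
    return sorted(range(len(terms)), key=lambda j: abs(terms[j]), reverse=True)[:k]


def backproject_pca_weights(components, coef_pca):
    """Effective input-space weights W^T c, where row i of `components` is principal axis i."""
    n_features = len(components[0])
    return [sum(c * axis[j] for c, axis in zip(coef_pca, components)) for j in range(n_features)]


# Hypothetical 3-feature example (real HOG vectors are far longer).
logit, terms = logit_contributions([0.5, -2.0, 1.0], 0.1, [2.0, 1.0, 0.0])
print(terms, top_features(terms, k=2))  # [1.0, -2.0, 0.0] [1, 0]
print(backproject_pca_weights([[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]], [2.0, 5.0]))
```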
- Limitations
Small sample size and high-dimensional input may cause overfitting.
Sensitive to noisy imaging conditions.
PCA + Diffusion Maps dimensionality reduction may remove subtle biomarkers.
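To gauge how high-dimensional the input is, the HOG descriptor length can be derived from the image size and HOG parameters. The card does not list its HOG settings; this sketch assumes the defaults of scikit-image's `skimage.feature.hog` (9 orientations, 8×8-pixel cells, 3×3-cell blocks sliding one cell at a time), and `hog_feature_length` is a hypothetical helper.

```python
def hog_feature_length(height, width, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(3, 3)):
    """Length of a flattened HOG vector with overlapping blocks at every cell position."""
    cells_y = height // pixels_per_cell[0]
    cells_x = width // pixels_per_cell[1]
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    return blocks_y * blocks_x * cells_per_block[0] * cells_per_block[1] * orientations


print(hog_feature_length(128, 128))  # 15876
print(hog_feature_length(128, 128, cells_per_block=(2, 2)))  # 8100
```

When the feature count approaches or exceeds the number of training subjects, a linear classifier can fit noise easily, which is the overfitting risk noted above and one motivation for the PCA step and regularization.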
- Maintenance & Future Work
Extend to larger, multi-site datasets.
Explore CNN-based pipelines once computational resources allow.
Improve robustness via advanced denoising and augmentation strategies.