DiaFoot.AI v2 – Diabetic Foot Ulcer Detection & Segmentation
⚠️ IMPORTANT DISCLAIMER: This is an academic research project developed for educational purposes as part of the AAI6620 Computer Vision course at Northeastern University. This software is NOT a medical device, is NOT FDA-cleared, and is NOT intended for clinical use, diagnosis, treatment, or any medical decision-making. It does not replace professional medical judgment. Always consult a qualified healthcare provider for any medical concerns. The authors assume no liability for any use of this software in clinical or diagnostic settings.
A multi-task pipeline for automated diabetic foot ulcer (DFU) detection, wound boundary segmentation, and clinical wound assessment. Built to demonstrate modern computer vision techniques applied to medical imaging.
Clinical Motivation
Diabetic foot ulcers affect 15–25% of diabetic patients in their lifetime, with 85% of diabetes-related amputations preceded by a foot ulcer. Early detection and accurate wound measurement can reduce amputation rates by up to 85%. DiaFoot.AI explores how deep learning can automate wound boundary detection to potentially support clinical workflows in the future.
Why v2: Lessons from v1
The original DiaFoot.AI (v1) achieved 84.93% IoU and 91.73% Dice – numbers that looked strong but masked two fundamental flaws:
- Training data contained only ulcer images. The model never learned what healthy skin looks like, so it predicted ulcers on every input – zero clinical specificity. A model that calls everything a wound is clinically useless.
- No data cleaning pipeline. Raw scraped images were fed directly into training with no quality audit, duplicate detection, or label verification.
DiaFoot.AI v2 is a ground-up rebuild that fixes both problems through a multi-task cascaded pipeline and rigorous data engineering.
Architecture
The system uses a cascaded pipeline (Strategy A), validated by ablation to outperform joint multi-task training:
```
Input Image
     │
     ▼
┌─────────────────────────────────────────────────────┐
│  Triage Classifier                                  │
│  EfficientNet-V2-M                                  │
│                                                     │
│   • Healthy           → Stop. No wound detected.    │
│   • Non-DFU condition → Stop. Not a diabetic ulcer. │
│   • DFU detected      → Proceed to segmentation.    │
└──────────────────┬──────────────────────────────────┘
                   │ (DFU only)
                   ▼
┌─────────────────────────────────────────────────────┐
│  Wound Segmenter                                    │
│  U-Net++ / EfficientNet-B4                          │
│  + scSE Attention                                   │
│                                                     │
│   → Pixel-wise wound mask                           │
│   → Wound area (mm²)                                │
│   → Boundary metrics                                │
└─────────────────────────────────────────────────────┘
```
Why cascaded? The data composition ablation proved that the segmenter performs best when trained exclusively on DFU images (85.89% Dice). Adding non-DFU wounds actually hurt performance (68.71% Dice) because the model gets confused learning two different wound morphologies simultaneously. The classifier handles triage; the segmenter focuses on what it does best.
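In code, the cascade is a short piece of control flow. A minimal sketch, where `classify_triage` and `segment_wound` are hypothetical placeholders standing in for the two trained models:

```python
import numpy as np

def classify_triage(image: np.ndarray) -> str:
    """Placeholder for the EfficientNet-V2-M triage classifier.
    Returns one of 'healthy', 'non_dfu', 'dfu'."""
    return "dfu"

def segment_wound(image: np.ndarray) -> np.ndarray:
    """Placeholder for the U-Net++ segmenter; returns a binary wound mask."""
    return np.zeros(image.shape[:2], dtype=np.uint8)

def cascaded_predict(image: np.ndarray) -> dict:
    """Strategy A: segmentation runs only on images the classifier flags as DFU,
    so the segmenter never has to reason about healthy skin or non-DFU wounds."""
    label = classify_triage(image)
    if label != "dfu":
        return {"label": label, "mask": None}
    return {"label": "dfu", "mask": segment_wound(image)}
```

The early exit on non-DFU labels is the point of the design: it keeps healthy skin and non-DFU wounds out of the segmenter's input distribution entirely.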
Results
5-Fold Cross-Validated Segmentation (DFU)
The primary result, validated across 5 independent train/val splits for statistical rigor:
| Fold | Dice | IoU |
|---|---|---|
| 0 | 84.69% | 78.03% |
| 1 | 86.10% | 79.87% |
| 2 | 85.98% | 79.00% |
| 3 | 84.74% | 78.07% |
| 4 | 85.66% | 78.54% |
| Mean ± Std | 85.43 ± 0.61% | 78.70 ± 0.68% |

The low standard deviation (±0.61%) indicates the model performs consistently regardless of how the data is partitioned.
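The summary row can be reproduced from the per-fold scores with the standard library. A quick sketch using the population standard deviation, which with these two-decimal fold values lands within a hundredth of the reported ±0.61 (the report does not state which form was used):

```python
import statistics

dice_folds = [84.69, 86.10, 85.98, 84.74, 85.66]  # per-fold Dice from the table

mean = statistics.mean(dice_folds)    # 85.43
pstd = statistics.pstdev(dice_folds)  # population std-dev, ~0.60 with rounded folds
print(f"Dice: {mean:.2f} ± {pstd:.2f}")
```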
Test Set Evaluation (DFU, n=285)
| Metric | Value | Clinical Interpretation |
|---|---|---|
| Dice Score | 85.89% | Strong pixel-level wound overlap |
| IoU (Jaccard) | 79.35% | Solid intersection accuracy |
| HD95 | 17.3 px | 95th percentile boundary distance |
| NSD@2mm | 85.86% | 86% of predicted boundary within 2mm of ground truth |
| NSD@5mm | 94.74% | 95% within 5mm – clinically excellent |
| Wound Area Error | 1.1% | Predicted 1,342 mm² vs ground truth 1,358 mm² |
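The overlap and area metrics reduce to simple mask arithmetic. A minimal numpy sketch (the `mm2_per_px` scale is an assumed input from the capture setup, not something the model predicts):

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Dice = 2|P∩G| / (|P| + |G|); IoU = |P∩G| / |P∪G| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (np.logical_or(pred, gt).sum() + eps)
    return float(dice), float(iou)

def area_error_pct(pred: np.ndarray, gt: np.ndarray, mm2_per_px: float) -> float:
    """Relative wound-area error in percent, given a known pixel scale."""
    a_pred = pred.astype(bool).sum() * mm2_per_px
    a_gt = gt.astype(bool).sum() * mm2_per_px
    return abs(a_pred - a_gt) / a_gt * 100.0
```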
Data Composition Ablation
The single most important experiment – proving that data composition matters more than architecture:
| Training Data | Best Dice | Val Loss | Overfitting Ratio |
|---|---|---|---|
| DFU-only (1,881 images) | 87.44% | 0.1078 | 1.3x (minimal) |
| DFU-only (1,010 images) | 85.27% | 0.1057 | 1.0x (none) |
| DFU + non-DFU | 68.71% | 0.4187 | 1.4x |
| All classes (DFU + healthy + non-DFU) | 84.14%* | 0.6723 | 2.9x (heavy) |
*Inflated by healthy images scoring perfectly on empty masks.
Key finding: Adding 871 more DFU images (AZH wound care center data) improved Dice by +2.17% with no other changes. Data quality and quantity matter more than architecture complexity.
Architecture Comparison
| Model | Best Dice | Parameters | Notes |
|---|---|---|---|
| U-Net++ / EfficientNet-B4 + scSE | 87.44% | ~25M | Best performance |
| FUSegNet / EfficientNet-B7 + P-scSE | 69.60% | ~66M | Too many parameters for dataset size |
Test-Time Augmentation (TTA)
| Metric | Without TTA | With TTA (16-aug) | Improvement |
|---|---|---|---|
| Dice | 57.38%* | 61.26%* | +3.88% |
| IoU | 52.28%* | 56.29%* | +4.01% |
| HD95 | 87.47 | 84.88 | -2.59 (better) |
*Overall numbers including non-DFU images; DFU-specific TTA improvement follows the same trend.
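Flip-based TTA averages predictions over transformed views, inverting each geometric transform before the mean so the views align. A 4-view sketch of the idea (the pipeline above uses 16 augmentations; `predict_fn` stands in for the segmenter):

```python
import numpy as np

def tta_predict(image: np.ndarray, predict_fn) -> np.ndarray:
    """Average probability maps over flip views. Each transform is inverted
    on the prediction so all views are in the original frame before averaging."""
    ident = lambda a: a
    hflip, vflip = np.fliplr, np.flipud
    views = [
        (ident, ident),
        (hflip, hflip),  # flips are self-inverse
        (vflip, vflip),
        (lambda a: hflip(vflip(a)), lambda a: vflip(hflip(a))),
    ]
    preds = [inverse(predict_fn(forward(image))) for forward, inverse in views]
    return np.mean(preds, axis=0)
```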
Fairness Analysis (ITA-Stratified)
| ITA Group | Count | Dice | IoU | HD95 |
|---|---|---|---|---|
| Brown | 285 | 85.89% | 79.35% | 17.3 |
| Fairness gap | – | 0.00% | – | – |
Limitation: The dataset is predominantly composed of a single ITA skin tone group (Brown). While no fairness gap exists within the represented population, the model has not been validated across the full Fitzpatrick I–VI spectrum. ITA computation on wound images is confounded by wound bed color; a clinical deployment would require ITA measurement from non-wound skin regions specifically.
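For context, ITA is derived from CIELAB coordinates via the standard formula ITA = arctan((L* − 50)/b*) · 180/π and grouped with the conventional Chardon-style thresholds. A sketch (the band names and cutoffs below are the commonly used ones, not values from this project):

```python
import math

def ita_degrees(L_star: float, b_star: float) -> float:
    """Individual Typology Angle: ITA = arctan((L* - 50) / b*) * 180 / pi.
    L*, b* should be sampled from non-wound skin (see limitation above)."""
    return math.degrees(math.atan2(L_star - 50.0, b_star))

def ita_group(ita: float) -> str:
    """Conventional ITA bands (degrees): >55 very light, 41-55 light,
    28-41 intermediate, 10-28 tan, -30-10 brown, <=-30 dark."""
    for threshold, name in [(55, "very light"), (41, "light"),
                            (28, "intermediate"), (10, "tan"), (-30, "brown")]:
        if ita > threshold:
            return name
    return "dark"
```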
Dataset
Composition (8,105 total samples)
| Category | Images | Purpose |
|---|---|---|
| DFU | 2,119 | Wound segmentation training (FUSeg + AZH) |
| Healthy Feet | 3,300 | True negatives for classifier |
| Non-DFU Conditions | 2,686 | Hard negatives (general wounds, not DFU) |
Sources
| Dataset | Images | Type |
|---|---|---|
| FUSeg 2021 (UWM BigData Lab) | 1,010 | DFU with segmentation masks |
| AZH Wound Care Center | 1,109 | Clinical wound patches with masks |
| Kaggle DFU Patches | 543 | Healthy foot skin patches |
| Mendeley Wound Dataset (Normal) | 2,757 | Healthy foot images |
| Mendeley Wound Dataset (Wounds) | 2,686 | Non-DFU wound images with masks |
Data Pipeline
All images pass through a production cleaning pipeline:
- Integrity check – verify every image opens and is not corrupt
- Mask validation – binary format check, dimension alignment, coverage statistics
- Deduplication – perceptual hash (dHash) to remove cross-dataset duplicates
- Preprocessing – resize to 512×512 (aspect-preserving pad), CLAHE contrast enhancement, mask binarization
- Stratified splits – 70/15/15 train/val/test, stratified by class and ITA skin tone group, zero data leakage verified
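The dHash used for deduplication compares neighboring pixels of a tiny downscaled copy, so re-encoded or resized copies of the same photo hash identically or nearly so. A minimal sketch on a grayscale array (the real pipeline would resize with a proper image library rather than the index-striding shortcut here):

```python
import numpy as np

def dhash(gray: np.ndarray, size: int = 8) -> int:
    """Difference hash: shrink to (size, size+1), compare horizontal neighbours,
    and pack the resulting size*size bits into an integer."""
    h, w = gray.shape
    ys = np.arange(size) * h // size          # crude nearest-neighbour downsample
    xs = np.arange(size + 1) * w // (size + 1)
    small = gray[np.ix_(ys, xs)]
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming_distance(a: int, b: int) -> int:
    """Near-duplicates differ in only a few bits (e.g. distance <= 5)."""
    return bin(a ^ b).count("1")
```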
Tech Stack
| Component | Library | Version |
|---|---|---|
| Deep Learning | PyTorch | 2.10.0 |
| Medical Imaging | MONAI | 1.5.2 |
| Segmentation | Segmentation Models PyTorch | 0.5.0 |
| Augmentation | Albumentations | 1.4.24 |
| Data Quality | CleanVision, Cleanlab | Latest |
| API | FastAPI | 0.133.0 |
| Inference | ONNX Runtime | 1.24.2 |
| Linting | Ruff | 0.15.2 |
| Compute | Northeastern Explorer HPC (H200/A100 GPUs) | – |
Quick Start
Installation
```bash
git clone https://github.com/Ruthvik-Bandari/DiaFoot.AI.git
cd DiaFoot.AI
pip install -r requirements.txt
```
Inference (Single Image)
```bash
python scripts/predict.py --image path/to/foot.jpg --device cuda
```
Training
```bash
# Train classifier (all 3 classes)
python scripts/train.py --task classify --epochs 50 --device cuda

# Train segmenter (DFU-only – proven best by ablation)
python scripts/run_ablation.py --variant dfu_only --epochs 100 --device cuda
```
5-Fold Cross-Validation
```bash
# Submit as SLURM array job (parallel)
sbatch slurm/run_cv.sh

# Or run individual folds
python scripts/run_cross_val.py --fold 0 --device cuda --epochs 100
python scripts/run_cross_val.py --fold 1 --device cuda --epochs 100
# ... folds 2, 3, 4
```
Evaluation
```bash
python scripts/evaluate.py \
    --task segment \
    --checkpoint checkpoints/ablation_dfu_only/best_epoch090_0.1078.pt \
    --device cuda
```
ONNX Export
```bash
python scripts/export_onnx.py \
    --checkpoint checkpoints/ablation_dfu_only/best_epoch090_0.1078.pt \
    --output models/diafoot_segmenter.onnx \
    --validate --benchmark
```
FastAPI Server
```bash
uvicorn src.deploy.app:app --host 0.0.0.0 --port 8000
# POST /predict with image file
```
Project Structure
```
DiaFoot.AI/
├── configs/          # YAML configs for training, models, data
├── data/
│   ├── raw/          # Original datasets (DVC tracked)
│   ├── processed/    # Cleaned, preprocessed 512×512 images
│   ├── splits/       # Train/val/test CSVs
│   └── metadata/     # Quality reports, ITA scores
├── src/
│   ├── data/         # Dataset classes, augmentation, cleaning
│   ├── models/       # U-Net++, FUSegNet, classifier, MedSAM2
│   ├── training/     # Trainer, losses, schedulers, EMA
│   ├── evaluation/   # Metrics, fairness, calibration, robustness
│   ├── inference/    # Pipeline, TTA, postprocessing
│   └── deploy/       # FastAPI app
├── scripts/          # Entry points for train, eval, export
├── slurm/            # HPC job scripts
├── results/          # Metrics, figures, reports
├── checkpoints/      # Trained model weights
└── tests/            # Unit tests
```
Peer Feedback Integration
Every piece of peer feedback from the AAI6620 course review was mapped to a specific implementation:
| Feedback | From | Implementation |
|---|---|---|
| How did augmentation handle skin tone diversity? | Sudeep K.S. | ITA-stratified fairness audit |
| Add attention mechanisms to reduce false positives | Shivam Dubey | scSE attention in U-Net++ decoder |
| Report performance relative to inter-annotator agreement | Yucheng Yan | Ceiling analysis framework |
| Tie uncertainty to clinical output | Yash Jain | TTA-based uncertainty maps |
| Prioritize ablation studies over deployment | Yucheng Yan | Data composition ablation as core experiment |
| Addressing algorithmic bias is a critical ethical hurdle | Ching-Yi Mao | ITA fairness audit with honest limitation disclosure |
Honest Limitations
Classifier accuracy (100%) is a dataset artifact. The three data categories come from visually distinct sources (different cameras, backgrounds). The classifier learned "which dataset" rather than "which condition." A production system requires same-source data across all classes.
Wagner staging was not trained. The architecture supports it, but clinical grade labels were unavailable. This is acknowledged as future work requiring clinical partnerships.
Limited skin tone diversity. The dataset is predominantly a single ITA group. Fairness conclusions cannot be generalized to the full Fitzpatrick I–VI spectrum. A clinical system would require validation across diverse skin tones.
Only 2 of 5 architectures were fully trained. FUSegNet underperformed; MedSAM2 LoRA and nnU-Net v2 were implemented but not trained due to time constraints.
Not validated on standardized benchmarks. Results are on our own data splits. Comparison against the DFUC 2022 challenge leaderboard would require access to their test set.
Regulatory & Ethical Notice
This project is developed strictly for academic and educational purposes. It is part of the AAI6620 Computer Vision coursework at Northeastern University.
This software:
- Is NOT a medical device as defined by the FDA, EU MDR, or any regulatory body
- Has NOT undergone clinical validation, regulatory review, or approval of any kind
- Is NOT intended to diagnose, treat, cure, or prevent any disease or medical condition
- Should NOT be used as a substitute for professional medical advice, diagnosis, or treatment
- Has NOT been validated in a prospective clinical setting
- Makes NO claims of clinical accuracy, safety, or efficacy
If you are experiencing a medical emergency or have concerns about a diabetic foot ulcer, contact your healthcare provider immediately.
Any use of this software for clinical decision-making is strictly prohibited and done entirely at the user's own risk. The authors, Northeastern University, and all affiliated parties disclaim all liability for any harm resulting from the use or misuse of this software.
For information on FDA-cleared wound measurement devices, visit FDA Medical Device Databases.
v1 → v2 Changelog
See CHANGELOG.md for the complete list of changes.
Citation
If you use this work for academic purposes, please cite:
```bibtex
@misc{bandari2026diafoot,
  title       = {DiaFoot.AI: A Multi-Task Pipeline for Diabetic Foot Ulcer Detection and Segmentation},
  author      = {Bandari, Ruthvik},
  year        = {2026},
  institution = {Northeastern University},
  course      = {AAI6620 Computer Vision},
  note        = {Academic project – not for clinical use}
}
```
License
MIT License. See LICENSE for details.
This license grants permission for academic and research use. It does not grant permission for clinical or diagnostic use.
Built with care for educational impact. Data composition matters more than architecture.