Upload CHANGELOG.md with huggingface_hub
# Changelog

All notable changes to DiaFoot.AI are documented in this file.

## [2.0.0] - 2026-03-05

### Complete Rebuild: v1 → v2

DiaFoot.AI v2 is a ground-up rebuild addressing fundamental flaws in v1's data pipeline and evaluation methodology.

### Why the Rebuild

v1 achieved 84.93% IoU / 91.73% Dice but had zero clinical specificity: it predicted ulcers on every input because it was trained exclusively on ulcer images with no concept of "not a wound." v2 transforms the project from a toy segmentation model into a clinically meaningful multi-task pipeline.
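To illustrate why strong overlap scores can coexist with zero specificity, here is a minimal sketch. It assumes an image-level definition (an image counts as "predicted positive" if the model outputs any wound region); the changelog does not state how specificity was computed, and the function name and sample data are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Image-level sensitivity and specificity.

    y_true / y_pred: sequences of booleans, True = wound present / predicted.
    Returns None for a rate whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical test set: 3 ulcer images followed by 4 wound-free images
y_true = [True] * 3 + [False] * 4
# A model that predicts an ulcer on every input, as described for v1
always_positive = [True] * len(y_true)

print(sensitivity_specificity(y_true, always_positive))  # (1.0, 0.0)
```

A test set containing only ulcer images has no true negatives, so specificity is undefined there and the failure stays invisible until negative examples are added.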

### Added

**Data Foundation**
- Three-category dataset: 2,119 DFU + 3,300 healthy + 2,686 non-DFU = 8,105 total images
- Production data cleaning pipeline: integrity checks, mask validation, perceptual hash deduplication
- CleanVision quality audit on all raw data (blur, darkness, duplicates, low-information detection)
- ITA skin tone analysis for fairness evaluation
- Stratified train/val/test splits (70/15/15) with zero-leakage verification
- AZH wound care center dataset integration (1,109 new wound images)
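The 70/15/15 stratified split and leakage check can be sketched as follows. This is an illustration only, not the project's pipeline: the function names, the seed, and the synthetic ids are assumptions, and the check compares ids only (a real check would likely also compare perceptual hashes or patient identifiers).

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split image ids into train/val/test, preserving class proportions.

    labels: dict mapping a unique image id to its category string.
    """
    by_class = defaultdict(list)
    for image_id, label in labels.items():
        by_class[label].append(image_id)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        n_train = round(len(ids) * fractions[0])
        n_val = round(len(ids) * fractions[1])
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]  # remainder goes to test
    return train, val, test


def assert_no_leakage(train, val, test):
    """Raise AssertionError if any id appears in more than one split."""
    t, v, s = set(train), set(val), set(test)
    assert not (t & v or t & s or v & s), "split leakage detected"


# Class sizes from the changelog; the ids themselves are synthetic placeholders
counts = {"dfu": 2119, "healthy": 3300, "non_dfu": 2686}
labels = {f"{name}_{i}": name for name, n in counts.items() for i in range(n)}

train, val, test = stratified_split(labels)
assert_no_leakage(train, val, test)
print(len(train), len(val), len(test))
```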

**Architecture**
- Multi-task cascaded pipeline: Triage Classifier → Wound Segmenter
- EfficientNet-V2-M triage classifier (healthy vs non-DFU vs DFU)
- U-Net++ with EfficientNet-B4 encoder and scSE attention
- FUSegNet with EfficientNet-B7 and P-scSE attention (comparison architecture)
- MedSAM2 LoRA fine-tuning setup (implemented, not trained)
- nnU-Net v2 wrapper (implemented, not trained)

**Training**
- DiceCE compound loss function
- Cosine annealing scheduler with linear warmup
- Exponential Moving Average (EMA) weight tracking
- Early stopping (patience=15 epochs)
- BFloat16 mixed precision on H200 GPUs
- SLURM array jobs for parallel ablation studies
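Two of the training components listed above, the warmup-plus-cosine schedule and EMA weight tracking, can be sketched in framework-free Python. The step counts, learning rates, and decay value are illustrative assumptions; the changelog does not state the values used.

```python
import math


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate with linear warmup followed by cosine annealing."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_weights, weights, decay=0.999):
    """Update EMA weights in place: ema = decay * ema + (1 - decay) * current."""
    for name, value in weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights


# Hypothetical run: 100 steps, 10 of them warmup, peak learning rate 1e-3
schedule = [lr_at(s, total_steps=100, warmup_steps=10, base_lr=1e-3) for s in range(101)]

# Scalar stand-ins for model parameters
ema = {"w": 1.0}
ema_update(ema, {"w": 0.0}, decay=0.9)
```

In a PyTorch trainer the same schedule is usually built from the library's scheduler classes, and the EMA copy is the one evaluated and checkpointed.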

**Evaluation**
- Full metrics suite: Dice, IoU, HD95, NSD@2mm, NSD@5mm, ASSD
- Clinical metrics: wound area estimation (mm²), wound perimeter
- Test-Time Augmentation (TTA) with 16 augmentations (+3.88% Dice improvement)
- ITA-stratified fairness audit (0.00% gap on DFU images)
- Data composition ablation (DFU-only vs mixed training)
- Architecture ablation (U-Net++ vs FUSegNet)
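As a reference for the overlap and area metrics named above, the following sketch computes Dice, IoU, and wound area on small binary masks. The empty-mask convention and the `mm_per_pixel` calibration value are assumptions; the changelog does not describe how pixel spacing is obtained.

```python
def dice_iou(pred, target):
    """Dice and IoU for two binary masks (equal-shaped nested lists of 0/1)."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p & t
            pred_sum += p
            target_sum += t
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, not a rule)
        return 1.0, 1.0
    return 2 * inter / (pred_sum + target_sum), inter / union


def area_mm2(mask, mm_per_pixel):
    """Wound area from a binary mask, given the physical size of one pixel side."""
    return sum(map(sum, mask)) * mm_per_pixel ** 2


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

dice, iou = dice_iou(pred, target)
print(dice, iou, area_mm2(pred, mm_per_pixel=0.5))
```

Per image, the two scores are tied by IoU = Dice / (2 - Dice), so boundary metrics such as HD95 and NSD are what add independent information.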

**Deployment**
- End-to-end inference pipeline (classify → segment → measure)
- FastAPI REST API with /predict endpoint
- ONNX export pipeline with validation and benchmarking
- Prediction visualization with mask overlays
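The classify → segment → measure flow can be sketched with plain callables standing in for the trained models. Everything here is hypothetical: the `Prediction` type, the function names, and in particular `segment_labels`, since the changelog does not say which triage classes are forwarded to the segmenter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Image = List[List[float]]
Mask = List[List[int]]


@dataclass
class Prediction:
    label: str
    mask: Optional[Mask] = None
    area_mm2: Optional[float] = None


def run_cascade(
    image: Image,
    classify: Callable[[Image], str],
    segment: Callable[[Image], Mask],
    mm_per_pixel: float,
    segment_labels: Sequence[str] = ("dfu",),  # assumption: only DFU is segmented
) -> Prediction:
    """Run triage first; segment and measure only when triage flags a wound class."""
    label = classify(image)
    if label not in segment_labels:
        return Prediction(label=label)  # segmenter is skipped entirely
    mask = segment(image)
    area = sum(map(sum, mask)) * mm_per_pixel ** 2
    return Prediction(label=label, mask=mask, area_mm2=area)


# Toy stand-ins for the real models (intensity thresholds, for illustration only)
def toy_classify(image: Image) -> str:
    return "dfu" if max(map(max, image)) > 0.5 else "healthy"


def toy_segment(image: Image) -> Mask:
    return [[1 if px > 0.5 else 0 for px in row] for row in image]


wound = run_cascade([[0.9, 0.1], [0.8, 0.2]], toy_classify, toy_segment, mm_per_pixel=0.5)
clean = run_cascade([[0.1, 0.1], [0.2, 0.2]], toy_classify, toy_segment, mm_per_pixel=0.5)
```

Because healthy inputs never reach the segmenter, the classifier alone determines the pipeline's specificity.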

**Documentation**
- Comprehensive README with results tables and honest limitations
- Dataset card documenting all sources and ITA distribution
- 38-commit structured development plan mapped to peer feedback
- Results notebooks with publication-ready figures

### Changed

- Training data: ulcer-only → three categories (healthy + non-DFU + DFU)
- Architecture: single binary segmenter → cascaded multi-task pipeline
- Evaluation: Dice/IoU only → Dice, IoU, HD95, NSD, clinical metrics, fairness
- Loss function: Focal Tversky → DiceCE compound loss
- Encoder: EfficientNet-B4 → EfficientNet-B4 with scSE attention
- Training loop: basic loop → production trainer with EMA, early stopping, checkpointing
- Data pipeline: raw images → cleaned, validated, deduplicated, stratified

### Fixed

- Model predicting ulcers on healthy skin (added negative examples)
- No data quality assurance (added CleanVision + Cleanlab pipeline)
- Inflated metrics from training on uncleaned data
- No skin tone fairness analysis (added ITA-stratified evaluation)
- No boundary quality metrics (added HD95, NSD)
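For readers unfamiliar with ITA (Individual Typology Angle), the sketch below shows the standard formula, ITA = arctan((L* - 50) / b*) × 180/π, together with one way to turn per-group scores into a fairness gap. The six bins follow the commonly used ITA classification; whether DiaFoot.AI uses these bins, and how its reported gap is defined, are assumptions here, and the Dice values are made up.

```python
import math


def ita_degrees(l_star, b_star):
    """Individual Typology Angle from CIELAB L* and b* values, in degrees."""
    # atan2 equals arctan((L* - 50) / b*) for b* > 0 and avoids division by zero
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def ita_group(ita):
    """Map an ITA value to a commonly used six-level skin tone category."""
    if ita > 55:
        return "very light"
    if ita > 41:
        return "light"
    if ita > 28:
        return "intermediate"
    if ita > 10:
        return "tan"
    if ita > -30:
        return "brown"
    return "dark"


def fairness_gap(dice_by_group):
    """Largest difference between per-group mean Dice scores (assumed definition)."""
    means = [sum(v) / len(v) for v in dice_by_group.values() if v]
    return max(means) - min(means)


ita = ita_degrees(l_star=70.0, b_star=20.0)
group = ita_group(ita)

# Hypothetical per-image Dice scores grouped by ITA category
gap = fairness_gap({"light": [0.90, 0.80], "tan": [0.85, 0.85], "dark": [0.80, 0.86]})
```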

### Key Results

| Metric | v1 | v2 | Improvement |
|--------|----|----|-------------|
| Dice (overall) | 91.73%* | 85.89% | Honest measurement |
| IoU (overall) | 84.93%* | 79.35% | Honest measurement |
| Clinical specificity | 0% | 100% (classifier) | From useless to useful |
| HD95 | Not measured | 17.3 px | New metric |
| NSD@5mm | Not measured | 94.74% | New metric |
| Wound area error | Not measured | 1.1% | New metric |
| Skin tone fairness gap | Not measured | 0.00% | New metric |
| Data categories | 1 (DFU only) | 3 (DFU + healthy + non-DFU) | Clinically complete |
| Data cleaning | None | Full pipeline | Production grade |

*\*v1 metrics were inflated by training on uncleaned data without negative examples.*

---

## [1.0.0] - 2026-02-01

### Initial Release

- U-Net++ with EfficientNet-B4 encoder
- FUSeg dataset (1,210 images, ulcers only)
- Focal Tversky loss
- 84.93% IoU, 91.73% Dice
- Basic CLAHE preprocessing
- No data cleaning, no negative examples, no fairness analysis