RuthvikBandari committed on
Commit
08ccd6b
·
verified ·
1 Parent(s): a63fd0a

Upload CHANGELOG.md with huggingface_hub

Files changed (1)
  1. CHANGELOG.md +106 -0
CHANGELOG.md ADDED
@@ -0,0 +1,106 @@
1
+ # Changelog
2
+
3
+ All notable changes to DiaFoot.AI are documented in this file.
4
+
5
+ ## [2.0.0] - 2026-03-05
6
+
7
+ ### Complete Rebuild: v1 → v2
8
+
9
+ DiaFoot.AI v2 is a ground-up rebuild addressing fundamental flaws in v1's data pipeline and evaluation methodology.
10
+
11
+ ### Why the Rebuild
12
+
13
+ v1 achieved 84.93% IoU / 91.73% Dice but had zero clinical specificity: it predicted ulcers on every input because it was trained exclusively on ulcer images and had no concept of "not a wound." v2 transforms the project from a toy segmentation model into a clinically meaningful multi-task pipeline.
14
+
15
+ ### Added
16
+
17
+ **Data Foundation**
18
+ - Three-category dataset: 2,119 DFU + 3,300 healthy + 2,686 non-DFU = 8,105 total images
19
+ - Production data cleaning pipeline: integrity checks, mask validation, perceptual hash deduplication
20
+ - CleanVision quality audit on all raw data (blur, darkness, duplicates, low-information detection)
21
+ - ITA skin tone analysis for fairness evaluation
22
+ - Stratified train/val/test splits (70/15/15) with zero leakage verification
23
+ - AZH wound care center dataset integration (1,109 new wound images)
24
+
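The ITA skin tone analysis listed above rests on the Individual Typology Angle, defined from CIELAB values as ITA = arctan((L* − 50) / b*) × 180 / π. The sketch below is illustrative only: the function names are hypothetical, and the six-group thresholds are the commonly cited ones, not necessarily the bins DiaFoot.AI uses.

```python
import math


def ita_degrees(l_star: float, b_star: float) -> float:
    """Individual Typology Angle in degrees from CIELAB L* and b*."""
    # atan2 matches arctan((L* - 50) / b*) for b* > 0 and avoids division by zero.
    return math.degrees(math.atan2(l_star - 50.0, b_star))


def ita_group(ita: float) -> str:
    """Map an ITA value to a commonly cited six-group scale (assumed bins)."""
    thresholds = [
        (55.0, "very light"),
        (41.0, "light"),
        (28.0, "intermediate"),
        (10.0, "tan"),
        (-30.0, "brown"),
    ]
    for lower_bound, name in thresholds:
        if ita > lower_bound:
            return name
    return "dark"
```

Grouping test images by `ita_group` and comparing a metric across groups is one way a stratified fairness audit could be organized.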
25
+ **Architecture**
26
+ - Multi-task cascaded pipeline: Triage Classifier → Wound Segmenter
27
+ - EfficientNet-V2-M triage classifier (healthy vs non-DFU vs DFU)
28
+ - U-Net++ with EfficientNet-B4 encoder and scSE attention
29
+ - FUSegNet with EfficientNet-B7 and P-scSE attention (comparison architecture)
30
+ - MedSAM2 LoRA fine-tuning setup (implemented, not trained)
31
+ - nnU-Net v2 wrapper (implemented, not trained)
32
+
33
+ **Training**
34
+ - DiceCE compound loss function
35
+ - Cosine annealing scheduler with linear warmup
36
+ - Exponential Moving Average (EMA) weight tracking
37
+ - Early stopping (patience=15 epochs)
38
+ - BFloat16 mixed precision on H200 GPUs
39
+ - SLURM array jobs for parallel ablation studies
40
+
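As a rough illustration of two items in the Training list, the following sketch shows a cosine annealing schedule with linear warmup and a single EMA weight update. It is a minimal, framework-free sketch: the function names, the per-step granularity, and the decay value are assumptions, not taken from the DiaFoot.AI trainer.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate with linear warmup followed by cosine annealing."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_weights, weights, decay=0.999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]
```

In a real trainer the EMA copy would typically be the one evaluated and checkpointed, while the raw weights continue to receive gradient updates.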
41
+ **Evaluation**
42
+ - Full metrics suite: Dice, IoU, HD95, NSD@2mm, NSD@5mm, ASSD
43
+ - Clinical metrics: wound area estimation (mm²), wound perimeter
44
+ - Test-Time Augmentation (TTA) with 16 augmentations (+3.88% Dice improvement)
45
+ - ITA-stratified fairness audit (0.00% gap on DFU images)
46
+ - Data composition ablation (DFU-only vs mixed training)
47
+ - Architecture ablation (U-Net++ vs FUSegNet)
48
+
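For reference, the two overlap metrics in the suite above can be computed on binary masks as follows. This is a minimal sketch on flattened masks, not the project's evaluation code; in particular, scoring an empty prediction against an empty ground truth as a perfect match is an assumed convention, and it matters once healthy images with no wound are part of the test set.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    if pred_sum + target_sum == 0:
        # Assumed convention: both masks empty counts as a perfect match.
        return 1.0, 1.0
    union = pred_sum + target_sum - intersection
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou
```

For a single mask pair the two are related by Dice = 2·IoU / (1 + IoU); averages over a dataset do not generally satisfy this identity.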
49
+ **Deployment**
50
+ - End-to-end inference pipeline (classify → segment → measure)
51
+ - FastAPI REST API with /predict endpoint
52
+ - ONNX export pipeline with validation and benchmarking
53
+ - Prediction visualization with mask overlays
54
+
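The classify → segment → measure flow can be sketched as a small cascade. Everything here is hypothetical: the function names, the label strings, the choice to skip segmentation only for images triaged as healthy, and the use of a known millimetres-per-pixel scale to convert pixel counts into area are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Sequence

Mask = Sequence[Sequence[int]]


def wound_area_mm2(mask: Mask, mm_per_pixel: float) -> float:
    """Area of the foreground pixels, assuming square pixels of known size."""
    foreground = sum(1 for row in mask for value in row if value)
    return foreground * mm_per_pixel ** 2


def run_cascade(
    image: Any,
    classify: Callable[[Any], str],
    segment: Callable[[Any], Mask],
    mm_per_pixel: float,
) -> Dict[str, Any]:
    """Triage first; segment and measure only when a wound is expected."""
    label = classify(image)
    if label == "healthy":
        # Assumed behavior: no segmentation pass for healthy skin.
        return {"label": label, "mask": None, "area_mm2": None}
    mask = segment(image)
    return {"label": label, "mask": mask, "area_mm2": wound_area_mm2(mask, mm_per_pixel)}
```

Passing the classifier and segmenter in as callables keeps the sketch independent of any model framework; stub functions are enough to exercise it.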
55
+ **Documentation**
56
+ - Comprehensive README with results tables and honest limitations
57
+ - Dataset card documenting all sources and ITA distribution
58
+ - 38-commit structured development plan mapped to peer feedback
59
+ - Results notebooks with publication-ready figures
60
+
61
+ ### Changed
62
+
63
+ - Training data: ulcer-only → three categories (healthy + non-DFU + DFU)
64
+ - Architecture: single binary segmenter → cascaded multi-task pipeline
65
+ - Evaluation: Dice/IoU only → Dice, IoU, HD95, NSD, clinical metrics, fairness
66
+ - Loss function: Focal Tversky → DiceCE compound loss
67
+ - Encoder: EfficientNet-B4 → EfficientNet-B4 with scSE attention
68
+ - Training loop: basic loop → production trainer with EMA, early stopping, checkpointing
69
+ - Data pipeline: raw images → cleaned, validated, deduplicated, stratified
70
+
71
+ ### Fixed
72
+
73
+ - Model predicting ulcers on healthy skin (added negative examples)
74
+ - No data quality assurance (added CleanVision + Cleanlab pipeline)
75
+ - Inflated metrics from training on uncleaned data
76
+ - No skin tone fairness analysis (added ITA-stratified evaluation)
77
+ - No boundary quality metrics (added HD95, NSD)
78
+
79
+ ### Key Results
80
+
81
+ | Metric | v1 | v2 | Improvement |
82
+ |--------|----|----|-------------|
83
+ | Dice (overall) | 91.73%* | 85.89% | Honest measurement |
84
+ | IoU (overall) | 84.93%* | 79.35% | Honest measurement |
85
+ | Clinical specificity | 0% | 100% (classifier) | From useless to useful |
86
+ | HD95 | Not measured | 17.3 px | New metric |
87
+ | NSD@5mm | Not measured | 94.74% | New metric |
88
+ | Wound area error | Not measured | 1.1% | New metric |
89
+ | Skin tone fairness gap | Not measured | 0.00% | New metric |
90
+ | Data categories | 1 (DFU only) | 3 (DFU + healthy + non-DFU) | Clinically complete |
91
+ | Data cleaning | None | Full pipeline | Production grade |
92
+
93
+ *\*v1 metrics were inflated by training on uncleaned data without negative examples.*
94
+
95
+ ---
96
+
97
+ ## [1.0.0] - 2026-02-01
98
+
99
+ ### Initial Release
100
+
101
+ - U-Net++ with EfficientNet-B4 encoder
102
+ - FUSeg dataset (1,210 images, ulcers only)
103
+ - Focal Tversky loss
104
+ - 84.93% IoU, 91.73% Dice
105
+ - Basic CLAHE preprocessing
106
+ - No data cleaning, no negative examples, no fairness analysis