---
license: cc-by-sa-4.0
tags:
- nnunet
- nnunetv2
- medical-imaging
- segmentation
- 3d-segmentation
- ct
- ldct
- low-dose-ct
- diagnostic-ct
- lung
- lung-cancer
- tumor-segmentation
- multi-institutional
- pretrained
library_name: nnunetv2
pipeline_tag: image-segmentation
datasets:
- MSD-Task06-Lung
- NLSTseg
language:
- en
---

# CLN-Segmenter: Dataset500 Unified Lung Lesion Pretrain (fold 0)

A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on **Dataset500_LungLesions**, the unified Stage 1 pretraining corpus combining MSD Task06 (diagnostic CT) and NLSTseg (low-dose screening CT), for a total of 667 expert-annotated cases. This checkpoint is fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

This is the **v1 unified pretrain** intended as a starting point for downstream lung-lesion finetuning, especially when the target combines diagnostic and screening CT.

## Quick stats

| | |
|--|--|
| **Architecture** | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) |
| **Training data** | Dataset500_LungLesions: 667 cases (533 train / 134 val for fold 0) |
| **Composition** | 63 MSD Task06 (diagnostic CT, 9%) + 604 NLSTseg (LDCT, 91%) |
| **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
| **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
| **Hardware** | 1× NVIDIA H100 80GB, ~6h 41m wall-time |
| **Best EMA Pseudo Dice** (in-training) | **0.7658** (epoch ~960) |
| **Mean Validation Dice** (per-case, sliding-window) | **0.6172** |
| **Foreground IoU** | **0.5121** |
| **Generalization** | No measurable overfitting: train/val loss curves overlap throughout |

## ⚠️ Two metrics, both honest: read this section

The two Dice numbers reported above are computed differently and **disagree by ~0.15**. Both are correct; they answer different questions:

### `Best EMA Pseudo Dice = 0.7658` (in-training, voxel-pooled)
Computed by nnU-Net every epoch on patches sampled from validation cases. Pools True Positives, False Positives, False Negatives across all val patches into one Dice. **Voxel-weighted**: large lesions dominate. This is the metric nnU-Net uses to select `checkpoint_best.pth`.

### `Mean Validation Dice = 0.6172` (sliding-window, per-case averaged)
Computed *after* training by running full-volume sliding-window inference on each of the 134 fold-0 validation cases, computing per-case Dice, then averaging. **Case-weighted**: each scan counts equally regardless of tumor size. **This is the metric most papers report.**

### Why the gap is large for *this* dataset

NLSTseg (91% of cases) has a wide range of lesion sizes (median 1.37 cm³, but the per-lesion volume distribution spans 0.03 to 372 cm³ in the source). MSD's tumors (9% of cases) are uniformly larger (median 5.22 cm³).

- **Pseudo Dice** is dominated by the big-tumor voxel mass → looks high (0.77).
- **Mean Validation Dice** treats a tiny 4 mm nodule with Dice 0.30 the same as a large tumor with Dice 0.85 → drops the average toward the harder small-lesion cases (0.62).

For comparison: case_0001 (MSD) achieves per-case Dice **0.892** in this fold's validation. Several small-lesion NLSTseg cases score below 0.40. The 0.6172 average reflects that distribution faithfully.
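
The difference between the two averaging schemes can be reproduced with a few lines of arithmetic. The sketch below uses made-up voxel counts (one well-segmented large tumor, two poorly segmented small nodules), not numbers from this model's validation set, and the `dice` helper is illustrative rather than nnU-Net's own code.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    """Dice = 2TP / (2TP + FP + FN); returns 1.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical (TP, FP, FN) voxel counts per case, chosen only for illustration.
cases = [
    (9000, 1000, 1000),  # large tumor, per-case Dice 0.90
    (30, 70, 70),        # small nodule, per-case Dice 0.30
    (30, 70, 70),        # small nodule, per-case Dice 0.30
]

# Voxel-pooled: sum TP/FP/FN over all cases, then compute a single Dice.
pooled = dice(*(sum(col) for col in zip(*cases)))

# Case-weighted: compute Dice per case, then average.
per_case = sum(dice(*c) for c in cases) / len(cases)

print(f"pooled: {pooled:.3f}, per-case mean: {per_case:.3f}")
```

With these toy counts the pooled value stays close to the large tumor's score while the per-case mean is pulled down by the two small nodules, which is the same mechanism behind the gap reported here.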

### Which one should you cite?

- **For papers and external comparisons**: cite **0.6172 Mean Validation Dice** (per-case).
- **For comparisons against nnU-Net's training-time logs of other people's runs**: cite **0.7658 Pseudo Dice**.
- **For full-pipeline performance**: also report a 5-fold ensemble Mean Dice once all 5 folds are trained (typically about 3-5% above a single fold).

## Files in this repo

| File | Role |
|------|------|
| `checkpoint_best.pth` | Model weights, saved at the EMA Pseudo Dice peak (~epoch 960) |
| `nnUNetPlans.json` | Architecture spec + preprocessing plans. **Required** for inference. |
| `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). **Required** for inference. |
| `dataset_fingerprint.json` | HU intensity stats from training data |
| `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) |
| `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate |
| `validation_summary.json` | Per-case validation Dice/IoU/TP/FP/FN for all 134 fold-0 validation cases |
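
To recompute the per-case mean from `validation_summary.json`, something like the sketch below may work. It assumes the file follows the layout of nnU-Net v2's `summary.json` (a `metric_per_case` list whose entries hold a `metrics` dict keyed by label, each with a `Dice` field); that layout is an assumption about this repo's file, so check the actual keys before relying on it. The function names are illustrative.

```python
import json
from pathlib import Path


def per_case_dice(summary: dict, label: str = "1") -> list[float]:
    """Collect per-case Dice for one label from an nnU-Net-style summary dict."""
    return [case["metrics"][label]["Dice"] for case in summary["metric_per_case"]]


def mean_dice_from_file(path: str, label: str = "1") -> float:
    """Load a summary JSON file and average the per-case Dice values."""
    scores = per_case_dice(json.loads(Path(path).read_text()), label)
    return sum(scores) / len(scores)


# Tiny synthetic summary in the assumed layout (values are made up).
example = {
    "metric_per_case": [
        {"metrics": {"1": {"Dice": 0.9}}},
        {"metrics": {"1": {"Dice": 0.3}}},
    ]
}
scores = per_case_dice(example)
print(sum(scores) / len(scores))
```

For the real file, `mean_dice_from_file("validation_summary.json")` would be the entry point, provided the assumed keys match.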

## Training data and provenance

This model was trained **only on publicly available datasets**:

- **MSD Task06 Lung** (Antonelli et al. 2022, *Nature Communications*, CC-BY-SA 4.0): 63 expert tumor masks on diagnostic CT
- **NLSTseg** (Chen et al. 2025, *Scientific Data*, CC-BY 4.0): 604 expert pixel-level masks on low-dose screening CT (1 patient, `nlst_0393` / patient 205714, was excluded due to a CT/mask shape mismatch in the source files)

The two source datasets were unified via [`build_unified_dataset.py`](https://github.com/lab-rasool/CLN-Segmenter/blob/main/data_prep/build_unified_dataset.py): images copied verbatim, NLSTseg multi-label masks binarized via `(mask > 0).astype(uint8)`, sequential renumbering as `case_0001` … `case_0667` (MSD first, then NLSTseg). Full mapping in the dataset repo's `id_mapping.csv`.
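
The renumbering step can be pictured roughly as follows. This is a minimal sketch, not the contents of `build_unified_dataset.py`: the function name, the example source IDs, and the choice to sort IDs within each source are all assumptions made for illustration. Only the "MSD first, then NLSTseg" ordering and the `case_XXXX` format come from the description above.

```python
def build_id_mapping(msd_ids: list[str], nlst_ids: list[str]) -> dict[str, str]:
    """Map source IDs to sequential case IDs, MSD first and then NLSTseg.

    Sorting within each source is an assumption; the real script may order
    cases differently.
    """
    ordered = sorted(msd_ids) + sorted(nlst_ids)
    return {src: f"case_{i:04d}" for i, src in enumerate(ordered, start=1)}


# Hypothetical source IDs, used only to show the shape of the mapping.
mapping = build_id_mapping(["lung_003", "lung_001"], ["nlst_0002", "nlst_0001"])
for src, dst in mapping.items():
    print(src, "->", dst)
```

The authoritative mapping remains the `id_mapping.csv` file in the dataset repo.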

**LUNA16 was intentionally excluded.** Its sphere-mask conversion from `(centroid, diameter)` annotations produced semantically incoherent foreground (HU spans lung air → soft tissue → bone), and the standalone Dataset501_LUNA16 run trained for 1000 epochs with Pseudo Dice at 0. Re-evaluating with LIDC-IDRI consensus masks is a candidate for v2.

**No patient-identifiable or institutional data was used.** This checkpoint contains no information derived from any non-public source.

## Foreground intensity profile (training-data fingerprint)

The unified dataset's CT HU statistics inside foreground (lesion) voxels:

| Stat | Value |
|--|--|
| mean | -197 HU |
| median | -134 HU |
| std | 259 |
| 0.5%-ile | -926 |
| 99.5%-ile | 252 |

The distribution is dominated by NLSTseg (91% of cases) with a slight pull from MSD's heavier tails. Mean and median sit cleanly in soft-tissue-adjacent territory; the 99.5%-ile stays away from bone/implant ranges. This is a coherent foreground class for default Dice+CE, and the training curves confirm it.

## Intended use

- **Pretrained starting point** for finetuning on related lung-lesion segmentation tasks (especially mixed-modality or institutional-shift settings)
- **Reference for unified multi-source pretraining** with default nnU-Net v2 settings
- **Input to ensembling** with other folds (when 5-fold runs are available)

## How NOT to use it

- ❌ Not validated for clinical diagnosis or treatment decisions
- ❌ Single fold, not an ensemble: paper-grade results require all 5 folds
- ❌ Not tuned for pure diagnostic-CT targets: the training data is predominantly LDCT (91%), so such targets may benefit from finetuning, or from using `Lab-Rasool/CLN-Segmenter-MSD-fold0` as the starting point instead

## How to use

### 1. Download the checkpoint and metadata

```python
from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-Dataset500-fold0")
print("Files at:", local_dir)
```

### 2. Set up an nnU-Net inference directory

```
nnUNet_results/
└── Dataset500_LungLesions/
    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
        ├── dataset.json
        ├── plans.json                    (rename from nnUNetPlans.json)
        ├── dataset_fingerprint.json
        └── fold_0/
            ├── checkpoint_best.pth
            └── splits_final.json
```

```bash
# local_dir is a Python variable in step 1; set it here to the path that step printed
local_dir=/path/to/downloaded/snapshot
DST=/path/to/nnUNet_results/Dataset500_LungLesions/nnUNetTrainer__nnUNetPlans__3d_fullres
mkdir -p "$DST/fold_0"
cp "$local_dir/dataset.json"              "$DST/dataset.json"
cp "$local_dir/nnUNetPlans.json"          "$DST/plans.json"
cp "$local_dir/dataset_fingerprint.json"  "$DST/dataset_fingerprint.json"
cp "$local_dir/checkpoint_best.pth"       "$DST/fold_0/checkpoint_best.pth"
cp "$local_dir/splits_final.json"         "$DST/fold_0/splits_final.json"
```
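
If you would rather stay in the Python session from step 1, the same layout can be staged with the standard library. This is a sketch under the assumption that all five files sit at the top level of the downloaded snapshot; `stage_model` and `LAYOUT` are hypothetical names, not part of nnU-Net or `huggingface_hub`.

```python
import shutil
from pathlib import Path

# File name in the downloaded repo -> destination relative to the trainer folder.
LAYOUT = {
    "dataset.json": "dataset.json",
    "nnUNetPlans.json": "plans.json",  # renamed, as in the tree above
    "dataset_fingerprint.json": "dataset_fingerprint.json",
    "checkpoint_best.pth": "fold_0/checkpoint_best.pth",
    "splits_final.json": "fold_0/splits_final.json",
}


def stage_model(local_dir: str, results_root: str) -> Path:
    """Copy the downloaded files into the nnU-Net results layout."""
    dst = (
        Path(results_root)
        / "Dataset500_LungLesions"
        / "nnUNetTrainer__nnUNetPlans__3d_fullres"
    )
    for src_name, rel_target in LAYOUT.items():
        target = dst / rel_target
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(local_dir) / src_name, target)
    return dst
```

A call such as `stage_model(local_dir, "/path/to/nnUNet_results")` would then produce the directory tree shown above.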

### 3. Run inference with nnU-Net

```bash
export nnUNet_results=/path/to/nnUNet_results
nnUNetv2_predict \
    -i /path/to/your/input_images \
    -o /path/to/output_predictions \
    -d 500 \
    -c 3d_fullres \
    -tr nnUNetTrainer \
    -p nnUNetPlans \
    -f 0 \
    -chk checkpoint_best.pth
```

Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`.
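
If your volumes are not yet named this way, a small helper along these lines can derive the expected name. It is an illustrative sketch (the function name is hypothetical) and only handles `.nii.gz` files with a single CT channel, matching the convention stated above.

```python
from pathlib import Path


def nnunet_input_name(filename: str, channel: int = 0) -> str:
    """Return `<case_id>_XXXX.nii.gz` for a `.nii.gz` file name."""
    name = Path(filename).name
    suffix = ".nii.gz"
    if not name.endswith(suffix):
        raise ValueError(f"expected a .nii.gz file, got {name!r}")
    case_id = name[: -len(suffix)]
    return f"{case_id}_{channel:04d}{suffix}"


print(nnunet_input_name("patient42.nii.gz"))  # patient42_0000.nii.gz
```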

## Training procedure

- **Framework**: nnU-Net v2.7.0 (default trainer)
- **Preprocessing**: CT-specific normalization (HU clipping at the 0.5/99.5 percentiles of foreground voxels, then z-scoring with the dataset-wide foreground mean and standard deviation), resampling to target spacing `[1.245, 0.664, 0.664]` mm
- **Augmentation**: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, gaussian noise/blur, low-resolution simulation)
- **Optimization**: SGD + Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01 → 0)
- **Iterations**: fixed 250 per epoch (nnU-Net default; independent of dataset size)
- **Best-checkpoint mechanism**: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak
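
For orientation, the polynomial decay from 0.01 to 0 over 1000 epochs can be sketched as below. The exponent of 0.9 is an assumption based on nnU-Net's usual default and is not stated in this card; `poly_lr` is an illustrative helper, not the trainer's own scheduler class.

```python
def poly_lr(epoch: int, initial_lr: float = 0.01, max_epochs: int = 1000,
            exponent: float = 0.9) -> float:
    """Polynomial decay: initial_lr * (1 - epoch / max_epochs) ** exponent."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


for epoch in (0, 250, 500, 750, 1000):
    print(epoch, f"{poly_lr(epoch):.6f}")
```

Under this assumption the learning rate is still a little above half its initial value at the midpoint and reaches exactly 0 at the final epoch.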

## Domain composition note

The training corpus is **9% diagnostic CT (MSD) and 91% LDCT (NLSTseg)**. nnU-Net does not explicitly rebalance per-source sampling, so the model sees patches in proportion to case count. With ~500K total patches over 1000 epochs × 250 iterations × batch 2, that translates to ~45,000 MSD patches and ~455,000 NLSTseg patches.
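
The patch counts follow from simple arithmetic, sketched here with the rounded 9% / 91% shares quoted above. The split assumes sampling strictly proportional to case count, as described, and ignores the exact train/val partition of fold 0.

```python
epochs, iters_per_epoch, batch_size = 1000, 250, 2
total_patches = epochs * iters_per_epoch * batch_size

# Rounded source shares from the text, in percent.
msd_share_pct, nlst_share_pct = 9, 91
msd_patches = total_patches * msd_share_pct // 100
nlst_patches = total_patches * nlst_share_pct // 100

print(total_patches, msd_patches, nlst_patches)  # 500000 45000 455000
```

Using the exact case counts (63 of 667) instead of the rounded 9% would put the MSD figure slightly higher, at roughly 47,000 patches.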

Empirically the model handles both modalities (`case_0001` MSD scores Dice 0.89 in fold-0 validation), but the underlying representation skews LDCT. Stage 1 v2 will rebalance by adding more diagnostic-CT data (LIDC-IDRI consensus, NSCLC-Radiomics) rather than re-weighting existing samples.

## Limitations

- **Single fold of 5-fold CV**: not an ensemble. Paper-grade results require all 5 folds, either averaged or ensembled at inference.
- **Domain imbalance**: with 91% LDCT training data, the model may underperform on a pure diagnostic-CT target without finetuning (consider `Lab-Rasool/CLN-Segmenter-MSD-fold0` for that case).
- **Small-lesion performance**: per-case Dice for tiny nodules (<5 mm) is noticeably worse than for larger tumors; the 0.6172 mean reflects the full distribution including these hard cases.
- **One source case excluded** (`nlst_0393` / patient 205714) due to source-data shape mismatch.
- **No clinical validation**: this is a research artifact, not a medical device.

## License

**CC-BY-SA 4.0**, inherited from the share-alike clause of the MSD Task06 source dataset license.

## Citation

If you use this model, please cite all three works:

```bibtex
@article{isensee2021nnunet,
  title   = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
  author  = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
  journal = {Nature Methods},
  volume  = {18},
  number  = {2},
  pages   = {203--211},
  year    = {2021}
}

@article{antonelli2022medical,
  title   = {The Medical Segmentation Decathlon},
  author  = {Antonelli, Michela and Reinke, Annika and Bakas, Spyridon and others},
  journal = {Nature Communications},
  volume  = {13},
  number  = {1},
  pages   = {4128},
  year    = {2022}
}

@article{chen2025nlstseg,
  title   = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
  author  = {Chen, et al.},
  journal = {Scientific Data},
  year    = {2025},
  doi     = {10.1038/s41597-025-05742-x}
}
```

## Project context

Part of **CLN-Segmenter** at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is the v1 unified pretrain) and finetunes on internal data with domain-specific loss formulations.

- **Code**: https://github.com/lab-rasool/CLN-Segmenter
- **Lab**: https://huggingface.co/Lab-Rasool

Other models in this series:
- `Lab-Rasool/CLN-Segmenter-MSD-fold0`: MSD-only POC (diagnostic CT, 63 cases, Pseudo Dice 0.82)
- `Lab-Rasool/CLN-Segmenter-NLSTseg-fold0`: NLSTseg-only POC (LDCT, 604 cases, Pseudo Dice 0.77)