---
license: cc-by-sa-4.0
tags:
- nnunet
- nnunetv2
- medical-imaging
- segmentation
- 3d-segmentation
- ct
- ldct
- low-dose-ct
- diagnostic-ct
- lung
- lung-cancer
- tumor-segmentation
- multi-institutional
- pretrained
library_name: nnunetv2
pipeline_tag: image-segmentation
datasets:
- MSD-Task06-Lung
- NLSTseg
language:
- en
---
# CLN-Segmenter – Dataset500 Unified Lung Lesion Pretrain (fold 0)
A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on **Dataset500_LungLesions**, the unified Stage 1 pretraining corpus combining MSD Task06 (diagnostic CT) and NLSTseg (low-dose screening CT) – 667 expert-annotated cases. Fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.
This is the **v1 unified pretrain** intended as a starting point for downstream lung-lesion finetuning, especially when the target combines diagnostic and screening CT.
## Quick stats
| | |
|--|--|
| **Architecture** | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) |
| **Training data** | Dataset500_LungLesions – 667 cases (533 train / 134 val for fold 0) |
| **Composition** | 63 MSD Task06 (diagnostic CT, 9%) + 604 NLSTseg (LDCT, 91%) |
| **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
| **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
| **Hardware** | 1× NVIDIA H100 80GB, ~6h 41m wall-time |
| **Best EMA Pseudo Dice** (in-training) | **0.7658** (epoch ~960) |
| **Mean Validation Dice** (per-case, sliding-window) | **0.6172** |
| **Foreground IoU** | **0.5121** |
| **Generalization** | No measurable overfitting – train/val loss curves overlap throughout |
## ⚠️ Two metrics, both honest – read this section
The two Dice numbers reported above are computed differently and **disagree by ~0.15**. Both are correct; they answer different questions:
### `Best EMA Pseudo Dice = 0.7658` (in-training, voxel-pooled)
Computed by nnU-Net every epoch on patches sampled from validation cases. Pools True Positives, False Positives, False Negatives across all val patches into one Dice. **Voxel-weighted**: large lesions dominate. This is the metric nnU-Net uses to select `checkpoint_best.pth`.
### `Mean Validation Dice = 0.6172` (sliding-window, per-case averaged)
Computed *after* training by running full-volume sliding-window inference on each of the 134 fold-0 validation cases, computing per-case Dice, then averaging. **Case-weighted**: each scan counts equally regardless of tumor size. **This is the metric most papers report.**
### Why the gap is large for *this* dataset
NLSTseg (91% of cases) has a wide range of lesion sizes (median 1.37 cm³, but the per-lesion volume distribution spans 0.03 to 372 cm³ in the source). MSD's tumors (9% of cases) are uniformly larger (median 5.22 cm³).
- **Pseudo Dice** is dominated by the big-tumor voxel mass, so it looks high (0.77).
- **Mean Validation Dice** treats a tiny 4 mm nodule with Dice 0.30 the same as a large tumor with Dice 0.85, so the average drops toward the harder small-lesion cases (0.62).
For comparison: case_0001 (MSD) achieves per-case Dice **0.892** in this fold's validation. Several small-lesion NLSTseg cases score below 0.40. The 0.6172 average reflects that distribution faithfully.
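The two computations can be sketched in a few lines of NumPy. The TP/FP/FN voxel counts below are invented for illustration (one well-segmented large lesion, one poorly segmented tiny nodule), not taken from this model's validation logs:

```python
import numpy as np

def dice(tp, fp, fn):
    """Dice = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical voxel counts for two validation cases.
cases = [
    {"tp": 9000, "fp": 1000, "fn": 1000},  # large lesion, segmented well
    {"tp": 15,   "fp": 40,   "fn": 45},    # tiny nodule, segmented poorly
]

# Case-weighted: one Dice per case, then the average.
per_case = [dice(c["tp"], c["fp"], c["fn"]) for c in cases]
mean_dice = float(np.mean(per_case))

# Voxel-pooled: sum TP/FP/FN over all cases first, then a single Dice.
pooled = dice(sum(c["tp"] for c in cases),
              sum(c["fp"] for c in cases),
              sum(c["fn"] for c in cases))

print(f"mean per-case Dice: {mean_dice:.3f}")  # pulled down by the tiny nodule
print(f"pooled Dice:        {pooled:.3f}")     # dominated by the large lesion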
### Which one should you cite?
- **For papers and external comparisons**: cite **0.6172 Mean Validation Dice** (per-case).
- **For comparing against the training-time logs of other nnU-Net runs**: cite **0.7658 Pseudo Dice**.
- **For full-pipeline performance**: once all 5 folds are trained, also report the 5-fold ensemble Mean Dice (typically +3–5% above a single fold).
## Files in this repo
| File | Role |
|------|------|
| `checkpoint_best.pth` | Model weights – saved at the EMA Pseudo Dice peak (~epoch 960) |
| `nnUNetPlans.json` | Architecture spec + preprocessing plans. **Required** for inference. |
| `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). **Required** for inference. |
| `dataset_fingerprint.json` | HU intensity stats from training data |
| `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) |
| `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate |
| `validation_summary.json` | Per-case validation Dice/IoU/TP/FP/FN for all 134 fold-0 validation cases |
## Training data and provenance
This model was trained **only on publicly available datasets**:
- **MSD Task06 Lung** (Antonelli et al. 2022, *Nature Communications*, CC-BY-SA 4.0) – 63 expert tumor masks on diagnostic CT
- **NLSTseg** (Chen et al. 2025, *Scientific Data*, CC-BY 4.0) – 604 expert pixel-level masks on low-dose screening CT (1 patient, `nlst_0393` / patient 205714, excluded due to a CT/mask shape mismatch in the source files)
The two source datasets were unified via [`build_unified_dataset.py`](https://github.com/lab-rasool/CLN-Segmenter/blob/main/data_prep/build_unified_dataset.py): images copied verbatim, NLSTseg multi-label masks binarized via `(mask > 0).astype(uint8)`, sequential renumbering as `case_0001` β¦ `case_0667` (MSD first, then NLSTseg). Full mapping in the dataset repo's `id_mapping.csv`.
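As a minimal sketch of that binarization step, applied here to a hypothetical multi-label array (the real script operates on full NIfTI mask volumes):

```python
import numpy as np

# Hypothetical NLSTseg-style mask where different lesions carry
# different integer labels.
multi_label_mask = np.array([[0, 1, 0],
                             [2, 0, 3]])

# Collapse all lesion labels into a single foreground class,
# exactly as in build_unified_dataset.py: (mask > 0).astype(uint8)
binary_mask = (multi_label_mask > 0).astype(np.uint8)

print(binary_mask)
# [[0 1 0]
#  [1 0 1]]
```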
**LUNA16 was intentionally excluded.** Its sphere-mask conversion from `(centroid, diameter)` annotations produced semantically incoherent foreground (HU spanning lung air → soft tissue → bone), and the standalone Dataset501_LUNA16 run trained 1000 epochs at Pseudo Dice 0. Re-evaluating with LIDC-IDRI consensus masks is a candidate for v2.
**No patient-identifiable or institutional data was used.** This checkpoint contains no information derived from any non-public source.
## Foreground intensity profile (training-data fingerprint)
The unified dataset's CT HU statistics inside foreground (lesion) voxels:
| Stat | Value |
|--|--|
| mean | -197 HU |
| median | -134 HU |
| std | 259 |
| 0.5%-ile | -926 |
| 99.5%-ile | 252 |
The distribution is dominated by NLSTseg (91% of cases) with a slight pull from MSD's heavier tails. Mean and median sit cleanly in soft-tissue-adjacent territory; the 99.5%-ile stays away from bone/implant ranges. This is a coherent foreground class for default Dice+CE, and the training curves confirm it.
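Foreground statistics of this kind are straightforward to compute with plain NumPy. The volume and mask below are random stand-ins, not the actual training data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a CT volume in HU and a binary lesion mask.
ct_hu = rng.normal(-200, 250, size=(64, 64, 64))
mask = rng.random((64, 64, 64)) < 0.05

# Foreground-only intensity statistics, analogous to the nnU-Net fingerprint.
fg = ct_hu[mask]
stats = {
    "mean":   float(fg.mean()),
    "median": float(np.median(fg)),
    "std":    float(fg.std()),
    "p00_5":  float(np.percentile(fg, 0.5)),
    "p99_5":  float(np.percentile(fg, 99.5)),
}
print(stats)
```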
## Intended use
- **Pretrained starting point** for finetuning on related lung-lesion segmentation tasks (especially mixed-modality or institutional-shift settings)
- **Reference for unified multi-source pretraining** with default nnU-Net v2 settings
- **Input to ensembling** with other folds (when 5-fold runs are available)
## How NOT to use it
- ❌ Not validated for clinical diagnosis or treatment decisions
- ❌ Single fold, not an ensemble – paper-grade results require all 5 folds
- ❌ Distribution-shift caveat: training was predominantly LDCT (91%); for a pure diagnostic-CT target, finetuning helps, or start from `Lab-Rasool/CLN-Segmenter-MSD-fold0` instead
## How to use
### 1. Download the checkpoint and metadata
```python
from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-Dataset500-fold0")
print("Files at:", local_dir)
```
### 2. Set up an nnU-Net inference directory
```
nnUNet_results/
└── Dataset500_LungLesions/
    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
        ├── dataset.json
        ├── plans.json            (renamed from nnUNetPlans.json)
        ├── dataset_fingerprint.json
        └── fold_0/
            ├── checkpoint_best.pth
            └── splits_final.json
```
```bash
# $local_dir is the path returned by snapshot_download in step 1
DST=/path/to/nnUNet_results/Dataset500_LungLesions/nnUNetTrainer__nnUNetPlans__3d_fullres
mkdir -p "$DST/fold_0"
cp "$local_dir/dataset.json"             "$DST/dataset.json"
cp "$local_dir/nnUNetPlans.json"         "$DST/plans.json"
cp "$local_dir/dataset_fingerprint.json" "$DST/dataset_fingerprint.json"
cp "$local_dir/checkpoint_best.pth"      "$DST/fold_0/checkpoint_best.pth"
cp "$local_dir/splits_final.json"        "$DST/fold_0/splits_final.json"
```
### 3. Run inference with nnU-Net
```bash
export nnUNet_results=/path/to/nnUNet_results
nnUNetv2_predict \
-i /path/to/your/input_images \
-o /path/to/output_predictions \
-d 500 \
-c 3d_fullres \
-tr nnUNetTrainer \
-p nnUNetPlans \
-f 0 \
-chk checkpoint_best.pth
```
Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`.
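If your scans are not yet named this way, a small staging helper can add the channel suffix. The helper and its paths are illustrative, not part of nnU-Net:

```python
from pathlib import Path
import shutil

def stage_inputs(src: Path, dst: Path) -> list[Path]:
    """Copy single-channel CT volumes into nnU-Net's expected input
    naming: <case_id>_0000.nii.gz (channel 0 = the CT image).
    Hypothetical helper; adapt to your data layout."""
    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for scan in sorted(src.glob("*.nii.gz")):
        case_id = scan.name[: -len(".nii.gz")]   # strip the double extension
        if case_id.endswith("_0000"):            # already nnU-Net-style
            case_id = case_id[: -len("_0000")]
        target = dst / f"{case_id}_0000.nii.gz"
        shutil.copy(scan, target)
        staged.append(target)
    return staged
```

Calling `stage_inputs(Path("raw/"), Path("input_images/"))` would turn `scan_a.nii.gz` into `scan_a_0000.nii.gz`, ready for `nnUNetv2_predict -i`.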
## Training procedure
- **Framework**: nnU-Net v2.7.0 (default trainer)
- **Preprocessing**: CT-specific normalization (HU clipping at the foreground 0.5/99.5 percentiles, then z-score using the fingerprint's foreground mean/std), resampling to target spacing `[1.245, 0.664, 0.664]` mm
- **Augmentation**: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, gaussian noise/blur, low-resolution simulation)
- **Optimization**: SGD with Nesterov momentum (momentum 0.99), polynomial LR decay (initial LR 0.01 → 0)
- **Iterations**: fixed 250 per epoch (nnU-Net default; independent of dataset size)
- **Best-checkpoint mechanism**: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak
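As an illustration of the CT normalization step, here is a sketch using this model's fingerprint values (mean −197 HU, std 259, clip bounds −926/252 HU). `nnUNetv2_predict` applies the real version automatically from `nnUNetPlans.json`, so this is for understanding only:

```python
import numpy as np

# Fingerprint-derived constants (from this model card's intensity table).
MEAN_HU, STD_HU = -197.0, 259.0
LOWER, UPPER = -926.0, 252.0

def normalize_ct(volume_hu: np.ndarray) -> np.ndarray:
    """Sketch of nnU-Net-style CT normalization: clip outlier HU values,
    then z-score with the global foreground statistics."""
    clipped = np.clip(volume_hu, LOWER, UPPER)
    return (clipped - MEAN_HU) / max(STD_HU, 1e-8)

# Air, the foreground mean, and a bright soft-tissue voxel.
example = np.array([-1000.0, -197.0, 400.0])
print(normalize_ct(example))
```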
## Domain composition note
The training corpus is **9% diagnostic CT (MSD) and 91% LDCT (NLSTseg)**. nnU-Net does not explicitly rebalance per-source sampling – the model sees patches in proportion to case count. Over 1000 epochs × 250 iterations × batch 2 (500,000 patches total), that translates to roughly 47,000 MSD patches and 453,000 NLSTseg patches.
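A quick back-of-envelope check of those patch counts:

```python
# Patch accounting for the 1000-epoch run (nnU-Net defaults from this card).
epochs, iters_per_epoch, batch_size = 1000, 250, 2
total_patches = epochs * iters_per_epoch * batch_size   # 500,000

msd_cases, nlst_cases = 63, 604
msd_share = msd_cases / (msd_cases + nlst_cases)        # ~9.4%

print(f"total patches: {total_patches:,}")
print(f"~MSD patches:  {total_patches * msd_share:,.0f}")
print(f"~NLST patches: {total_patches * (1 - msd_share):,.0f}")
```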
Empirically the model handles both modalities (`case_0001` MSD scores Dice 0.89 in fold-0 validation), but the underlying representation skews LDCT. Stage 1 v2 will rebalance by adding more diagnostic-CT data (LIDC-IDRI consensus, NSCLC-Radiomics) rather than re-weighting existing samples.
## Limitations
- **Single fold of 5-fold CV** – not an ensemble. Paper-grade results require all 5 folds, averaged or ensembled at inference.
- **Domain imbalance** – with 91% LDCT training data, the model may underperform on a pure diagnostic-CT target without finetuning (consider `Lab-Rasool/CLN-Segmenter-MSD-fold0` for that case).
- **Small-lesion performance** – per-case Dice for tiny nodules (<5 mm) is noticeably worse than for larger tumors; the 0.6172 mean reflects the full distribution including these hard cases.
- **One source case excluded** (`nlst_0393` / patient 205714) due to a source-data shape mismatch.
- **No clinical validation** – this is a research artifact, not a medical device.
## License
**CC-BY-SA 4.0**, inherited from the share-alike clause of the MSD Task06 source dataset license.
## Citation
If you use this model, please cite all three works:
```bibtex
@article{isensee2021nnunet,
title = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
author = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
journal = {Nature Methods},
volume = {18},
number = {2},
pages = {203--211},
year = {2021}
}
@article{antonelli2022medical,
title = {The Medical Segmentation Decathlon},
author = {Antonelli, Michela and Reinke, Annika and Bakas, Spyridon and others},
journal = {Nature Communications},
volume = {13},
number = {1},
pages = {4128},
year = {2022}
}
@article{chen2025nlstseg,
title = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
author = {Chen and others},
journal = {Scientific Data},
year = {2025},
doi = {10.1038/s41597-025-05742-x}
}
```
## Project context
Part of **CLN-Segmenter** at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is the v1 unified pretrain) and finetunes on internal data with domain-specific loss formulations.
- **Code**: https://github.com/lab-rasool/CLN-Segmenter
- **Lab**: https://huggingface.co/Lab-Rasool
Other models in this series:
- `Lab-Rasool/CLN-Segmenter-MSD-fold0` – MSD-only POC (diagnostic CT, 63 cases, Pseudo Dice 0.82)
- `Lab-Rasool/CLN-Segmenter-NLSTseg-fold0` – NLSTseg-only POC (LDCT, 604 cases, Pseudo Dice 0.77)