---
license: cc-by-4.0
tags:
- nnunet
- nnunetv2
- medical-imaging
- segmentation
- 3d-segmentation
- ct
- ldct
- low-dose-ct
- lung
- lung-cancer
- tumor-segmentation
- multi-institutional
library_name: nnunetv2
pipeline_tag: image-segmentation
datasets:
- NLSTseg
language:
- en
---

# CLN-Segmenter: NLSTseg Lung Lesion Segmentation (fold 0)

A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on the **NLSTseg** dataset, which provides pixel-level lung lesion annotations on low-dose screening CT (LDCT) from the National Lung Screening Trial. This checkpoint is fold 0 of 5-fold cross-validation. It is released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

This is a single-fold pretrain checkpoint, intended as a starting point for downstream lung-lesion segmentation work; it is not a clinical-grade tool.

## Quick stats

| | |
|--|--|
| **Architecture** | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) |
| **Training data** | NLSTseg: 604 cases (1 excluded; 483 train / 121 val for fold 0) |
| **Modality** | Low-dose screening CT (LDCT), multi-institutional |
| **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
| **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
| **Hardware** | 1× NVIDIA H100 80GB, ~7h wall-time |
| **Mean Validation Dice** (per-case, sliding-window) | **0.6123** |
| **Best EMA Pseudo Dice** (in-training proxy) | 0.7663 (epoch ~870) |
| **Generalization** | No measurable overfitting: train/val loss curves overlap throughout |
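The schedule and checkpoint-selection rows above can be made concrete with a small sketch. It assumes nnU-Net v2's default polynomial exponent of 0.9 and an EMA that keeps 0.9 of the previous value and adds 0.1 of the new per-epoch Pseudo Dice; neither constant is stated in this card, and `poly_lr` / `select_best_epoch` are hypothetical helpers, not nnU-Net functions.

```python
from typing import Sequence, Tuple


def poly_lr(epoch: int, max_epochs: int = 1000, initial_lr: float = 0.01,
            exponent: float = 0.9) -> float:
    """Polynomial decay: initial_lr * (1 - epoch / max_epochs) ** exponent.

    The exponent of 0.9 is an assumed nnU-Net default.
    """
    return initial_lr * (1 - epoch / max_epochs) ** exponent


def select_best_epoch(pseudo_dice: Sequence[float],
                      alpha: float = 0.9) -> Tuple[int, float]:
    """Smooth raw per-epoch Pseudo Dice with an EMA and return (best_epoch, best_ema).

    ema_t = alpha * ema_{t-1} + (1 - alpha) * raw_t, seeded with the first reading.
    A new EMA peak is where a checkpoint_best.pth-style save would happen.
    """
    best_epoch, best_ema, ema = -1, float("-inf"), None
    for epoch, raw in enumerate(pseudo_dice):
        ema = raw if ema is None else alpha * ema + (1 - alpha) * raw
        if ema > best_ema:
            best_epoch, best_ema = epoch, ema
    return best_epoch, best_ema


if __name__ == "__main__":
    # Learning rate at a few points of the 1000-epoch schedule.
    for e in (0, 500, 870, 999):
        print(e, round(poly_lr(e), 6))
```

The EMA explains why the in-training proxy reads smoothly even though raw per-epoch Pseudo Dice is jagged: a single lucky epoch moves the smoothed value by only 10% of its jump.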

## Files in this repo

| File | Role |
|------|------|
| `checkpoint_best.pth` | Model weights, saved at the EMA Pseudo Dice peak (~epoch 870) |
| `nnUNetPlans.json` | Architecture spec + preprocessing plans. **Required** for inference. |
| `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). **Required** for inference. |
| `dataset_fingerprint.json` | HU intensity stats from training data |
| `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) |
| `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate |
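To check which cases fold 0 trained and validated on, `splits_final.json` can be read directly. This is a minimal sketch that assumes the standard nnU-Net v2 layout (a JSON list holding one `{"train": [...], "val": [...]}` dict per fold); `load_fold_split` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_fold_split(splits_path, fold: int = 0):
    """Return (train_ids, val_ids) for one fold of an nnU-Net splits_final.json.

    Assumes a JSON list with one {"train": [...], "val": [...]} dict per fold.
    """
    splits = json.loads(Path(splits_path).read_text())
    entry = splits[fold]
    train_ids, val_ids = list(entry["train"]), list(entry["val"])
    # Sanity check: a case must never appear in both train and val.
    overlap = set(train_ids) & set(val_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} case(s) appear in both train and val")
    return train_ids, val_ids
```

For example, `train_ids, val_ids = load_fold_split(Path(local_dir) / "splits_final.json")` after downloading the snapshot; the lengths would be expected to match the 483 / 121 split quoted above.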

## Training data and provenance

This model was trained **only on the publicly available NLSTseg dataset** (Chen et al. 2025, *Scientific Data*, CC-BY 4.0): pixel-level lung lesion annotations on top of NLST low-dose screening CT imagery. It contains 715 expert-annotated lesions across 605 patients (1 patient excluded: `nlst_0393` / patient 205714, due to a CT/mask shape mismatch in the source files; see project changelog).

NLSTseg has key characteristics that make it complementary to diagnostic-CT datasets:

- **Multi-institutional**: 33 contributing institutions, 4 scanner brands (GE, Siemens, Philips, Toshiba)
- **Screening-cohort lesions**: smaller than typical diagnostic-CT tumors (median lesion volume **1.37 cm³**); most were detected at Stage IA
- **Multi-label source**: per-lesion integer labels (1–7) in the original masks; binarized to `{0, 1}` for this single-class training. The tumor-vs-nodule distinction (`labels_type` 1 vs 2 in the original `Label.xlsx`) is recoverable from the source if a future multi-class run is desired.
- **LDCT noise**: lower radiation dose than diagnostic CT; noisier images, often thicker slices
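The multi-label to binary conversion described above can be sketched in NumPy. The exact conversion script used for this checkpoint is not published in this card; this minimal sketch assumes background is 0 and lesions carry positive integer labels, and `binarize_lesion_mask` is a hypothetical helper.

```python
import numpy as np


def binarize_lesion_mask(mask: np.ndarray) -> np.ndarray:
    """Collapse per-lesion integer labels (e.g. 1..7) into one foreground class.

    Any positive label becomes 1; background stays 0. Lesion identity is lost.
    """
    return (mask > 0).astype(np.uint8)


# Toy 1x2x4 mask holding two lesions with different integer labels (1 and 3).
toy = np.array([[[0, 1, 1, 0], [0, 0, 3, 3]]], dtype=np.int16)
binary = binarize_lesion_mask(toy)
print(binary)
print(np.unique(binary))
```

Keeping the original integer masks alongside the binarized ones is what makes a later multi-class run possible.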

**No patient-identifiable or institutional data was used.** This checkpoint contains no information derived from any non-public source.

## Intended use

- **Pretrained starting point** for finetuning on related lung-lesion segmentation tasks, especially LDCT or screening-cohort data
- **Reference baseline** for nnU-Net default performance on NLSTseg's small-lesion, multi-institutional regime
- **Input to ensembling** with other folds (when 5-fold runs are available)

## How NOT to use it

- ❌ Not validated for clinical diagnosis or treatment decisions
- ❌ Not validated on diagnostic-CT cases (different intensity distributions, larger lesions); see Limitations
- ❌ Single fold, not an ensemble; paper-grade results require all 5 folds
- ❌ Multi-lesion identity is collapsed in training labels; if your downstream task needs per-lesion instances, this checkpoint won't recover them directly
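If a downstream task needs approximate per-lesion instances, one common post-processing step (not part of this checkpoint) is 3D connected-component labeling of the binary prediction. This sketch uses `scipy.ndimage.label` with its default face connectivity; `split_into_components` and `component_volumes_cm3` are hypothetical helpers. It is only an approximation: touching lesions merge into one component, and it does not recover NLSTseg's original lesion IDs.

```python
import numpy as np
from scipy import ndimage


def split_into_components(binary_mask: np.ndarray):
    """Label 3D connected components of a binary prediction.

    scipy's default structuring element gives face (6-neighbour) connectivity in 3D.
    Returns (instance_map, n_components); instance_map holds 1..n, background 0.
    """
    instance_map, n_components = ndimage.label(binary_mask > 0)
    return instance_map, n_components


def component_volumes_cm3(instance_map: np.ndarray, n_components: int, spacing_mm):
    """Volume of each component in cm^3, given voxel spacing in mm (z, y, x)."""
    voxel_cm3 = float(np.prod(spacing_mm)) / 1000.0  # 1 cm^3 = 1000 mm^3
    counts = np.bincount(instance_map.ravel(), minlength=n_components + 1)[1:]
    return counts * voxel_cm3
```

Pass the voxel spacing of the prediction file itself, loaded with any NIfTI reader; nnU-Net writes predictions back in the input image's geometry rather than the training target spacing.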

## How to use

### 1. Download the checkpoint and metadata

```python
from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-NLSTseg-fold0")
print("Files at:", local_dir)
```

### 2. Set up an nnU-Net inference directory

nnU-Net expects a specific directory structure for results:

```
nnUNet_results/
└── Dataset503_NLSTseg/
    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
        ├── dataset.json
        ├── plans.json                    (rename from nnUNetPlans.json)
        ├── dataset_fingerprint.json
        └── fold_0/
            ├── checkpoint_best.pth
            └── splits_final.json
```

You can build this with:

```bash
# local_dir is a Python variable in step 1; set it here to the path it printed
local_dir=/path/printed/by/snapshot_download
DST=/path/to/nnUNet_results/Dataset503_NLSTseg/nnUNetTrainer__nnUNetPlans__3d_fullres
mkdir -p "$DST/fold_0"
cp "$local_dir/dataset.json"             "$DST/dataset.json"
cp "$local_dir/nnUNetPlans.json"         "$DST/plans.json"
cp "$local_dir/dataset_fingerprint.json" "$DST/dataset_fingerprint.json"
cp "$local_dir/checkpoint_best.pth"      "$DST/fold_0/checkpoint_best.pth"
cp "$local_dir/splits_final.json"        "$DST/fold_0/splits_final.json"
```
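Because `local_dir` comes from Python in step 1, the same layout can also be built without leaving Python. This is a minimal sketch that mirrors the bash commands above using only the standard library; `install_checkpoint` and `LAYOUT` are hypothetical names.

```python
import shutil
from pathlib import Path

# Snapshot file -> destination relative to the trainer folder (mirrors the bash above).
LAYOUT = {
    "dataset.json": "dataset.json",
    "nnUNetPlans.json": "plans.json",  # nnU-Net v2 looks for plans.json here
    "dataset_fingerprint.json": "dataset_fingerprint.json",
    "checkpoint_best.pth": "fold_0/checkpoint_best.pth",
    "splits_final.json": "fold_0/splits_final.json",
}


def install_checkpoint(local_dir, results_dir,
                       dataset: str = "Dataset503_NLSTseg",
                       trainer_folder: str = "nnUNetTrainer__nnUNetPlans__3d_fullres") -> Path:
    """Copy snapshot files into <results_dir>/<dataset>/<trainer_folder>/."""
    dst = Path(results_dir) / dataset / trainer_folder
    for src_name, rel_dst in LAYOUT.items():
        target = dst / rel_dst
        target.parent.mkdir(parents=True, exist_ok=True)  # creates fold_0/ as needed
        shutil.copy2(Path(local_dir) / src_name, target)
    return dst
```

For example, `install_checkpoint(local_dir, "/path/to/nnUNet_results")` right after `snapshot_download`, then point `nnUNet_results` at the same directory before running step 3.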

### 3. Run inference with nnU-Net

```bash
export nnUNet_results=/path/to/nnUNet_results
nnUNetv2_predict \
    -i /path/to/your/input_images \
    -o /path/to/output_predictions \
    -d 503 \
    -c 3d_fullres \
    -tr nnUNetTrainer \
    -p nnUNetPlans \
    -f 0 \
    -chk checkpoint_best.pth
```

Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`.
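A minimal sketch for staging inputs under that naming convention, assuming each source file is a single-channel CT volume named `<case_id>.nii.gz` (check `channel_names` in `dataset.json` to confirm the model expects one channel). `stage_inputs` is a hypothetical helper; it copies rather than renames so the originals stay untouched.

```python
import shutil
from pathlib import Path


def stage_inputs(src_dir, dst_dir, suffix: str = ".nii.gz"):
    """Copy CT volumes into an nnU-Net input folder as <case_id>_0000<suffix>.

    Files already ending in _0000<suffix> keep their case ID. Returns staged case IDs.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    case_ids = []
    for path in sorted(Path(src_dir).glob(f"*{suffix}")):
        case_id = path.name[: -len(suffix)]
        if case_id.endswith("_0000"):  # already carries the channel suffix
            case_id = case_id[: -len("_0000")]
        shutil.copy2(path, dst / f"{case_id}_0000{suffix}")
        case_ids.append(case_id)
    return case_ids
```

The returned IDs are also the names nnU-Net will use for the output masks (`<case_id>.nii.gz`), which makes matching predictions back to inputs straightforward.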

## Training procedure

- **Framework**: nnU-Net v2.7.0 (default trainer)
- **Preprocessing**: CT-specific normalization (HU clipping to the 0.5/99.5 percentiles of foreground voxels, then z-score normalization with the foreground mean and standard deviation; all of these statistics are computed across the whole training set, not per case), resampling to target spacing `[1.25, 0.664, 0.664]` mm
- **Augmentation**: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, Gaussian noise/blur, low-resolution simulation)
- **Optimization**: SGD with Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01)
- **Iterations**: fixed 250 per epoch (nnU-Net default; independent of dataset size)
- **Best-checkpoint mechanism**: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak
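The CT normalization step can be reproduced for inspection or for feeding the network from a custom pipeline. This sketch assumes the dataset-wide statistics are stored in `plans.json` under `foreground_intensity_properties_per_channel`, keyed by channel `"0"`, with `mean`, `std`, `percentile_00_5` and `percentile_99_5` entries; that key layout is an assumption about nnU-Net v2's plans format, and both functions are hypothetical helpers rather than nnU-Net's own code.

```python
import json
from pathlib import Path

import numpy as np


def load_ct_stats(plans_path, channel: str = "0"):
    """Read (mean, std, p00_5, p99_5) for one channel from an nnU-Net plans file."""
    plans = json.loads(Path(plans_path).read_text())
    props = plans["foreground_intensity_properties_per_channel"][channel]
    return props["mean"], props["std"], props["percentile_00_5"], props["percentile_99_5"]


def ct_normalize(image: np.ndarray, fg_mean: float, fg_std: float,
                 p00_5: float, p99_5: float) -> np.ndarray:
    """Clip HU to dataset-wide foreground percentiles, then z-score with dataset-wide stats."""
    clipped = np.clip(image.astype(np.float32), p00_5, p99_5)
    return (clipped - fg_mean) / max(fg_std, 1e-8)  # guard against a zero std
```

Using dataset-wide rather than per-case statistics keeps HU values comparable across scans, which matters for a multi-institutional LDCT corpus with varying scanners.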

## Evaluation

Two complementary Dice metrics were computed on the 121 fold-0 validation cases:

| Metric | Value | What it measures |
|--------|-------|------------------|
| **Mean Validation Dice** (per-case, sliding-window) | **0.6123** | Per-case Dice from full-volume `nnUNetv2_predict` inference, averaged across 121 val cases. **Case-weighted** β€” every scan counts equally regardless of tumor size. *This is the metric most papers report.* |
| **Best EMA Pseudo Dice** (in-training) | 0.7663 | Voxel-pooled Dice across validation patches during training. **Voxel-weighted** β€” large lesions dominate. Used by nnU-Net to select `checkpoint_best.pth`. |
| Raw Pseudo Dice range (unsmoothed) | 0.45–0.85 | Per-epoch readings during training, before EMA smoothing; jagged from epoch to epoch |
| Train/val loss gap (final epoch) | ~0 | No measurable overfitting throughout. |

The **0.15 gap** between Pseudo Dice (0.7663) and Mean Validation Dice (0.6123) is wider than the gap on uniform-tumor datasets like MSD Task06 (~0.10 there). NLSTseg lesion volumes span 0.03 → 372 cm³ (median 1.37 cm³, long-tailed), so voxel-pooled Dice is dominated by the few large lesions, while per-case Dice gives equal weight to the many small-lesion cases, which are individually harder. The disagreement between the voxel-pooled and case-averaged numbers is an expected consequence of this lesion-size distribution.
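The weighting difference can be seen on a toy example with one large, mostly segmented lesion and one small, entirely missed lesion. This sketch only illustrates case-weighted versus voxel-pooled aggregation; nnU-Net's Pseudo Dice is additionally computed on validation patches rather than full volumes. Treating a case with empty prediction and empty ground truth as NaN (skipped by `nanmean`) is a convention chosen here, and all function names are hypothetical.

```python
import numpy as np


def dice(pred, gt) -> float:
    """Binary Dice for one case; NaN when both masks are empty."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    denom = pred.sum() + gt.sum()
    return float("nan") if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def case_weighted_dice(preds, gts) -> float:
    """Mean of per-case Dice: every case counts equally."""
    return float(np.nanmean([dice(p, g) for p, g in zip(preds, gts)]))


def voxel_pooled_dice(preds, gts) -> float:
    """2 * sum(|P & G|) / sum(|P| + |G|) over all cases: large lesions dominate."""
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(preds, gts))
    total = sum(np.asarray(p, bool).sum() + np.asarray(g, bool).sum() for p, g in zip(preds, gts))
    return float("nan") if total == 0 else float(2.0 * inter / total)


n = 2000
gt_large = np.zeros(n, bool); gt_large[:1000] = True
pred_large = np.zeros(n, bool); pred_large[100:1000] = True  # 900 of 1000 voxels found
gt_small = np.zeros(n, bool); gt_small[:10] = True
pred_small = np.zeros(n, bool)                              # small lesion missed entirely
preds, gts = [pred_large, pred_small], [gt_large, gt_small]
print("case-weighted:", round(case_weighted_dice(preds, gts), 3))
print("voxel-pooled: ", round(voxel_pooled_dice(preds, gts), 3))
```

The missed 10-voxel lesion halves the case-weighted score but barely moves the voxel-pooled one, which is the same mechanism behind the gap between 0.6123 and 0.7663.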

The training plot (`progress.png`) shows:

1. **Smooth Pseudo Dice climb** from 0 → 0.55 in the first ~50 epochs, then 0.55 → 0.77 over epochs 50–870. Improvement is slow and continuous throughout, with diminishing returns past epoch ~600.
2. **Train/val loss curves overlap nearly perfectly** end-to-end. With 483 training cases (~10× the 50 in the MSD-only run), the model likely has enough data variety to avoid memorizing individual cases, which is consistent with the clean generalization observed: there is no overfitting to manage.

For comparisons against other methods, **cite the Mean Validation Dice (0.6123)**. Pseudo Dice is useful as an in-training monitoring signal but not for cross-method comparison.

Per-case validation results are available in `validation_summary.json` (Dice, IoU, TP/FP/FN counts per case).

The 0.6123 figure reflects the difficulty of small-lesion segmentation in heterogeneous, multi-institutional LDCT. It is the model's honest performance on its native validation distribution.

## Why this checkpoint matters

This is the **clean-generalization complement** to the MSD-only fold-0 checkpoint (`Lab-Rasool/CLN-Segmenter-MSD-fold0`). MSD shows what nnU-Net default does on a small (50 train / 13 val) single-institution diagnostic-CT corpus with large tumors: high Pseudo Dice (0.82), but with mild late-stage overfitting. NLSTseg shows the opposite end: ~10× more data (483 train / 121 val), multi-institutional LDCT and smaller lesions, giving lower Pseudo Dice (0.77) but no overfitting.

For Stage 2 finetuning on a target domain, this checkpoint is the right choice when the target is screening / LDCT / multi-institutional / small-lesion. For diagnostic-CT-heavy targets, the MSD checkpoint or the unified `Dataset500_LungLesions` pretrain (when available) is the better starting point.

## Limitations

- **Single fold of 5-fold CV** β€” not an ensemble. Published-grade numbers require all 5 folds either averaged or ensembled at inference.
- **Trained on LDCT only** β€” performance on diagnostic CT is unknown and likely lower without finetuning (different HU distributions, less noise).
- **Small lesions dominate the training distribution**: performance on large primary tumors (e.g., >5 cm³) is not optimized for.
- **Multi-label → binary collapse**: per-lesion identity and tumor-vs-nodule distinction are lost in this checkpoint's outputs.
- **One source case excluded** (`nlst_0393` / patient 205714) due to source-data shape mismatch. Not a model issue, but worth knowing if you reproduce.
- **No clinical validation**: this is a research artifact, not a medical device.

## License

**CC-BY 4.0**, inherited from the NLSTseg source dataset license.

## Citation

If you use this model, please cite:

```bibtex
@article{isensee2021nnunet,
  title   = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
  author  = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
  journal = {Nature Methods},
  volume  = {18},
  number  = {2},
  pages   = {203--211},
  year    = {2021}
}

@article{chen2025nlstseg,
  title   = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
  author  = {Chen and others},
  journal = {Scientific Data},
  year    = {2025},
  doi     = {10.1038/s41597-025-05742-x}
}

@article{nlst2011,
  title   = {Reduced lung-cancer mortality with low-dose computed tomographic screening},
  author  = {{The National Lung Screening Trial Research Team}},
  journal = {New England Journal of Medicine},
  year    = {2011},
  doi     = {10.1056/NEJMoa1102873}
}
```

## Project context

Part of **CLN-Segmenter** at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is one component) and finetunes on internal data with domain-specific loss formulations.

- **Code**: https://github.com/lab-rasool/CLN-Segmenter
- **Lab**: https://huggingface.co/Lab-Rasool

Other models in this series:
- `Lab-Rasool/CLN-Segmenter-MSD-fold0`: single-dataset MSD Task06 POC (diagnostic CT, 63 expert cases, Pseudo Dice 0.82)
- `Lab-Rasool/CLN-Segmenter-Dataset500-fold0`: unified MSD + NLSTseg pretrain (planned)