maazshahbaz

Initial upload: nnU-Net Dataset500 unified MSD+NLSTseg fold 0 (Pseudo Dice 0.7658, Mean Validation Dice 0.6172)

5459402 verified 4 days ago

	---
	license: cc-by-sa-4.0
	tags:
	- nnunet
	- nnunetv2
	- medical-imaging
	- segmentation
	- 3d-segmentation
	- ct
	- ldct
	- low-dose-ct
	- diagnostic-ct
	- lung
	- lung-cancer
	- tumor-segmentation
	- multi-institutional
	- pretrained
	library_name: nnunetv2
	pipeline_tag: image-segmentation
	datasets:
	- MSD-Task06-Lung
	- NLSTseg
	language:
	- en
	---

	# CLN-Segmenter — Dataset500 Unified Lung Lesion Pretrain (fold 0)

	A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on Dataset500_LungLesions, the unified Stage 1 pretraining corpus combining MSD Task06 (diagnostic CT) and NLSTseg (low-dose screening CT) — 667 expert-annotated cases. Fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

	This is the v1 unified pretrain intended as a starting point for downstream lung-lesion finetuning, especially when the target combines diagnostic and screening CT.

	## Quick stats

	| Property | Value |
	|--|--|
	| Architecture | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) |
	| Training data | Dataset500_LungLesions, 667 cases (533 train / 134 val for fold 0) |
	| Composition | 63 MSD Task06 (diagnostic CT, 9%) + 604 NLSTseg (LDCT, 91%) |
	| Loss | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
	| Schedule | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
	| Hardware | 1× NVIDIA H100 80GB, ~6h 41m wall-time |
	| Best EMA Pseudo Dice (in-training) | 0.7658 (epoch ~960) |
	| Mean Validation Dice (per-case, sliding-window) | 0.6172 |
	| Foreground IoU | 0.5121 |
	| Generalization | No measurable overfitting: train/val loss curves overlap throughout |

	## ⚠️ Two metrics, both honest — read this section

	The two Dice numbers reported above are computed differently and disagree by ~0.15. Both are correct; they answer different questions:

	### `Best EMA Pseudo Dice = 0.7658` (in-training, voxel-pooled)
	Computed by nnU-Net every epoch on patches sampled from validation cases. Pools True Positives, False Positives, False Negatives across all val patches into one Dice. Voxel-weighted: large lesions dominate. This is the metric nnU-Net uses to select `checkpoint_best.pth`.

	### `Mean Validation Dice = 0.6172` (sliding-window, per-case averaged)
	Computed after training by running full-volume sliding-window inference on each of the 134 fold-0 validation cases, computing per-case Dice, then averaging. Case-weighted: each scan counts equally regardless of tumor size. This is the metric most papers report.

	### Why the gap is large for this dataset

	NLSTseg (91% of cases) covers a wide range of lesion sizes: the median is 1.37 cm³, but per-lesion volumes in the source span 0.03 to 372 cm³. MSD's tumors (9% of cases) are larger on the whole (median 5.22 cm³).

	- Pseudo Dice is dominated by the big-tumor voxel mass → looks high (0.77).
	- Mean Validation Dice treats a tiny 4 mm nodule with Dice 0.30 the same as a large tumor with Dice 0.85 → drops the average toward the harder small-lesion cases (0.62).

	For comparison: case_0001 (MSD) achieves per-case Dice 0.892 in this fold's validation. Several small-lesion NLSTseg cases score below 0.40. The 0.6172 average reflects that distribution faithfully.
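
The difference between the two averaging schemes can be reproduced with a few lines of Python. This is a minimal sketch with invented (TP, FP, FN) voxel counts, not values from this model's validation set, and it pools over whole cases rather than over sampled patches as nnU-Net's in-training metric does. The function names are illustrative.

```python
def dice(tp, fp, fn):
    """Dice = 2TP / (2TP + FP + FN). Returns 1.0 for an empty case (an assumption)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def pooled_dice(cases):
    """Voxel-weighted: sum TP/FP/FN over all cases, then compute a single Dice."""
    tp = sum(c[0] for c in cases)
    fp = sum(c[1] for c in cases)
    fn = sum(c[2] for c in cases)
    return dice(tp, fp, fn)


def mean_per_case_dice(cases):
    """Case-weighted: compute Dice per case, then average."""
    return sum(dice(*c) for c in cases) / len(cases)


# Toy (TP, FP, FN) voxel counts, invented for illustration only
cases = [
    (9000, 1000, 1000),  # one large tumor, per-case Dice 0.90
    (30, 70, 70),        # small nodule, per-case Dice 0.30
    (40, 60, 60),        # small nodule, per-case Dice 0.40
]

print(f"pooled:   {pooled_dice(cases):.3f}")         # 0.889
print(f"per-case: {mean_per_case_dice(cases):.3f}")  # 0.533
```

One large, well-segmented tumor is enough to pull the pooled value close to 0.9, while the per-case mean stays near the small-nodule scores.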

	### Which one should you cite?

	- For papers and external comparisons: cite 0.6172 Mean Validation Dice (per-case).
	- For comparisons against nnU-Net's training-time logs of other people's runs: cite 0.7658 Pseudo Dice.
	- For full-pipeline performance: also report a 5-fold ensemble Mean Dice once all 5 folds are trained (ensembles typically score roughly 3-5% above a single fold).

	## Files in this repo

	| File | Role |
	|------|------|
	| `checkpoint_best.pth` | Model weights, saved at the EMA Pseudo Dice peak (~epoch 960) |
	| `nnUNetPlans.json` | Architecture spec + preprocessing plans. Required for inference. |
	| `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). Required for inference. |
	| `dataset_fingerprint.json` | HU intensity stats from training data |
	| `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) |
	| `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate |
	| `validation_summary.json` | Per-case validation Dice/IoU/TP/FP/FN for all 134 fold-0 validation cases |
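
The per-case numbers in `validation_summary.json` can be inspected with a short script. The sketch below assumes the file follows the layout of nnU-Net v2's `summary.json` (a `metric_per_case` list whose entries hold `metrics[<label>]["Dice"]` and a `prediction_file` path). That layout is an assumption about this file, so check the keys before relying on it. The function names are hypothetical.

```python
import json
import math


def summarize_validation(summary, label="1", worst_n=3):
    """Return (mean per-case Dice, the worst_n lowest-scoring cases).

    Assumes the nnU-Net v2 summary.json layout; label "1" is the lesion class.
    """
    rows = []
    for case in summary["metric_per_case"]:
        dice = case["metrics"][label]["Dice"]
        if dice is None or math.isnan(dice):
            continue  # skip cases where Dice is undefined
        rows.append((dice, case.get("prediction_file", "unknown")))
    if not rows:
        raise ValueError("no cases with a defined Dice score")
    rows.sort()  # ascending, so the hardest cases come first
    mean_dice = sum(d for d, _ in rows) / len(rows)
    return mean_dice, rows[:worst_n]


def load_and_summarize(path, **kwargs):
    """Convenience wrapper: read the JSON file, then summarize it."""
    with open(path) as f:
        return summarize_validation(json.load(f), **kwargs)
```

Calling `load_and_summarize("validation_summary.json")` on the downloaded file should return a mean close to the reported 0.6172 if the layout assumption holds.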

	## Training data and provenance

	This model was trained only on publicly available datasets:

	- MSD Task06 Lung (Antonelli et al. 2022, Nature Communications, CC-BY-SA 4.0) — 63 expert tumor masks on diagnostic CT
	- NLSTseg (Chen et al. 2025, Scientific Data, CC-BY 4.0) — 604 expert pixel-level masks on low-dose screening CT (1 patient excluded — `nlst_0393` / patient 205714 — due to a CT/mask shape mismatch in the source files)

	The two source datasets were unified via [`build_unified_dataset.py`](https://github.com/lab-rasool/CLN-Segmenter/blob/main/data_prep/build_unified_dataset.py): images copied verbatim, NLSTseg multi-label masks binarized via `(mask > 0).astype(uint8)`, sequential renumbering as `case_0001` … `case_0667` (MSD first, then NLSTseg). Full mapping in the dataset repo's `id_mapping.csv`.
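
The renumbering and binarization steps can be illustrated as follows. This is not the code from `build_unified_dataset.py`. It is a simplified sketch in which the source IDs, the sorted ordering within each dataset, and the function names are all assumptions, and the mask is a flat list instead of a 3D array.

```python
def build_id_mapping(msd_ids, nlst_ids):
    """Map source IDs to sequential case IDs, MSD first, then NLSTseg.

    Sorting within each source is an assumption made for this sketch.
    """
    ordered = sorted(msd_ids) + sorted(nlst_ids)
    return {src: f"case_{i:04d}" for i, src in enumerate(ordered, start=1)}


def binarize(mask_values):
    """Collapse a multi-label mask to binary: any label > 0 becomes 1."""
    return [1 if v > 0 else 0 for v in mask_values]


# Hypothetical source IDs, for illustration only
mapping = build_id_mapping(["lung_003", "lung_001"], ["nlst_0002", "nlst_0001"])
print(mapping["lung_001"], mapping["nlst_0001"])  # case_0001 case_0003
print(binarize([0, 1, 2, 0, 3]))                  # [0, 1, 1, 0, 1]
```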

	LUNA16 was intentionally excluded. Its sphere-mask conversion from `(centroid, diameter)` annotations produced semantically incoherent foreground (HU spans lung air → soft tissue → bone) and the standalone Dataset501_LUNA16 run trained 1000 epochs at Pseudo Dice 0. Re-evaluating with LIDC-IDRI consensus masks is a candidate for v2.

	No patient-identifiable or institutional data was used. This checkpoint contains no information derived from any non-public source.

	## Foreground intensity profile (training-data fingerprint)

	The unified dataset's CT HU statistics inside foreground (lesion) voxels:

	| Stat | Value |
	|--|--|
	| mean | -197 HU |
	| median | -134 HU |
	| std | 259 HU |
	| 0.5%-ile | -926 HU |
	| 99.5%-ile | 252 HU |

	The distribution is dominated by NLSTseg (91% of cases) with a slight pull from MSD's heavier tails. Mean and median sit cleanly in soft-tissue-adjacent territory; the 99.5%-ile stays away from bone/implant ranges. This is a coherent foreground class for default Dice+CE — and the training curves confirm it.
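
These statistics also drive preprocessing. As far as nnU-Net v2's CT normalization scheme is concerned, intensities are clipped to the foreground 0.5/99.5 percentiles and then standardized with the foreground mean and standard deviation. The sketch below applies that rule to single HU values using the rounded numbers from the table. The exact values live in `dataset_fingerprint.json` and `nnUNetPlans.json`, and the function name is illustrative.

```python
def ct_normalize(hu, lower=-926.0, upper=252.0, mean=-197.0, std=259.0):
    """Clip to the foreground percentile range, then z-score with foreground stats.

    Defaults are the rounded values from the table above (an approximation).
    """
    clipped = min(max(hu, lower), upper)
    return (clipped - mean) / max(std, 1e-8)


for hu in (-1000.0, -197.0, 40.0, 1000.0):
    print(f"{hu:8.1f} HU -> {ct_normalize(hu):+.3f}")
```

Air (-1000 HU) and bone (1000 HU) are both clipped before standardization, so very dense or very sparse voxels cannot dominate the input range.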

	## Intended use

	- Pretrained starting point for finetuning on related lung-lesion segmentation tasks (especially mixed-modality or institutional-shift settings)
	- Reference for unified multi-source pretraining with default nnU-Net v2 settings
	- Input to ensembling with other folds (when 5-fold runs are available)

	## How NOT to use it

	- ❌ Not validated for clinical diagnosis or treatment decisions
	- ❌ Single fold, not an ensemble — paper-grade results require all 5 folds
	- ❌ Not a drop-in model for pure diagnostic-CT targets: the training data is predominantly LDCT (91%), so expect distribution shift. Finetune first, or use `Lab-Rasool/CLN-Segmenter-MSD-fold0` as the starting point instead

	## How to use

	### 1. Download the checkpoint and metadata

	```python
	from huggingface_hub import snapshot_download
	local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-Dataset500-fold0")
	print("Files at:", local_dir)
	```

	### 2. Set up an nnU-Net inference directory

	```
	nnUNet_results/
	└── Dataset500_LungLesions/
	    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
	        ├── dataset.json
	        ├── plans.json                  (rename from nnUNetPlans.json)
	        ├── dataset_fingerprint.json
	        └── fold_0/
	            ├── checkpoint_best.pth
	            └── splits_final.json
	```

	```bash
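	# local_dir is a Python variable in step 1; set it in the shell to the path printed there
	local_dir=/path/to/downloaded/snapshot
	DST=/path/to/nnUNet_results/Dataset500_LungLesions/nnUNetTrainer__nnUNetPlans__3d_fullres
	mkdir -p "$DST/fold_0"
	cp "$local_dir/dataset.json" "$DST/dataset.json"
	# The plans file must be named plans.json inside the results folder
	cp "$local_dir/nnUNetPlans.json" "$DST/plans.json"
	cp "$local_dir/dataset_fingerprint.json" "$DST/dataset_fingerprint.json"
	cp "$local_dir/checkpoint_best.pth" "$DST/fold_0/checkpoint_best.pth"
	cp "$local_dir/splits_final.json" "$DST/fold_0/splits_final.json"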
	```
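
If you prefer to stay in Python after the download step, the same layout can be built with the standard library. This is a minimal sketch: `build_results_dir` and `FILE_LAYOUT` are hypothetical names, and `results_root` is whatever directory you intend to export as `nnUNet_results`.

```python
import shutil
from pathlib import Path

# Source file name in the snapshot -> path relative to the trainer folder
FILE_LAYOUT = {
    "dataset.json": "dataset.json",
    "nnUNetPlans.json": "plans.json",  # renamed on copy
    "dataset_fingerprint.json": "dataset_fingerprint.json",
    "checkpoint_best.pth": "fold_0/checkpoint_best.pth",
    "splits_final.json": "fold_0/splits_final.json",
}


def build_results_dir(local_dir, results_root):
    """Copy the downloaded files into the folder structure shown above."""
    dst = (Path(results_root) / "Dataset500_LungLesions"
           / "nnUNetTrainer__nnUNetPlans__3d_fullres")
    for src_name, rel_path in FILE_LAYOUT.items():
        target = dst / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(local_dir) / src_name, target)
    return dst
```

For example, `build_results_dir(local_dir, "/path/to/nnUNet_results")` reuses the `local_dir` returned by `snapshot_download` in step 1.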

	### 3. Run inference with nnU-Net

	```bash
	export nnUNet_results=/path/to/nnUNet_results
	nnUNetv2_predict \
	    -i /path/to/your/input_images \
	    -o /path/to/output_predictions \
	    -d 500 \
	    -c 3d_fullres \
	    -tr nnUNetTrainer \
	    -p nnUNetPlans \
	    -f 0 \
	    -chk checkpoint_best.pth
	```

	Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`.
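
If your scans are not yet named this way, a small helper can derive the expected file name. The sketch below handles `.nii.gz` inputs only, and `nnunet_input_name` is a hypothetical helper, not part of nnU-Net.

```python
from pathlib import Path


def nnunet_input_name(filename, channel=0):
    """Return '<case_id>_XXXX.nii.gz' for a '<case_id>.nii.gz' input file."""
    name = Path(filename).name
    suffix = ".nii.gz"
    if not name.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got {name!r}")
    case_id = name[: -len(suffix)]
    return f"{case_id}_{channel:04d}{suffix}"


print(nnunet_input_name("/data/scans/patient42.nii.gz"))  # patient42_0000.nii.gz
```

This model takes a single CT channel, so every input file ends in `_0000.nii.gz`.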

	## Training procedure

	- Framework: nnU-Net v2.7.0 (default trainer)
	- Preprocessing: CT-specific normalization (HU clipping at the 0.5/99.5 percentiles of foreground voxels, then z-scoring with the dataset-wide foreground mean and standard deviation), resampling to target spacing `[1.245, 0.664, 0.664]` mm
	- Augmentation: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, gaussian noise/blur, low-resolution simulation)
	- Optimization: SGD + Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01 → 0)
	- Iterations: fixed 250 per epoch (nnU-Net default; independent of dataset size)
	- Best-checkpoint mechanism: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak
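
The learning-rate schedule above can be written out explicitly. The sketch assumes the exponent of 0.9 that nnU-Net's polynomial scheduler uses by default. That value is not stated in this card, so treat it as an assumption.

```python
def poly_lr(epoch, initial_lr=0.01, max_epochs=1000, exponent=0.9):
    """Polynomial decay: lr = initial_lr * (1 - epoch / max_epochs) ** exponent."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


for epoch in (0, 250, 500, 750, 960, 1000):
    print(f"epoch {epoch:4d}: lr = {poly_lr(epoch):.6f}")
```

Under this schedule the learning rate is already small by the time the best checkpoint is saved (around epoch 960), which fits a late, stable peak in EMA Pseudo Dice.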

	## Domain composition note

	The training corpus is 9% diagnostic CT (MSD) and 91% LDCT (NLSTseg). nnU-Net does not explicitly rebalance per-source sampling, so the model sees patches roughly in proportion to case count. Training draws 1000 epochs × 250 iterations × batch 2 = 500,000 patches in total, which works out to an expected ~47,000 MSD patches and ~453,000 NLSTseg patches (63/667 and 604/667 of the total).
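
The patch budget can be checked with a few lines of arithmetic. This sketch uses the exact case fractions (63/667 and 604/667) instead of rounded percentages, and it assumes every case is equally likely to be drawn for each patch, so the results are expected values, not exact counts. The function name is illustrative.

```python
def expected_patch_counts(case_counts, epochs=1000, iters_per_epoch=250, batch_size=2):
    """Expected patches per source when cases are sampled uniformly."""
    total_patches = epochs * iters_per_epoch * batch_size
    total_cases = sum(case_counts.values())
    return {
        source: round(total_patches * n_cases / total_cases)
        for source, n_cases in case_counts.items()
    }


counts = expected_patch_counts({"MSD": 63, "NLSTseg": 604})
print(counts)  # {'MSD': 47226, 'NLSTseg': 452774}
```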

	Empirically the model handles both modalities (`case_0001` MSD scores Dice 0.89 in fold-0 validation), but the underlying representation skews LDCT. Stage 1 v2 will rebalance by adding more diagnostic-CT data (LIDC-IDRI consensus, NSCLC-Radiomics) rather than re-weighting existing samples.

	## Limitations

	- Single fold of 5-fold CV — not an ensemble. Paper-grade results require all 5 folds either averaged or ensembled at inference.
	- Domain imbalance — 91% LDCT may underperform without finetuning on a pure diagnostic-CT target (consider `Lab-Rasool/CLN-Segmenter-MSD-fold0` for that case).
	- Small-lesion performance — per-case Dice for tiny nodules (<5mm) is noticeably worse than for larger tumors; the 0.6172 mean reflects the full distribution including these hard cases.
	- One source case excluded (`nlst_0393` / patient 205714) due to source-data shape mismatch.
	- No clinical validation — this is a research artifact, not a medical device.

	## License

	CC-BY-SA 4.0, inherited from the share-alike clause of the MSD Task06 source dataset license.

	## Citation

	If you use this model, please cite all three works:

	```bibtex
	@article{isensee2021nnunet,
	title = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
	author = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
	journal = {Nature Methods},
	volume = {18},
	number = {2},
	pages = {203--211},
	year = {2021}
	}

	@article{antonelli2022medical,
	title = {The Medical Segmentation Decathlon},
	author = {Antonelli, Michela and Reinke, Annika and Bakas, Spyridon and others},
	journal = {Nature Communications},
	volume = {13},
	number = {1},
	pages = {4128},
	year = {2022}
	}

	@article{chen2025nlstseg,
	title = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
	author = {Chen and others},
	journal = {Scientific Data},
	year = {2025},
	doi = {10.1038/s41597-025-05742-x}
	}
	```

	## Project context

	Part of CLN-Segmenter at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is the v1 unified pretrain) and finetunes on internal data with domain-specific loss formulations.

	- Code: https://github.com/lab-rasool/CLN-Segmenter
	- Lab: https://huggingface.co/Lab-Rasool

	Other models in this series:
	- `Lab-Rasool/CLN-Segmenter-MSD-fold0` — MSD-only POC (diagnostic CT, 63 cases, Pseudo Dice 0.82)
	- `Lab-Rasool/CLN-Segmenter-NLSTseg-fold0` — NLSTseg-only POC (LDCT, 604 cases, Pseudo Dice 0.77)