	---
	license: cc-by-sa-4.0
	tags:
	- nnunet
	- nnunetv2
	- medical-imaging
	- segmentation
	- 3d-segmentation
	- ct
	- lung
	- lung-cancer
	- tumor-segmentation
	library_name: nnunetv2
	pipeline_tag: image-segmentation
	datasets:
	- MSD-Task06-Lung
	language:
	- en
	---

	# CLN-Segmenter — MSD Task06 Lung Tumor Segmentation (fold 0)

	A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on the Medical Segmentation Decathlon Task06: Lung Tumor dataset, fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

	This is a single-fold pretrain checkpoint, intended as a starting point for downstream lung-lesion segmentation work — not a clinical-grade tool.

	## Quick stats

	| | |
	|--|--|
	| Architecture | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) |
	| Training data | MSD Task06 Lung: 63 cases (50 train / 13 val for fold 0) |
	| Loss | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
	| Schedule | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
	| Hardware | 1× NVIDIA H100 80GB, ~6h wall-time |
	| Mean Validation Dice (per-case, sliding-window) | 0.7161 |
	| Best EMA Pseudo Dice (in-training proxy) | 0.8155 (epoch ~755) |
	| Foreground IoU (per-case avg) | ~0.59 (from `validation_summary.json`) |
	| Comparison | Within published nnU-Net Task06 range (0.69–0.78 across various reports) |

	## Files in this repo

	| File | Role |
	|------|------|
	| `checkpoint_best.pth` | Model weights, saved at the EMA Pseudo Dice peak (~epoch 755), before the late-epoch overfitting plateau |
	| `nnUNetPlans.json` | Architecture spec + preprocessing plans. Required for inference. |
	| `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). Required for inference. |
	| `dataset_fingerprint.json` | HU intensity stats from training data |
	| `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) |
	| `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate |

	## Training data and provenance

	This model was trained only on the publicly available MSD Task06 Lung dataset (Antonelli et al. 2022, Nature Communications, CC-BY-SA 4.0). The data used here comprises 63 diagnostic CT scans with expert voxel-level lung tumor annotations.

	No patient-identifiable or institutional data was used. This checkpoint contains no information derived from any non-public source.

	## Intended use

	- Pretrained starting point for finetuning on related lung-lesion segmentation tasks (smaller datasets, domain shift, etc.)
	- Reference baseline for comparison against published Task06 numbers
	- Input to ensembling with other folds (when 5-fold runs are available)

	## How NOT to use it

	- ❌ Not validated for clinical diagnosis or treatment decisions
	- ❌ Not validated on low-dose screening CT (LDCT) — see Limitations
	- ❌ Single fold, not an ensemble — paper-grade results require all 5 folds
	- ❌ Not validated outside the MSD Task06 case distribution

	## How to use

	### 1. Download the checkpoint and metadata

	```python
	from huggingface_hub import snapshot_download
	local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-MSD-fold0")
	print("Files at:", local_dir)
	```

	### 2. Set up an nnU-Net inference directory

	nnU-Net expects a specific directory structure for results:

	```
	nnUNet_results/
	└── Dataset502_MSDLung/
	    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
	        ├── dataset.json
	        ├── plans.json   (rename from nnUNetPlans.json)
	        ├── dataset_fingerprint.json
	        └── fold_0/
	            ├── checkpoint_best.pth
	            └── splits_final.json
	```

	You can build this with:

	```bash
	# Set this to the path printed by step 1 ("Files at: ...")
	local_dir=/path/to/downloaded/snapshot
	DST=/path/to/nnUNet_results/Dataset502_MSDLung/nnUNetTrainer__nnUNetPlans__3d_fullres
	mkdir -p "$DST/fold_0"
	cp "$local_dir/dataset.json" "$DST/dataset.json"
	# Renamed on copy: the results folder uses plans.json
	cp "$local_dir/nnUNetPlans.json" "$DST/plans.json"
	cp "$local_dir/dataset_fingerprint.json" "$DST/dataset_fingerprint.json"
	cp "$local_dir/checkpoint_best.pth" "$DST/fold_0/checkpoint_best.pth"
	cp "$local_dir/splits_final.json" "$DST/fold_0/splits_final.json"
	```
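
Because `local_dir` from step 1 is a Python variable, the same copy can also be done without leaving Python. The following is a minimal sketch rather than an official nnU-Net utility: `FILE_MAP` and `build_results_dir` are hypothetical names introduced here, and the target layout simply mirrors the tree shown above.

```python
import shutil
from pathlib import Path

# Snapshot file name -> location relative to the trained-model folder.
# nnUNetPlans.json is renamed to plans.json, as in the tree above.
FILE_MAP = {
    "dataset.json": "dataset.json",
    "nnUNetPlans.json": "plans.json",
    "dataset_fingerprint.json": "dataset_fingerprint.json",
    "checkpoint_best.pth": "fold_0/checkpoint_best.pth",
    "splits_final.json": "fold_0/splits_final.json",
}


def build_results_dir(local_dir, results_root):
    """Copy the downloaded files into the nnU-Net results layout.

    Returns the trained-model folder that contains plans.json and fold_0/.
    """
    src = Path(local_dir)
    model_dir = (
        Path(results_root)
        / "Dataset502_MSDLung"
        / "nnUNetTrainer__nnUNetPlans__3d_fullres"
    )
    # Fail early if the snapshot is incomplete.
    missing = [name for name in FILE_MAP if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {src}: {missing}")
    for name, rel_target in FILE_MAP.items():
        target = model_dir / rel_target
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / name, target)
    return model_dir
```

Calling `build_results_dir(local_dir, "/path/to/nnUNet_results")` right after `snapshot_download` should leave a folder that step 3 can point `nnUNet_results` at.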

	### 3. Run inference with nnU-Net

	```bash
	export nnUNet_results=/path/to/nnUNet_results
	nnUNetv2_predict \
	    -i /path/to/your/input_images \
	    -o /path/to/output_predictions \
	    -d 502 \
	    -c 3d_fullres \
	    -tr nnUNetTrainer \
	    -p nnUNetPlans \
	    -f 0 \
	    -chk checkpoint_best.pth
	```

	Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`.
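
If your CT volumes are not yet named this way, a small staging step can copy them into an input folder with the suffix added. This is an illustrative sketch only: `stage_ct_inputs` is a hypothetical helper, and it assumes single-channel CT files that already end in `.nii.gz`.

```python
import shutil
from pathlib import Path


def stage_ct_inputs(src_dir, dst_dir):
    """Copy every *.nii.gz in src_dir to dst_dir as <case_id>_0000.nii.gz."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(Path(src_dir).glob("*.nii.gz")):
        case_id = path.name[: -len(".nii.gz")]
        # Files that already carry the channel suffix keep their name.
        if not case_id.endswith("_0000"):
            case_id += "_0000"
        target = dst / f"{case_id}.nii.gz"
        shutil.copy2(path, target)
        staged.append(target)
    return staged
```

The returned list can be checked before running `nnUNetv2_predict -i` on `dst_dir`.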

	## Training procedure

	- Framework: nnU-Net v2.7.0 (default trainer)
	- Preprocessing: CT-specific normalization (HU clipping at the 0.5/99.5 percentiles of foreground voxels, then z-scoring with the dataset-wide foreground mean and standard deviation), resampling to target spacing `[1.245, 0.785, 0.785]` mm
	- Augmentation: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, gaussian noise/blur, low-resolution simulation)
	- Optimization: SGD + Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01)
	- Iterations: fixed 250 per epoch (nnU-Net default; independent of dataset size)
	- Best-checkpoint mechanism: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak
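
The schedule and the best-checkpoint rule listed above can be sketched in a few lines. This is not nnU-Net's own code. The polynomial exponent (0.9) and the EMA smoothing factor (0.9) are assumptions about nnU-Net's defaults that this card does not state, and `poly_lr` and `best_epoch_by_ema` are hypothetical names.

```python
ITERATIONS_PER_EPOCH = 250
MAX_EPOCHS = 1000
TOTAL_ITERATIONS = ITERATIONS_PER_EPOCH * MAX_EPOCHS  # 250,000 optimizer steps


def poly_lr(epoch, initial_lr=0.01, max_epochs=MAX_EPOCHS, exponent=0.9):
    """Polynomial decay from initial_lr at epoch 0 to 0 at max_epochs.

    exponent=0.9 is an assumed default, not taken from this card.
    """
    return initial_lr * (1 - epoch / max_epochs) ** exponent


def best_epoch_by_ema(pseudo_dice_per_epoch, alpha=0.9):
    """Return (epoch, ema) at which the smoothed Pseudo Dice peaks.

    alpha=0.9 is an assumed smoothing factor; the first reading seeds the EMA.
    """
    ema = None
    best_ema, best_epoch = float("-inf"), -1
    for epoch, value in enumerate(pseudo_dice_per_epoch):
        ema = value if ema is None else alpha * ema + (1 - alpha) * value
        if ema > best_ema:
            best_ema, best_epoch = ema, epoch
    return best_epoch, best_ema
```

Under these assumptions the learning rate at epoch 500 is about 0.0054. Because the EMA smooths the jagged raw readings, the selected epoch is driven by sustained performance rather than by one high raw reading.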

	## Evaluation

	Two complementary Dice metrics are reported, both computed on the 13 fold-0 validation cases:

	| Metric | Value | What it measures |
	|--------|-------|------------------|
	| Mean Validation Dice (per-case, sliding-window) | 0.7161 | Per-case Dice from full-volume `nnUNetv2_predict` inference on each of the 13 val cases, averaged. Case-weighted: every scan counts equally regardless of tumor size. This is the metric most papers report. |
	| Best EMA Pseudo Dice (in-training) | 0.8155 | Voxel-pooled Dice across validation patches during training. Voxel-weighted: large tumors dominate. Used by nnU-Net to select `checkpoint_best.pth`. |
	| Pseudo Dice raw (jagged) range | 0.50–0.85 | Peak per-epoch readings during training. |
	| Final-epoch train loss | -0.85 | Mild late-stage overfitting visible in `progress.png`. |
	| Final-epoch val loss | -0.75 | `checkpoint_best.pth` predates this. |

	The ~0.10 gap between Pseudo Dice (0.8155) and Mean Validation Dice (0.7161) is smaller than the ~0.15 gap seen on datasets with more varied lesion sizes, such as NLSTseg or Dataset500. MSD Task06's tumors are uniformly large (median volume 5.22 cm³), so voxel-pooled and per-case Dice stay reasonably close. The smaller a dataset's lesions and the wider their size distribution, the larger the gap between Pseudo Dice and Mean Validation Dice.
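
A toy calculation shows why the two aggregations diverge. The TP/FP/FN counts below are made up for illustration and are not taken from this model's validation set. The function names are hypothetical, and the pooled variant only approximates how nnU-Net accumulates Pseudo Dice over sampled patches.

```python
def dice(tp, fp, fn):
    """Dice = 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


def per_case_mean_dice(cases):
    """Case-weighted: compute Dice per scan, then average."""
    scores = [dice(tp, fp, fn) for tp, fp, fn in cases]
    return sum(scores) / len(scores)


def pooled_dice(cases):
    """Voxel-weighted: sum TP/FP/FN over all scans, then compute one Dice."""
    tp = sum(c[0] for c in cases)
    fp = sum(c[1] for c in cases)
    fn = sum(c[2] for c in cases)
    return dice(tp, fp, fn)


# Hypothetical counts: one large, well-segmented tumor and one small, poorly
# segmented one, given as (TP, FP, FN) in voxels.
HYPOTHETICAL_CASES = [(9000, 500, 500), (50, 50, 100)]

print(round(per_case_mean_dice(HYPOTHETICAL_CASES), 3))  # 0.674
print(round(pooled_dice(HYPOTHETICAL_CASES), 3))         # 0.94
```

The small tumor halves the contribution of its case to the per-case mean but barely moves the pooled score, which is the effect described above.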

	The training plot (`progress.png`) shows a smooth Pseudo Dice climb from 0 → 0.7 in the first ~50 epochs and slow refinement to 0.81 by epoch ~750, then mild overfitting (train loss continues to drop, val loss plateaus). nnU-Net's best-checkpoint mechanism preserves the pre-overfit weights — that's the model in this repo.

	For comparisons against other methods, cite the Mean Validation Dice (0.7161). Pseudo Dice is useful as an in-training monitoring signal but not for cross-method comparison.

	Per-case validation results are available in `validation_summary.json` (Dice, IoU, TP/FP/FN counts per case).
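
The per-case numbers can be re-aggregated with a short script. The sketch below assumes `validation_summary.json` follows the layout of nnU-Net's own `summary.json` (a `metric_per_case` list whose entries hold per-label `metrics` with `Dice` and `IoU` keys, foreground label `"1"`). That layout is an assumption, so check the file before relying on it. All function names are hypothetical.

```python
import json
from pathlib import Path


def load_summary(path):
    """Parse the JSON file into a dict."""
    return json.loads(Path(path).read_text())


def mean_case_metrics(summary, label="1"):
    """Average Dice and IoU over cases (assumed summary.json-style layout)."""
    rows = [case["metrics"][label] for case in summary["metric_per_case"]]
    n = len(rows)
    return {
        "n_cases": n,
        "mean_dice": sum(r["Dice"] for r in rows) / n,
        "mean_iou": sum(r["IoU"] for r in rows) / n,
    }


def iou_from_dice(d):
    """Per-case identity: IoU = Dice / (2 - Dice)."""
    return d / (2 - d)
```

As a sanity check, `iou_from_dice(0.7161)` is about 0.558. Because the identity is convex in Dice, the average of per-case IoUs can only be at or above that value, which is consistent with the ~0.59 reported in Quick stats.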

	## Limitations

	- Single fold of 5-fold CV — not an ensemble. Published-grade numbers require all 5 folds either averaged or ensembled at inference.
	- Trained on diagnostic CT only — performance on low-dose screening CT (LDCT) is unknown and likely lower without finetuning.
	- Small training set — 50 cases. The model showed mild late-stage overfitting consistent with this scale; the best-checkpoint is from before that point but generalization is bounded by data size.
	- MSD Task06 case distribution — annotations focus on primary lung tumors (median volume ~5.2 cm³). Performance on small nodules (e.g. <5mm) or non-tumor lung lesions is not characterized.
	- No clinical validation — this is a research artifact, not a medical device.

	## License

	CC-BY-SA 4.0, inherited from the share-alike clause of the MSD Task06 source dataset license.

	## Citation

	If you use this model, please cite:

	```bibtex
	@article{isensee2021nnunet,
	  title = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
	  author = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
	  journal = {Nature Methods},
	  volume = {18},
	  number = {2},
	  pages = {203--211},
	  year = {2021}
	}

	@article{antonelli2022medical,
	  title = {The Medical Segmentation Decathlon},
	  author = {Antonelli, Michela and Reinke, Annika and Bakas, Spyridon and others},
	  journal = {Nature Communications},
	  volume = {13},
	  number = {1},
	  pages = {4128},
	  year = {2022}
	}
	```

	## Project context

	Part of CLN-Segmenter at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is one component) and finetunes on internal data with domain-specific loss formulations.

	- Code: https://github.com/lab-rasool/CLN-Segmenter
	- Lab: https://huggingface.co/Lab-Rasool

	Other models in this series:
	- `Lab-Rasool/CLN-Segmenter-NLSTseg-fold0` — single-dataset NLSTseg POC (LDCT, 605 expert cases)
	- `Lab-Rasool/CLN-Segmenter-Dataset500-fold0` — unified MSD + NLSTseg pretrain (planned)