| license: cc-by-sa-4.0 | |
| tags: | |
| - nnunet | |
| - nnunetv2 | |
| - medical-imaging | |
| - segmentation | |
| - 3d-segmentation | |
| - ct | |
| - ldct | |
| - low-dose-ct | |
| - diagnostic-ct | |
| - lung | |
| - lung-cancer | |
| - tumor-segmentation | |
| - multi-institutional | |
| - pretrained | |
| library_name: nnunetv2 | |
| pipeline_tag: image-segmentation | |
| datasets: | |
| - MSD-Task06-Lung | |
| - NLSTseg | |
| language: | |
| - en | |
| # CLN-Segmenter: Dataset500 Unified Lung Lesion Pretrain (fold 0) | |
| A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on **Dataset500_LungLesions**, the unified Stage 1 pretraining corpus combining MSD Task06 (diagnostic CT) and NLSTseg (low-dose screening CT), 667 expert-annotated cases in total. Fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center. | |
| This is the **v1 unified pretrain** intended as a starting point for downstream lung-lesion finetuning, especially when the target combines diagnostic and screening CT. | |
| ## Quick stats | |
| | | | | |
| |--|--| | |
| | **Architecture** | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) | | |
| | **Training data** | Dataset500_LungLesions: 667 cases (533 train / 134 val for fold 0) | |
| | **Composition** | 63 MSD Task06 (diagnostic CT, 9%) + 604 NLSTseg (LDCT, 91%) | | |
| | **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` | | |
| | **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` | |
| | **Hardware** | 1× NVIDIA H100 80GB, ~6h 41m wall-time | |
| | **Best EMA Pseudo Dice** (in-training) | **0.7658** (epoch ~960) | | |
| | **Mean Validation Dice** (per-case, sliding-window) | **0.6172** | | |
| | **Foreground IoU** | **0.5121** | | |
| | **Generalization** | No measurable overfitting: train/val loss curves overlap throughout | |
| ## ⚠️ Two metrics, both honest: read this section | |
| The two Dice numbers reported above are computed differently and **disagree by ~0.15**. Both are correct; they answer different questions: | |
| ### `Best EMA Pseudo Dice = 0.7658` (in-training, voxel-pooled) | |
| Computed by nnU-Net every epoch on patches sampled from validation cases. Pools True Positives, False Positives, False Negatives across all val patches into one Dice. **Voxel-weighted**: large lesions dominate. This is the metric nnU-Net uses to select `checkpoint_best.pth`. | |
| ### `Mean Validation Dice = 0.6172` (sliding-window, per-case averaged) | |
| Computed *after* training by running full-volume sliding-window inference on each of the 134 fold-0 validation cases, computing per-case Dice, then averaging. **Case-weighted**: each scan counts equally regardless of tumor size. **This is the metric most papers report.** | |
| ### Why the gap is large for *this* dataset | |
| NLSTseg (91% of cases) has a wide range of lesion sizes (median 1.37 cm³, but the per-lesion volume distribution spans 0.03 to 372 cm³ in the source). MSD's tumors (9% of cases) are generally larger (median 5.22 cm³). | |
| - **Pseudo Dice** is dominated by the big-tumor voxel mass, so it looks high (0.77). | |
| - **Mean Validation Dice** treats a tiny 4 mm nodule with Dice 0.30 the same as a large tumor with Dice 0.85, which pulls the average toward the harder small-lesion cases (0.62). | |
| For comparison: case_0001 (MSD) achieves per-case Dice **0.892** in this fold's validation. Several small-lesion NLSTseg cases score below 0.40. The 0.6172 average reflects that distribution faithfully. | |
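The difference between the two averaging schemes can be illustrated with a toy calculation. This is a minimal sketch, not nnU-Net's implementation: the voxel counts below are hypothetical, and nnU-Net's in-training Pseudo Dice pools counts over sampled patches rather than whole cases.

```python
def dice(tp: int, fp: int, fn: int) -> float:
    """Dice = 2*TP / (2*TP + FP + FN); returns 1.0 if all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical (TP, FP, FN) voxel counts: one large tumor and two small nodules.
cases = [
    (9000, 1000, 1000),  # large lesion, per-case Dice 0.90
    (30, 70, 70),        # small nodule, per-case Dice 0.30
    (40, 60, 60),        # small nodule, per-case Dice 0.40
]

# Case-weighted: compute Dice per case, then average (each scan counts equally).
per_case = [dice(*c) for c in cases]
mean_case_dice = sum(per_case) / len(per_case)

# Voxel-weighted: pool the counts first, then compute one Dice.
pooled_tp = sum(c[0] for c in cases)
pooled_fp = sum(c[1] for c in cases)
pooled_fn = sum(c[2] for c in cases)
pooled_dice = dice(pooled_tp, pooled_fp, pooled_fn)

print(f"mean per-case Dice: {mean_case_dice:.3f}")  # 0.533
print(f"pooled Dice:        {pooled_dice:.3f}")     # 0.889
```

In this toy set the single large lesion contributes almost all of the pooled voxels, so the pooled score sits close to its 0.90 while the per-case mean is pulled down by the two nodules.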
| ### Which one should you cite? | |
| - **For papers and external comparisons**: cite **0.6172 Mean Validation Dice** (per-case). | |
| - **For comparisons against nnU-Net's training-time logs of other people's runs**: cite **0.7658 Pseudo Dice**. | |
| - **For full-pipeline performance**: also report a 5-fold ensemble Mean Dice (typically about 3-5% above a single fold) once all 5 folds are trained. | |
| ## Files in this repo | |
| | File | Role | | |
| |------|------| | |
| | `checkpoint_best.pth` | Model weights, saved at the EMA Pseudo Dice peak (~epoch 960) | |
| | `nnUNetPlans.json` | Architecture spec + preprocessing plans. **Required** for inference. | | |
| | `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). **Required** for inference. | | |
| | `dataset_fingerprint.json` | HU intensity stats from training data | | |
| | `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) | | |
| | `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate | | |
| | `validation_summary.json` | Per-case validation Dice/IoU/TP/FP/FN for all 134 fold-0 validation cases | | |
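The per-case file can be used to recompute the mean Dice or to list the hardest cases. The sketch below assumes `validation_summary.json` follows the layout of nnU-Net v2's `summary.json` (a `metric_per_case` list whose entries hold `reference_file` and a `metrics` dict keyed by label); that layout is an assumption here, so check the keys in the actual file first. The in-memory `example` is a made-up stand-in, not real results.

```python
import json
from pathlib import Path


def load_summary(path):
    """Read a summary JSON file from disk."""
    with open(path) as f:
        return json.load(f)


def per_case_dice(summary: dict, label: str = "1") -> dict:
    """Map each case file name to its Dice for one label (assumed schema)."""
    return {
        Path(entry["reference_file"]).name: entry["metrics"][label]["Dice"]
        for entry in summary["metric_per_case"]
    }


# Made-up stand-in with the assumed structure; replace with
# load_summary("validation_summary.json") for the real file.
example = {
    "metric_per_case": [
        {"reference_file": "/x/case_0001.nii.gz", "metrics": {"1": {"Dice": 0.892}}},
        {"reference_file": "/x/case_0100.nii.gz", "metrics": {"1": {"Dice": 0.35}}},
    ]
}

scores = per_case_dice(example)
mean_dice = sum(scores.values()) / len(scores)
hard_cases = sorted(name for name, d in scores.items() if d < 0.40)
print(mean_dice, hard_cases)
```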
| ## Training data and provenance | |
| This model was trained **only on publicly available datasets**: | |
| - **MSD Task06 Lung** (Antonelli et al. 2022, *Nature Communications*, CC-BY-SA 4.0): 63 expert tumor masks on diagnostic CT | |
| - **NLSTseg** (Chen et al. 2025, *Scientific Data*, CC-BY 4.0): 604 expert pixel-level masks on low-dose screening CT (1 patient excluded, `nlst_0393` / patient 205714, due to a CT/mask shape mismatch in the source files) | |
| The two source datasets were unified via [`build_unified_dataset.py`](https://github.com/lab-rasool/CLN-Segmenter/blob/main/data_prep/build_unified_dataset.py): images copied verbatim, NLSTseg multi-label masks binarized via `(mask > 0).astype(uint8)`, sequential renumbering as `case_0001` … `case_0667` (MSD first, then NLSTseg). Full mapping in the dataset repo's `id_mapping.csv`. | |
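The two unification steps described above (mask binarization and sequential renumbering) can be sketched as follows. This is an illustration only, not the contents of `build_unified_dataset.py`: the helper names and the source IDs (`lung_001`, `nlst_0001`, …) are hypothetical.

```python
import numpy as np


def binarize(mask: np.ndarray) -> np.ndarray:
    """Collapse any non-zero label into a single foreground class."""
    return (mask > 0).astype(np.uint8)


def unified_ids(msd_ids, nlst_ids):
    """Number MSD cases first, then NLSTseg, as case_0001, case_0002, ..."""
    ordered = [("MSD", s) for s in msd_ids] + [("NLSTseg", s) for s in nlst_ids]
    return {f"case_{i:04d}": src for i, src in enumerate(ordered, start=1)}


# Toy multi-label mask: labels 1, 2 and 3 all become foreground (1).
multi_label = np.array([[0, 1, 2], [3, 0, 0]], dtype=np.int16)
binary = binarize(multi_label)

# Hypothetical source IDs, just to show the ordering.
mapping = unified_ids(["lung_001", "lung_003"], ["nlst_0001"])
print(binary.tolist())  # [[0, 1, 1], [1, 0, 0]]
print(mapping)
```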
| **LUNA16 was intentionally excluded.** Its sphere-mask conversion from `(centroid, diameter)` annotations produced semantically incoherent foreground (HU spans lung air → soft tissue → bone), and the standalone Dataset501_LUNA16 run stayed at Pseudo Dice 0 for all 1000 epochs. Re-evaluating with LIDC-IDRI consensus masks is a candidate for v2. | |
| **No patient-identifiable or institutional data was used.** This checkpoint contains no information derived from any non-public source. | |
| ## Foreground intensity profile (training-data fingerprint) | |
| The unified dataset's CT HU statistics inside foreground (lesion) voxels: | |
| | Stat | Value | | |
| |--|--| | |
| | mean | -197 HU | | |
| | median | -134 HU | | |
| | std | 259 | | |
| | 0.5%-ile | -926 | | |
| | 99.5%-ile | 252 | | |
| The distribution is dominated by NLSTseg (91% of cases) with a slight pull from MSD's heavier tails. Mean and median sit cleanly in soft-tissue-adjacent territory; the 99.5%-ile stays away from bone/implant ranges. This is a coherent foreground class for default Dice+CE, and the training curves confirm it. | |
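These foreground statistics are also what a CT-style normalization would consume. The sketch below shows a clip-then-z-score step using the values from the table above. It assumes dataset-level (not per-scan) mean and standard deviation, which is how nnU-Net's CT normalization is generally understood to work; `normalize_ct` is a hypothetical helper, not the library function.

```python
import numpy as np

# Foreground statistics from the table above (HU).
FG_MEAN, FG_STD = -197.0, 259.0
FG_P005, FG_P995 = -926.0, 252.0


def normalize_ct(volume_hu: np.ndarray) -> np.ndarray:
    """Clip to the foreground percentile window, then z-score with dataset-level stats."""
    clipped = np.clip(volume_hu.astype(np.float32), FG_P005, FG_P995)
    return (clipped - FG_MEAN) / FG_STD


# Air, the foreground mean, a soft-tissue value, and dense bone/metal.
sample = np.array([-1000.0, -197.0, 62.0, 1500.0])
normalized = normalize_ct(sample)
print(normalized)  # air and bone are clipped to the window edges before scaling
```

With these numbers, anything below -926 HU or above 252 HU is mapped to the same normalized value as the window edge, so bone and metal cannot dominate the input range.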
| ## Intended use | |
| - **Pretrained starting point** for finetuning on related lung-lesion segmentation tasks (especially mixed-modality or institutional-shift settings) | |
| - **Reference for unified multi-source pretraining** with default nnU-Net v2 settings | |
| - **Input to ensembling** with other folds (when 5-fold runs are available) | |
| ## How NOT to use it | |
| - ❌ Not validated for clinical diagnosis or treatment decisions | |
| - ❌ Single fold, not an ensemble: paper-grade results require all 5 folds | |
| - ❌ Not tuned for pure diagnostic CT: the training data is predominantly LDCT (91%), so for a pure diagnostic-CT target expect to finetune, or start from `Lab-Rasool/CLN-Segmenter-MSD-fold0` instead | |
| ## How to use | |
| ### 1. Download the checkpoint and metadata | |
| ```python | |
| from huggingface_hub import snapshot_download | |
| local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-Dataset500-fold0") | |
| print("Files at:", local_dir) | |
| ``` | |
| ### 2. Set up an nnU-Net inference directory | |
| ``` | |
| nnUNet_results/ | |
| └── Dataset500_LungLesions/ | |
|     └── nnUNetTrainer__nnUNetPlans__3d_fullres/ | |
|         ├── dataset.json | |
|         ├── plans.json (rename from nnUNetPlans.json) | |
|         ├── dataset_fingerprint.json | |
|         └── fold_0/ | |
|             ├── checkpoint_best.pth | |
|             └── splits_final.json | |
| ``` | |
| ```bash | |
| SRC=/path/to/downloaded/snapshot  # the local_dir path printed in step 1 | |
| DST=/path/to/nnUNet_results/Dataset500_LungLesions/nnUNetTrainer__nnUNetPlans__3d_fullres | |
| mkdir -p "$DST/fold_0" | |
| cp "$SRC/dataset.json" "$DST/dataset.json" | |
| cp "$SRC/nnUNetPlans.json" "$DST/plans.json" | |
| cp "$SRC/dataset_fingerprint.json" "$DST/dataset_fingerprint.json" | |
| cp "$SRC/checkpoint_best.pth" "$DST/fold_0/checkpoint_best.pth" | |
| cp "$SRC/splits_final.json" "$DST/fold_0/splits_final.json" | |
| ``` | |
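Since `local_dir` from step 1 is a Python variable, the same copy can also be done without leaving Python. A minimal sketch, assuming the file names listed in this repo; `stage_for_nnunet` is a hypothetical helper name, not part of nnU-Net.

```python
import shutil
from pathlib import Path


def stage_for_nnunet(local_dir, results_root):
    """Copy the downloaded files into the results layout shown above."""
    src = Path(local_dir)
    dst = (
        Path(results_root)
        / "Dataset500_LungLesions"
        / "nnUNetTrainer__nnUNetPlans__3d_fullres"
    )
    fold = dst / "fold_0"
    fold.mkdir(parents=True, exist_ok=True)

    # Source file name -> destination path (note the plans.json rename).
    copies = {
        "dataset.json": dst / "dataset.json",
        "nnUNetPlans.json": dst / "plans.json",
        "dataset_fingerprint.json": dst / "dataset_fingerprint.json",
        "checkpoint_best.pth": fold / "checkpoint_best.pth",
        "splits_final.json": fold / "splits_final.json",
    }
    for name, target in copies.items():
        # copy2 follows symlinks, so cache entries are copied as real files.
        shutil.copy2(src / name, target)
    return dst
```

Calling `stage_for_nnunet(local_dir, "/path/to/nnUNet_results")` after the download in step 1 should produce the same tree as the shell commands.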
| ### 3. Run inference with nnU-Net | |
| ```bash | |
| export nnUNet_results=/path/to/nnUNet_results | |
| nnUNetv2_predict \ | |
| -i /path/to/your/input_images \ | |
| -o /path/to/output_predictions \ | |
| -d 500 \ | |
| -c 3d_fullres \ | |
| -tr nnUNetTrainer \ | |
| -p nnUNetPlans \ | |
| -f 0 \ | |
| -chk checkpoint_best.pth | |
| ``` | |
| Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`. | |
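If your scans are not yet named this way, a small staging step can add the suffix. The sketch below assumes single-channel CT volumes stored as `.nii.gz` in one folder; `stage_inputs` is a hypothetical helper, and it copies rather than renames so the originals stay untouched.

```python
import shutil
from pathlib import Path


def stage_inputs(src_dir, dst_dir):
    """Copy each *.nii.gz volume to dst_dir as <case_id>_0000.nii.gz."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(Path(src_dir).glob("*.nii.gz")):
        case_id = path.name[: -len(".nii.gz")]
        # Files that already carry the channel suffix are copied unchanged.
        if not case_id.endswith("_0000"):
            case_id += "_0000"
        target = dst / f"{case_id}.nii.gz"
        shutil.copy2(path, target)
        staged.append(target.name)
    return staged
```

The returned list of file names can be checked before pointing `-i` at the destination folder.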
| ## Training procedure | |
| - **Framework**: nnU-Net v2.7.0 (default trainer) | |
| - **Preprocessing**: CT-specific normalization (HU clipping at the 0.5/99.5 percentiles of foreground voxels, then z-scoring with the dataset-wide foreground mean and standard deviation), resampling to target spacing `[1.245, 0.664, 0.664]` mm | |
| - **Augmentation**: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, gaussian noise/blur, low-resolution simulation) | |
| - **Optimization**: SGD + Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01 → 0) | |
| - **Iterations**: fixed 250 per epoch (nnU-Net default; independent of dataset size) | |
| - **Best-checkpoint mechanism**: nnU-Net automatically tracks EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak | |
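The polynomial learning-rate decay listed above can be sketched as a one-line function. The exponent of 0.9 is an assumption: it is the value nnU-Net is generally known to use by default, but it is not stated in this card.

```python
def poly_lr(epoch: int, max_epochs: int = 1000,
            initial_lr: float = 0.01, exponent: float = 0.9) -> float:
    """Polynomial decay from initial_lr at epoch 0 to 0 at max_epochs."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent


# Learning rate at the start, midpoint, near the best checkpoint, and the end.
for epoch in (0, 500, 960, 1000):
    print(epoch, f"{poly_lr(epoch):.6f}")
```

Under this assumption the rate is already down to roughly 5% of its initial value around epoch 960, where the best checkpoint was saved.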
| ## Domain composition note | |
| The training corpus is **9% diagnostic CT (MSD) and 91% LDCT (NLSTseg)**. nnU-Net does not explicitly rebalance per-source sampling, so the model sees patches in proportion to case count. With ~500K total patches over 1000 epochs × 250 iterations × batch 2, that translates to ~45,000 MSD patches and ~455,000 NLSTseg patches. | |
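The patch-count estimate above is simple arithmetic and can be reproduced directly. This sketch uses the exact case counts (63 and 604) rather than the rounded 9%/91% split, and assumes the fold-0 training split keeps roughly the same source ratio as the full dataset.

```python
EPOCHS, ITERS_PER_EPOCH, BATCH_SIZE = 1000, 250, 2
MSD_CASES, NLST_CASES = 63, 604

# One patch per batch element per iteration.
total_patches = EPOCHS * ITERS_PER_EPOCH * BATCH_SIZE

# Cases are sampled uniformly, so each source's share follows its case count.
msd_share = MSD_CASES / (MSD_CASES + NLST_CASES)
msd_patches = round(total_patches * msd_share)
nlst_patches = total_patches - msd_patches

print(total_patches)  # 500000
print(msd_patches)    # 47226
print(nlst_patches)   # 452774
```

The exact counts give about 47,000 and 453,000 patches, in line with the rounded figures quoted above.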
| Empirically the model handles both modalities (`case_0001`, an MSD case, scores Dice 0.89 in fold-0 validation), but the underlying representation skews toward LDCT. Stage 1 v2 will rebalance by adding more diagnostic-CT data (LIDC-IDRI consensus, NSCLC-Radiomics) rather than re-weighting existing samples. | |
| ## Limitations | |
| - **Single fold of 5-fold CV**: not an ensemble. Paper-grade results require all 5 folds, either averaged or ensembled at inference. | |
| - **Domain imbalance**: with 91% LDCT training data, the model may underperform on a pure diagnostic-CT target without finetuning (consider `Lab-Rasool/CLN-Segmenter-MSD-fold0` for that case). | |
| - **Small-lesion performance**: per-case Dice for tiny nodules (<5 mm) is noticeably worse than for larger tumors; the 0.6172 mean reflects the full distribution including these hard cases. | |
| - **One source case excluded** (`nlst_0393` / patient 205714) due to source-data shape mismatch. | |
| - **No clinical validation**: this is a research artifact, not a medical device. | |
| ## License | |
| **CC-BY-SA 4.0**, inherited from the share-alike clause of the MSD Task06 source dataset license. | |
| ## Citation | |
| If you use this model, please cite all three works: | |
| ```bibtex | |
| @article{isensee2021nnunet, | |
| title = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation}, | |
| author = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H}, | |
| journal = {Nature Methods}, | |
| volume = {18}, | |
| number = {2}, | |
| pages = {203--211}, | |
| year = {2021} | |
| } | |
| @article{antonelli2022medical, | |
| title = {The Medical Segmentation Decathlon}, | |
| author = {Antonelli, Michela and Reinke, Annika and Bakas, Spyridon and others}, | |
| journal = {Nature Communications}, | |
| volume = {13}, | |
| number = {1}, | |
| pages = {4128}, | |
| year = {2022} | |
| } | |
| @article{chen2025nlstseg, | |
| title = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images}, | |
| author = {Chen and others}, | |
| journal = {Scientific Data}, | |
| year = {2025}, | |
| doi = {10.1038/s41597-025-05742-x} | |
| } | |
| ``` | |
| ## Project context | |
| Part of **CLN-Segmenter** at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is the v1 unified pretrain) and finetunes on internal data with domain-specific loss formulations. | |
| - **Code**: https://github.com/lab-rasool/CLN-Segmenter | |
| - **Lab**: https://huggingface.co/Lab-Rasool | |
| Other models in this series: | |
| - `Lab-Rasool/CLN-Segmenter-MSD-fold0`: MSD-only POC (diagnostic CT, 63 cases, Pseudo Dice 0.82) | |
| - `Lab-Rasool/CLN-Segmenter-NLSTseg-fold0`: NLSTseg-only POC (LDCT, 604 cases, Pseudo Dice 0.77) | |