---
license: cc-by-4.0
tags:
- nnunet
- nnunetv2
- medical-imaging
- segmentation
- 3d-segmentation
- ct
- ldct
- low-dose-ct
- lung
- lung-cancer
- tumor-segmentation
- multi-institutional
library_name: nnunetv2
pipeline_tag: image-segmentation
datasets:
- NLSTseg
language:
- en
---

# CLN-Segmenter: NLSTseg Lung Lesion Segmentation (fold 0)

A 3D U-Net (nnU-Net v2 `3d_fullres`) trained on the **NLSTseg** dataset: pixel-level lung lesion annotations on low-dose screening CT (LDCT) from the National Lung Screening Trial. Fold 0 of 5-fold cross-validation. Released as part of the CLN-Segmenter project at the Rasool Lab, Moffitt Cancer Center.

This is a single-fold pretraining checkpoint, intended as a starting point for downstream lung-lesion segmentation work, not a clinical-grade tool.

## Quick stats

| | |
|--|--|
| **Architecture** | nnU-Net v2 `3d_fullres` (PlainConvUNet, 6 stages, features `[32, 64, 128, 256, 320, 320]`) |
| **Training data** | NLSTseg: 604 cases after excluding 1 (483 train / 121 val for fold 0) |
| **Modality** | Low-dose screening CT (LDCT), multi-institutional |
| **Loss** | Dice + Cross-Entropy (nnU-Net default), `batch_dice=True` |
| **Schedule** | 1000 epochs, polynomial LR decay 0.01 → 0, batch size 2, patch `[80, 192, 160]` |
| **Hardware** | 1× NVIDIA H100 80GB, ~7h wall-time |
| **Mean Validation Dice** (per-case, sliding-window) | **0.6123** |
| **Best EMA Pseudo Dice** (in-training proxy) | 0.7663 (epoch ~870) |
| **Generalization** | No measurable overfitting: train/val loss curves overlap throughout |

## Files in this repo

| File | Role |
|------|------|
| `checkpoint_best.pth` | Model weights, saved at the EMA Pseudo Dice peak (~epoch 870) |
| `nnUNetPlans.json` | Architecture spec + preprocessing plans. **Required** for inference. |
| `dataset.json` | Channel names, label names, file ending (nnU-Net v2 schema). **Required** for inference. |
| `dataset_fingerprint.json` | HU intensity stats from training data |
| `splits_final.json` | Train/val case ID splits for fold 0 (reproducibility) |
| `progress.png` | Training curves: loss, Pseudo Dice, epoch duration, learning rate |
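
As a quick sanity check on the split file, the sketch below loads `splits_final.json` and reports the train/val sizes for one fold. It assumes the standard nnU-Net v2 layout (a JSON list with one `{"train": [...], "val": [...]}` entry per fold); the helper name `split_sizes` is illustrative and not part of nnU-Net.

```python
import json
from pathlib import Path


def split_sizes(splits_path, fold=0):
    """Return (n_train, n_val) for one fold of an nnU-Net-style splits file.

    ASSUMPTION: the file is a JSON list with one
    {"train": [...], "val": [...]} dict per fold.
    """
    splits = json.loads(Path(splits_path).read_text())
    entry = splits[fold]
    train_ids, val_ids = entry["train"], entry["val"]
    # A case must never appear on both sides of the split.
    overlap = set(train_ids) & set(val_ids)
    if overlap:
        raise ValueError(f"{len(overlap)} case(s) appear in both train and val")
    return len(train_ids), len(val_ids)
```

If the file matches the counts reported in this card, `split_sizes("splits_final.json", fold=0)` should give 483 training and 121 validation cases.
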

## Training data and provenance

This model was trained **only on the publicly available NLSTseg dataset** (Chen et al. 2025, *Scientific Data*, CC-BY 4.0), which provides pixel-level lung lesion annotations on top of NLST low-dose screening CT imagery. The dataset contains 715 expert-annotated lesions across 605 patients. One patient (`nlst_0393` / patient 205714) was excluded because of a CT/mask shape mismatch in the source files (see the project changelog), leaving the 604 cases used here.

NLSTseg has key characteristics that make it complementary to diagnostic-CT datasets:

- **Multi-institutional**: 33 contributing institutions, 4 scanner brands (GE, Siemens, Philips, Toshiba)
- **Screening-cohort lesions**: smaller than typical diagnostic-CT tumors (median lesion volume **1.37 cm³**), most caught at Stage IA
- **Multi-label source**: per-lesion integer labels (1–7) in the original masks, binarized to `{0, 1}` for this single-class training. The tumor-vs-nodule distinction (`labels_type` 1 vs 2 in the original `Label.xlsx`) is recoverable from the source if a future multi-class run is desired.
- **LDCT noise**: lower radiation dose than diagnostic CT; images are noisier and slices are often thicker

**No patient-identifiable or institutional data was used.** This checkpoint contains no information derived from any non-public source.
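
The label binarization described in the list above can be sketched in a few lines of NumPy. This is an illustration of the idea only, not the project's actual preprocessing script; the function name `binarize_lesion_mask` and the toy array are made up for the example.

```python
import numpy as np


def binarize_lesion_mask(mask):
    """Collapse per-lesion integer labels (1..7) into one foreground class.

    0 stays background; every positive label becomes 1.
    """
    mask = np.asarray(mask)
    return (mask > 0).astype(np.uint8)


# Toy 2D slice with three separately labelled lesions (labels 1, 2 and 7).
toy = np.array([[0, 1, 1, 0],
                [0, 0, 2, 0],
                [7, 0, 0, 0]])
binary = binarize_lesion_mask(toy)
```

Because the per-lesion IDs are discarded at this step, instance identity has to be recovered from the original masks rather than from the model output.
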

## Intended use

- **Pretrained starting point** for finetuning on related lung-lesion segmentation tasks, especially LDCT or screening-cohort data
- **Reference baseline** for nnU-Net default performance on NLSTseg's small-lesion, multi-institutional regime
- **Input to ensembling** with other folds (when 5-fold runs are available)

## How NOT to use it

- Not validated for clinical diagnosis or treatment decisions
- Not validated on diagnostic-CT cases (different intensity distributions, larger lesions); see Limitations
- Single fold, not an ensemble: paper-grade results require all 5 folds
- Multi-lesion identity is collapsed in training labels; if your downstream task needs per-lesion instances, this checkpoint won't recover them directly

## How to use

### 1. Download the checkpoint and metadata

```python
from huggingface_hub import snapshot_download

# Download every file in the repo and return the local snapshot directory.
local_dir = snapshot_download(repo_id="Lab-Rasool/CLN-Segmenter-NLSTseg-fold0")
print("Files at:", local_dir)
```

### 2. Set up an nnU-Net inference directory

nnU-Net expects a specific directory structure for results:

```
nnUNet_results/
└── Dataset503_NLSTseg/
    └── nnUNetTrainer__nnUNetPlans__3d_fullres/
        ├── dataset.json
        ├── plans.json                  (rename from nnUNetPlans.json)
        ├── dataset_fingerprint.json
        └── fold_0/
            ├── checkpoint_best.pth
            └── splits_final.json
```

You can build this with:

```bash
# Snapshot path printed in step 1 (the shell cannot see the Python variable).
local_dir=/path/to/downloaded/snapshot
DST=/path/to/nnUNet_results/Dataset503_NLSTseg/nnUNetTrainer__nnUNetPlans__3d_fullres

mkdir -p "$DST/fold_0"
cp "$local_dir/dataset.json"             "$DST/dataset.json"
cp "$local_dir/nnUNetPlans.json"         "$DST/plans.json"   # renamed on copy
cp "$local_dir/dataset_fingerprint.json" "$DST/dataset_fingerprint.json"
cp "$local_dir/checkpoint_best.pth"      "$DST/fold_0/checkpoint_best.pth"
cp "$local_dir/splits_final.json"        "$DST/fold_0/splits_final.json"
```
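
If you would rather stay in the Python session from step 1, the same layout can be built with the standard library. This is a minimal sketch: `build_results_dir` and the `LAYOUT` mapping are illustrative names, and the target folder names are copied from the directory tree shown in this section.

```python
import shutil
from pathlib import Path

# File name in the snapshot -> destination relative to the trainer folder.
LAYOUT = {
    "dataset.json": "dataset.json",
    "nnUNetPlans.json": "plans.json",  # renamed on copy
    "dataset_fingerprint.json": "dataset_fingerprint.json",
    "checkpoint_best.pth": "fold_0/checkpoint_best.pth",
    "splits_final.json": "fold_0/splits_final.json",
}


def build_results_dir(local_dir, results_root):
    """Copy the snapshot files into the results layout shown above."""
    trainer_dir = (Path(results_root) / "Dataset503_NLSTseg"
                   / "nnUNetTrainer__nnUNetPlans__3d_fullres")
    for src_name, rel_dst in LAYOUT.items():
        dst = trainer_dir / rel_dst
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(local_dir) / src_name, dst)
    return trainer_dir
```

Calling `build_results_dir(local_dir, "/path/to/nnUNet_results")` returns the trainer folder, whose parent root is what `nnUNet_results` should point to.
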

### 3. Run inference with nnU-Net

```bash
export nnUNet_results=/path/to/nnUNet_results
nnUNetv2_predict \
  -i /path/to/your/input_images \
  -o /path/to/output_predictions \
  -d 503 \
  -c 3d_fullres \
  -tr nnUNetTrainer \
  -p nnUNetPlans \
  -f 0 \
  -chk checkpoint_best.pth
```

Input images should be CT volumes named with the nnU-Net channel suffix: `<case_id>_0000.nii.gz`.
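
If your volumes are named `<case_id>.nii.gz`, a small helper can add the channel suffix before prediction. The sketch below is one possible way to do it; `add_channel_suffix` is a hypothetical helper (not an nnU-Net function) and it renames files in place, so try it on a copy of your data first.

```python
from pathlib import Path


def add_channel_suffix(input_dir, ending=".nii.gz", channel="0000"):
    """Rename `<case_id>.nii.gz` to `<case_id>_0000.nii.gz` in place.

    Files that already carry the suffix are left untouched.
    Returns the list of renamed paths.
    """
    suffix = f"_{channel}{ending}"
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(input_dir).glob(f"*{ending}")):
        if path.name.endswith(suffix):
            continue  # already in nnU-Net naming
        case_id = path.name[: -len(ending)]
        target = path.with_name(f"{case_id}{suffix}")
        if target.exists():
            raise FileExistsError(target)
        path.rename(target)
        renamed.append(target)
    return renamed
```
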

## Training procedure

- **Framework**: nnU-Net v2.7.0 (default trainer)
- **Preprocessing**: CT-specific normalization (HU clipping at the 0.5/99.5 percentiles of foreground voxels, then z-scoring with the dataset-wide foreground mean and standard deviation), resampling to target spacing `[1.25, 0.664, 0.664]` mm
- **Augmentation**: nnU-Net's default 3D augmentation pipeline (rotation, scaling, gamma, mirroring, Gaussian noise/blur, low-resolution simulation)
- **Optimization**: SGD + Nesterov momentum (β=0.99), polynomial LR decay (initial LR 0.01)
- **Iterations**: fixed 250 per epoch (nnU-Net default; independent of dataset size)
- **Best-checkpoint mechanism**: nnU-Net automatically tracks the EMA of validation Pseudo Dice and saves `checkpoint_best.pth` at the peak
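
To make the schedule concrete, the sketch below evaluates a polynomial learning-rate decay at a few epochs and counts the patches seen over the full run. The card does not state the decay exponent; 0.9 is assumed here as nnU-Net's usual default, and `poly_lr` is an illustrative function rather than the trainer's own code.

```python
def poly_lr(epoch, initial_lr=0.01, max_epochs=1000, exponent=0.9):
    """Polynomial decay: lr = initial_lr * (1 - epoch / max_epochs) ** exponent.

    ASSUMPTION: exponent 0.9 (not stated in this card).
    """
    if not 0 <= epoch <= max_epochs:
        raise ValueError("epoch must lie in [0, max_epochs]")
    return initial_lr * (1 - epoch / max_epochs) ** exponent


ITERATIONS_PER_EPOCH = 250
BATCH_SIZE = 2
# Patches seen over the whole schedule: 1000 epochs * 250 iterations * 2.
total_patches = 1000 * ITERATIONS_PER_EPOCH * BATCH_SIZE

for epoch in (0, 500, 870, 999):
    print(f"epoch {epoch:4d}: lr = {poly_lr(epoch):.6f}")
print("total training patches:", total_patches)
```

With 250 iterations per epoch and batch size 2, the full schedule sees 500,000 training patches regardless of how many cases are in the dataset.
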

## Evaluation

Two complementary Dice metrics, both computed on the 121 fold-0 validation cases:

| Metric | Value | What it measures |
|--------|-------|------------------|
| **Mean Validation Dice** (per-case, sliding-window) | **0.6123** | Per-case Dice from full-volume `nnUNetv2_predict` inference, averaged across 121 val cases. **Case-weighted**: every scan counts equally regardless of tumor size. *This is the metric most papers report.* |
| **Best EMA Pseudo Dice** (in-training) | 0.7663 | Voxel-pooled Dice across validation patches during training. **Voxel-weighted**: large lesions dominate. Used by nnU-Net to select `checkpoint_best.pth`. |
| Raw (unsmoothed) Pseudo Dice range | 0.45–0.85 | Range of raw per-epoch readings during training |
| Train/val loss gap (final epoch) | ~0 | No measurable overfitting throughout. |

The **~0.15 gap** between Pseudo Dice (0.7663) and Mean Validation Dice (0.6123) is wider than on uniform-tumor datasets such as MSD Task06 (~0.10 there). NLSTseg lesion volumes span 0.03–372 cm³ (median 1.37 cm³, long-tailed), so voxel-pooled Dice is dominated by the few large lesions, while per-case Dice gives equal weight to the many small-lesion cases that are individually harder. The disagreement between the voxel-pooled and case-averaged metrics is a direct consequence of this distribution.
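
The difference between the two averaging schemes can be reproduced on a toy example. The counts below are invented for illustration (they are not taken from the validation set), and the sketch pools whole cases rather than training patches, which simplifies what nnU-Net does during training.

```python
def dice(tp, fp, fn):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when everything is empty."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Invented (TP, FP, FN) voxel counts: one large, well-segmented lesion and
# three small lesions that are segmented poorly. Not real validation data.
cases = [
    (9000, 500, 500),   # large lesion
    (40, 30, 30),       # small lesion
    (20, 40, 40),       # small lesion
    (0, 10, 50),        # small lesion, missed entirely
]

# Case-weighted: compute Dice per case, then average.
per_case_mean = sum(dice(*c) for c in cases) / len(cases)

# Voxel-weighted: pool the counts over all cases, then compute one Dice.
tp, fp, fn = (sum(col) for col in zip(*cases))
pooled = dice(tp, fp, fn)

print(f"per-case mean Dice: {per_case_mean:.3f}")
print(f"voxel-pooled Dice:  {pooled:.3f}")
```

Here the pooled score is about 0.938 while the per-case mean is about 0.463: the single large lesion contributes almost all of the pooled voxels, so the three hard small cases barely move it.
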

The training plot (`progress.png`) shows:

1. **Smooth Pseudo Dice climb** from 0 → 0.55 in the first ~50 epochs, then 0.55 → 0.77 over epochs 50–870. Improvement is slow but continuous, with diminishing returns past epoch ~600.
2. **Train/val loss curves overlap nearly perfectly** end-to-end. With 483 training cases (~10× the 50 used for the MSD-only model), the data is varied enough that the model shows no sign of memorizing individual cases, so there is no overfitting to manage.

For comparisons against other methods, **cite the Mean Validation Dice (0.6123)**. Pseudo Dice is useful as an in-training monitoring signal but not for cross-method comparison.

Per-case validation results are available in `validation_summary.json` (Dice, IoU, TP/FP/FN counts per case).
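
For error analysis, the per-case file can be used to recompute the mean and list the weakest cases. The sketch below assumes `validation_summary.json` follows the layout of nnU-Net's own `summary.json` (a top-level `metric_per_case` list with a `metrics` dict keyed by label); that layout is an assumption, so adjust the keys if the file differs. Both function names are illustrative.

```python
import json
from pathlib import Path
from statistics import mean


def per_case_dice(summary_path, label="1"):
    """Return {prediction_file: Dice} from an nnU-Net-style summary file.

    ASSUMPTION: the file has a top-level "metric_per_case" list whose items
    look like {"metrics": {"1": {"Dice": ...}}, "prediction_file": ...}.
    """
    summary = json.loads(Path(summary_path).read_text())
    return {
        case["prediction_file"]: case["metrics"][label]["Dice"]
        for case in summary["metric_per_case"]
    }


def mean_and_worst(scores, k=5):
    """Mean Dice plus the k lowest-scoring cases."""
    worst = sorted(scores.items(), key=lambda item: item[1])[:k]
    return mean(scores.values()), worst
```
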

The 0.6123 figure reflects the difficulty of small-lesion segmentation in heterogeneous, multi-institutional LDCT. It is the model's performance on its native validation distribution.

## Why this checkpoint matters

This is the **clean-generalization complement** to the MSD-only fold-0 checkpoint (`Lab-Rasool/CLN-Segmenter-MSD-fold0`). The MSD checkpoint shows what default nnU-Net does on a small (50 train / 13 val), single-institution diagnostic-CT corpus with large tumors: high Pseudo Dice (0.82) but mild late-stage overfitting. NLSTseg shows the opposite end: ~10× more data (483 train / 121 val), multi-institutional LDCT, and smaller lesions, giving a lower Pseudo Dice (0.77) but no overfitting.

For Stage 2 finetuning on a target domain, this checkpoint is the right choice when the target is screening / LDCT / multi-institutional / small-lesion. For diagnostic-CT-heavy targets, the MSD checkpoint or the unified `Dataset500_LungLesions` pretrain (when available) is the better starting point.

## Limitations

- **Single fold of 5-fold CV**, not an ensemble. Publication-grade numbers require all 5 folds, either averaged or ensembled at inference.
- **Trained on LDCT only**: performance on diagnostic CT is unknown and likely lower without finetuning (different HU distributions, less noise).
- **Small lesions dominate the training distribution**: the model is not optimized for large primary tumors (e.g., >5 cm³).
- **Multi-label → binary collapse**: per-lesion identity and tumor-vs-nodule distinction are lost in this checkpoint's outputs.
- **One source case excluded** (`nlst_0393` / patient 205714) due to a source-data shape mismatch. Not a model issue, but worth knowing if you reproduce.
- **No clinical validation**: this is a research artifact, not a medical device.

## License

**CC-BY 4.0**, inherited from the NLSTseg source dataset license.

## Citation

If you use this model, please cite:

```bibtex
@article{isensee2021nnunet,
  title   = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
  author  = {Isensee, Fabian and Jaeger, Paul F and Kohl, Simon A A and Petersen, Jens and Maier-Hein, Klaus H},
  journal = {Nature Methods},
  volume  = {18},
  number  = {2},
  pages   = {203--211},
  year    = {2021}
}
@article{chen2025nlstseg,
  title   = {NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images},
  author  = {Chen, et al.},
  journal = {Scientific Data},
  year    = {2025},
  doi     = {10.1038/s41597-025-05742-x}
}
@article{nlst2011,
  title   = {Reduced lung-cancer mortality with low-dose computed tomographic screening},
  author  = {{The National Lung Screening Trial Research Team}},
  journal = {New England Journal of Medicine},
  year    = {2011},
  doi     = {10.1056/NEJMoa1102873}
}
```

## Project context

Part of **CLN-Segmenter** at the Rasool Lab, Moffitt Cancer Center: a two-stage approach for lung lesion segmentation that pretrains on public datasets (this is one component) and finetunes on internal data with domain-specific loss formulations.

- **Code**: https://github.com/lab-rasool/CLN-Segmenter
- **Lab**: https://huggingface.co/Lab-Rasool

Other models in this series:

- `Lab-Rasool/CLN-Segmenter-MSD-fold0`: single-dataset MSD Task06 proof of concept (diagnostic CT, 63 expert cases, Pseudo Dice 0.82)
- `Lab-Rasool/CLN-Segmenter-Dataset500-fold0`: unified MSD + NLSTseg pretrain (planned)