---
license: cc-by-nc-nd-3.0
base_model:
- MONAI/maisi_ct_generative
pipeline_tag: image-to-image
tags:
- medical-imaging
- 3d-synthesis
- diffusion-models
- controlnet
- monai
- pytorch
- rectified-flow
- ct-scans
- mri-imaging
- healthcare-ai
datasets:
- medical-images
- ct-scans
- anatomical-masks
---

# NoMAISI: Nodule-Oriented Medical AI for Synthetic Imaging

<div align="center">
<p align="center">
<img src="NoMAISI_logo.png" alt="NoMAISI Logo" width="500">
</p>

**Nodule-Oriented Medical AI for Synthetic Imaging and Augmentation in Chest CT**

</div>

# Abstract

Medical imaging datasets are increasingly available, yet abnormal and annotation-intensive cases such as lung nodules remain underrepresented. We introduce NoMAISI (Nodule-Oriented Medical AI for Synthetic Imaging), a generative framework built on foundational backbones with flow-based diffusion and ControlNet conditioning. Using NoMAISI, we curated a large multi-cohort lung nodule dataset and applied context-aware nodule volume augmentation, including relocation, shrinkage to simulate early-stage disease, and expansion to model progression. Each case was rendered into multiple synthetic variants, producing a diverse and anatomically consistent dataset. Fidelity was evaluated with cross-cohort similarity metrics, and downstream integration into lung nodule detection and classification tasks demonstrated improved external test performance, particularly in underrepresented lesion categories. These results show that nodule-oriented synthetic imaging and curated augmentation can complement clinical data, reduce annotation demands, and expand the availability of training resources for healthcare AI.

## 🧩 Workflow Overview

The overall pipeline for organ, body, and nodule segmentation with alignment is shown below:

<p align="center">
<img src="doc/images/workflow.png" alt="Segmentation Pipeline"/>
</p>

**Workflow** for constructing the **NoMAISI** development dataset. The pipeline includes **(1)** organ segmentation using AI models, **(2)** body segmentation with algorithmic methods, **(3)** nodule segmentation through AI-assisted and ML-based refinement, and **(4)** segmentation alignment to integrate the organ, body, and nodule segmentations into anatomically consistent volumes.
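
Step (4) can be pictured as merging three per-voxel masks into one label map. The sketch below is only an illustration of that idea, not the project's alignment code: the function name, the label values `BODY_LABEL` and `NODULE_LABEL`, and the precedence order (nodule over organ over body) are all assumptions made for the example.

```python
from typing import List, Sequence

# Hypothetical label ids, chosen only for this illustration.
BODY_LABEL = 200
NODULE_LABEL = 23


def align_masks(
    organs: Sequence[int],
    body: Sequence[int],
    nodules: Sequence[int],
    body_label: int = BODY_LABEL,
    nodule_label: int = NODULE_LABEL,
) -> List[int]:
    """Merge flattened organ, body, and nodule masks into one label map.

    Assumed precedence per voxel: nodule > organ > body > background (0).
    """
    if not (len(organs) == len(body) == len(nodules)):
        raise ValueError("all masks must cover the same voxel grid")

    merged = []
    for organ_id, in_body, in_nodule in zip(organs, body, nodules):
        if in_nodule:
            merged.append(nodule_label)   # lesions always win
        elif organ_id:
            merged.append(organ_id)       # keep the organ's own label
        elif in_body:
            merged.append(body_label)     # generic soft tissue inside the body
        else:
            merged.append(0)              # air / background
    return merged
```

For a four-voxel toy volume, `align_masks([0, 5, 5, 0], [0, 1, 1, 1], [0, 0, 1, 0])` yields one background voxel, one organ voxel, one nodule voxel, and one body-only voxel.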

<p align="center">
<img src="doc/images/NoMAISI_train_and_infer.png" alt="NoMAISI_train_and_infer"/>
</p>

**Overview** of our flow-based latent diffusion model with ControlNet conditioning for AI-based CT generation. The pipeline consists of three stages: **(top) Pretrained VAE** for image compression, where CT images are encoded into latent features using a frozen VAE; **(middle) Model fine-tuning**, where a **Rectified Flow ODE sampler**, conditioned on segmentation masks and voxel spacing through a **fine-tuned ControlNet**, predicts velocity fields in latent space and is optimized with a region-specific contrastive loss emphasizing ROI sensitivity and background consistency; and **(bottom) Inference**, where segmentation masks and voxel spacing guide latent sampling along the ODE trajectory to obtain a clean latent representation, which is then decoded by the VAE into full-resolution AI-generated CT images conditioned on body and lesion masks.
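
To make "sampling along the ODE trajectory" concrete, here is a minimal, dependency-free sketch of fixed-step Euler integration of a velocity field, which is the basic mechanism behind rectified-flow sampling. It is not the NoMAISI sampler: the function names, the time convention (noise at `t = 0`, clean latent at `t = 1`), and the toy velocity field standing in for the ControlNet-conditioned network are assumptions for illustration.

```python
from typing import Callable, List, Sequence

# A velocity field maps (current latent, time) to a per-element velocity.
Velocity = Callable[[Sequence[float], float], List[float]]


def euler_rectified_flow_sample(
    x0: Sequence[float], velocity: Velocity, num_steps: int = 10
) -> List[float]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target: Sequence[float]) -> Velocity:
    """Stand-in for a trained network: points straight at a known target.

    Along the straight path x_t = (1 - t) * x_0 + t * target this equals the
    constant velocity (target - x_0), the ideal rectified-flow field.
    """
    def v(x: Sequence[float], t: float) -> List[float]:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return v
```

Because the toy field is exactly straight, even a handful of Euler steps lands on the target; a learned field is only approximately straight, so the number of steps trades speed against accuracy.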

## Dataset Composition

The table below summarizes the datasets included in this project, with their split sizes (patients, CT scans, and nodules) and the annotation types available.

| Dataset | Patients <br>n (%) | CT Scans <br>n (%) | Nodules <br>n (%) | Organ Seg | Nodule Seg | Nodule CCC | Nodule Box |
|------------------|---------------------|---------------------|-------------------|-----------|------------|------------|------------|
| **LNDbv4** | 223 (3.17) | 223 (2.52) | 1132 (7.84) | β | β | β | β |
| **NSCLC-R** | 415 (5.89) | 415 (4.69) | 415 (2.87) | β | β | β | β |
| **LIDC-IDRI** | 870 (12.35) | 870 (9.84) | 2584 (17.89) | β | β | β | β |
| **DLCS-24** | 1605 (22.79) | 1605 (18.15) | 2478 (17.16) | β | β | β | β |
| **Intgmultiomics** | 1936 (27.49) | 1936 (21.90) | 1936 (13.40) | β | β | β | β |
| **LUNA-25** | 1993 (28.30) | 3792 (42.89) | 5899 (40.84) | β | β | β | β |
| **TOTAL** | 7042 (100) | 8841 (100) | 14444 (100) | β | β | β | β |

---
**Notes**
- Percentages indicate proportion relative to the total for each column.
- βοΈ = annotation available, β = annotation not available.
- “Nodule CCC” = nodule center coordinates.
- “Nodule Box” = bounding-box annotations.
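
The percentage columns can be reproduced directly from the counts. The short check below uses the patient counts from the table; the helper name `column_percentages` is introduced here for illustration only.

```python
from typing import Dict


def column_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each entry's share of the column total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Patient counts copied from the table above.
patients = {
    "LNDbv4": 223,
    "NSCLC-R": 415,
    "LIDC-IDRI": 870,
    "DLCS-24": 1605,
    "Intgmultiomics": 1936,
    "LUNA-25": 1993,
}

patient_share = column_percentages(patients)
```

The same helper applies unchanged to the CT-scan and nodule columns.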

### Dataset References
* LNDbv4: [https://zenodo.org/records/8348419](https://zenodo.org/records/8348419)
* NSCLC-Radiomics: [https://www.cancerimagingarchive.net/collection/nsclc-radiogenomics/](https://www.cancerimagingarchive.net/collection/nsclc-radiogenomics/)
* LIDC-IDRI: [https://ieee-dataport.org/documents/lung-image-database-consortium-image-collection-lidc-idri](https://ieee-dataport.org/documents/lung-image-database-consortium-image-collection-lidc-idri)
* DLCS24: [https://zenodo.org/records/13799069](https://zenodo.org/records/13799069)
* Intgmultiomics: [M. Zhao et al., Nat. Commun. (2025)](https://www.nature.com/articles/s41467-024-55594-z#citeas)
* LUNA25: [https://luna25.grand-challenge.org/](https://luna25.grand-challenge.org/)

# AI-Generated CT Evaluations

### Fréchet Inception Distance (FID) Results

Fréchet Inception Distance (FID) of the **MAISI-V2** baseline and **NoMAISI** models, computed with the test sets of multiple public clinical datasets as references (lower is better).

| **FID (Avg.)** | **LNDbv4** | **NSCLC-R** | **LIDC-IDRI** | **DLCS-24** | **Intgmultiomics** | **LUNA-25** |
|-------------------|------------|-------------|---------------|-------------|--------------------|-------------|
| **Real** LNDbv4 | – | 5.13 | 1.49 | 1.05 | 2.40 | 1.98 |
| **Real** NSCLC-R | 5.13 | – | 3.12 | 3.66 | 1.56 | 2.65 |
| **Real** LIDC-IDRI | 1.49 | 3.12 | – | 0.79 | 1.44 | 0.75 |
| **Real** DLCS-24 | 1.05 | 3.66 | 0.79 | – | 1.56 | 1.00 |
| **Real** Intgmultiomics | 2.40 | 1.56 | 1.44 | 1.56 | – | 1.57 |
| **Real** LUNA-25 | 1.98 | 2.65 | 0.75 | 1.00 | 1.57 | – |
| **AI-Generated** MAISI-V2 | 3.15 | 5.21 | 2.70 | 2.32 | 2.82 | 1.69 |
| **AI-Generated** NoMAISI (ours) | 2.99 | 3.05 | 2.31 | 2.27 | 2.62 | 1.18 |
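
For readers unfamiliar with the metric, the standard definition of FID compares two sets of image features, each summarized by a mean and a covariance. The symbols below ($\mu_r, \Sigma_r$ for the real reference set and $\mu_g, \Sigma_g$ for the generated set) are notation introduced here; the feature extractor used to obtain them is not specified in this section.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The first term penalizes a shift in the average feature, the second a mismatch in feature spread, which is why identical distributions score zero and lower values indicate closer agreement.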

### FID Parity Plot

<p align="left">
<img src="doc/images/GanAI_fid_scatter_marker_legend.png" alt="Parity comparison of FID for real–real vs AI-generated CT across datasets" width="500">
</p>

**Comparison of Fréchet Inception Distance (FID) between real–real and AI-generated CT datasets.** Each point represents a clinical dataset (**LNDbv4, NSCLC-R, LIDC-IDRI, DLCS24, Intgmultiomics, LUNA25**) under different generative models (**MAISI-V2, NoMAISI**). The x-axis shows the **median FID** computed between real datasets, while the y-axis shows the **FID of AI-generated data** compared to real.
The dashed diagonal line denotes **parity (y = x)**, where AI-generated fidelity would match real–real fidelity.
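
The coordinates of such a parity plot can be derived from the FID table. The sketch below assumes that the x-value for a dataset is the median of its five real–real FID entries (its row in the table, excluding the diagonal); that reading of "median FID computed between real datasets" and the names `REAL_REAL`, `NOMAISI`, `MAISI_V2`, and `parity_points` are assumptions for this example.

```python
from statistics import median
from typing import Dict, Tuple

# Real-real FID values from the table (the matrix is symmetric, so each pair is listed once).
REAL_REAL = {
    ("LNDbv4", "NSCLC-R"): 5.13, ("LNDbv4", "LIDC-IDRI"): 1.49,
    ("LNDbv4", "DLCS-24"): 1.05, ("LNDbv4", "Intgmultiomics"): 2.40,
    ("LNDbv4", "LUNA-25"): 1.98, ("NSCLC-R", "LIDC-IDRI"): 3.12,
    ("NSCLC-R", "DLCS-24"): 3.66, ("NSCLC-R", "Intgmultiomics"): 1.56,
    ("NSCLC-R", "LUNA-25"): 2.65, ("LIDC-IDRI", "DLCS-24"): 0.79,
    ("LIDC-IDRI", "Intgmultiomics"): 1.44, ("LIDC-IDRI", "LUNA-25"): 0.75,
    ("DLCS-24", "Intgmultiomics"): 1.56, ("DLCS-24", "LUNA-25"): 1.00,
    ("Intgmultiomics", "LUNA-25"): 1.57,
}

# AI-generated vs real FID values from the last two table rows.
MAISI_V2 = {"LNDbv4": 3.15, "NSCLC-R": 5.21, "LIDC-IDRI": 2.70,
            "DLCS-24": 2.32, "Intgmultiomics": 2.82, "LUNA-25": 1.69}
NOMAISI = {"LNDbv4": 2.99, "NSCLC-R": 3.05, "LIDC-IDRI": 2.31,
           "DLCS-24": 2.27, "Intgmultiomics": 2.62, "LUNA-25": 1.18}


def parity_points(generated: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Map each dataset to (median real-real FID, AI-generated FID)."""
    points = {}
    for name, gen_fid in generated.items():
        others = [fid for pair, fid in REAL_REAL.items() if name in pair]
        points[name] = (median(others), gen_fid)
    return points
```

A point with `y <= x` sits on or below the parity line, meaning the generated data is at least as close to that dataset as a typical other real cohort is.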

### 🖼️ Example Results
**Comparison of CT generation from anatomical masks.**
- **Left:** Input organ/body segmentation mask.
- **Middle:** Generated CT slice using **MAISI-V2**.
- **Right:** Generated CT slice using **NoMAISI (ours)**.
- **Yellow boxes** highlight lung nodule regions for comparison.

<p align="center">
<img src="doc/images/DLCS_1419_ann0_slice134_triple.png" alt="Comparison of MAISI-V2 vs NoMAISI on lung CT with input masks" width="1000">
</p>
<p align="center">
<img src="doc/images/DLCS_1508_ann0_slice46_triple.png" alt="Comparison of MAISI-V2 vs NoMAISI on lung CT with input masks" width="1000">
</p>
<p align="center">
<img src="doc/images/DLCS_1453_ann0_slice204_triple.png" alt="Comparison of MAISI-V2 vs NoMAISI on lung CT with input masks" width="1000">
</p>

# Inference Guide
1. [Project Structure](#project-structure)
2. [Configuration Files](#configuration-files)

### Model Weights
Model weights are available upon request. Please email the authors: <tushar.ece@duke.edu>.

### Project Structure

```
NoMAISI/
├── configs/                                  # Configuration files
│   ├── config_maisi3d-rflow.json             # Main model configuration
│   ├── infr_env_NoMAISI_DLCSD24_demo.json    # Environment settings
│   └── infr_config_NoMAISI_controlnet.json   # ControlNet inference config
├── scripts/                                  # Python inference scripts
│   ├── infer_testV2_controlnet.py            # Main inference script
│   ├── infer_controlnet.py                   # ControlNet inference
│   └── utils.py                              # Utility functions
├── models/                                   # Pre-trained model weights
├── data/                                     # Input data directory
├── outputs/                                  # Generated results
├── logs/                                     # Execution logs
└── inference.sub                             # SLURM job script
```

## Configuration Files

#### 1. Main Model Configuration (`config_maisi3d-rflow.json`)
Controls the core diffusion model parameters:
- Model architecture settings
- Sampling parameters
- Image dimensions and spacing

#### 2. Environment Configuration (`infr_env_NoMAISI_DLCSD24_demo.json`)
Defines the runtime environment:
- Data paths and directories
- GPU settings
- Memory allocation

#### 3. ControlNet Configuration (`infr_config_NoMAISI_controlnet.json`)
Holds ControlNet-specific settings:
- Conditioning parameters
- Generation controls
- Output specifications

## Running Inference

```bash
cd /path/NoMAISI/
# Create logs directory if it doesn't exist
mkdir -p logs
# Submit job to SLURM
sbatch inference.sub
```

```bash
# Run inference directly
cd /path/NoMAISI/
python -m scripts.infer_testV2_controlnet \
    -c ./configs/config_maisi3d-rflow.json \
    -e ./configs/infr_env_NoMAISI_DLCSD24_demo.json \
    -t ./configs/infr_config_NoMAISI_controlnet.json
```
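
The command above passes three JSON files through the `-c`, `-e`, and `-t` flags. The following sketch shows one way a script could accept and load them; it is not the code in `infer_testV2_controlnet.py`. The long option names, the function names, and the `model`/`environment`/`controlnet` grouping are hypothetical, and no config keys are assumed.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the three config paths; long option names are hypothetical."""
    parser = argparse.ArgumentParser(description="Load inference configs")
    parser.add_argument("-c", "--model-config", required=True,
                        help="main model configuration JSON")
    parser.add_argument("-e", "--env-config", required=True,
                        help="environment configuration JSON")
    parser.add_argument("-t", "--controlnet-config", required=True,
                        help="ControlNet inference configuration JSON")
    return parser.parse_args(argv)


def load_json(path: str) -> Dict[str, Any]:
    """Read one JSON file, failing early with a clear message if it is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"config file not found: {file_path}")
    with file_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def load_configs(args: argparse.Namespace) -> Dict[str, Dict[str, Any]]:
    """Keep the three configs separate so their keys cannot silently collide."""
    return {
        "model": load_json(args.model_config),
        "environment": load_json(args.env_config),
        "controlnet": load_json(args.controlnet_config),
    }
```

Calling `load_configs(parse_args())` before any GPU work is a cheap way to catch a mistyped path ahead of a long SLURM queue wait.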

# Downstream Tasks

* **Cancer vs. No-Cancer Classification**
* **Nodule Detection**
* **Nodule Segmentation**

---

## 🔬 Downstream Task: Cancer vs. No-Cancer Classification

**Shown:** AUC vs. the **% of clinical data retained** (x-axis: **100%**, **50%**, **20%**, **10%**).

**Curves** (additive augmentation: we **add** AI-generated nodules and never replace clinical samples):
- **Clinical (LUNA25):** baseline using only the retained clinical data.
- **Clinical + AI-gen. (n%):** at each point, add AI-generated data equal to the **same percentage as the retained clinical fraction**.
  *Examples:* at **50% clinical → +50% AI-gen**; **20% → +20%**; **10% → +10%**.
- **Clinical + AI-gen. (100%):** at each point, add AI-generated data equal to **100% of the full clinical dataset size**, regardless of the retained fraction.
  *Example:* at **10% clinical → +100% AI-gen**.
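
The three curves differ only in how many AI-generated samples are added at each retained fraction. The sketch below spells out that arithmetic under the assumption that all percentages are taken relative to the full clinical dataset size; the function name, the arm names, and the example size of 1000 are hypothetical.

```python
from typing import Dict


def augmentation_counts(
    full_clinical_size: int, retained_fraction: float
) -> Dict[str, Dict[str, int]]:
    """Sample counts per training arm for one point on the x-axis.

    Assumes percentages refer to the full clinical dataset size.
    """
    if not 0.0 < retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be in (0, 1]")
    clinical = round(full_clinical_size * retained_fraction)
    return {
        # Baseline: retained clinical data only.
        "clinical_only": {"clinical": clinical, "ai_generated": 0},
        # Matched mix: as many synthetic samples as retained clinical ones.
        "clinical_plus_n_percent": {"clinical": clinical, "ai_generated": clinical},
        # Fixed boost: synthetic samples equal to the full clinical size.
        "clinical_plus_100_percent": {
            "clinical": clinical,
            "ai_generated": full_clinical_size,
        },
    }


# Example with a hypothetical full dataset of 1000 cases, 10% retained.
arms_at_10_percent = augmentation_counts(1000, 0.10)
```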

**Takeaways**
- **AI-generated nodules improve data efficiency:** at **low clinical fractions (50%–10%)**, *Clinical + AI-gen. (n%)* typically **matches or exceeds** clinical-only AUC.
- **Bigger synthetic boosts (100%)** can help in some regimes but may underperform the matched *n%* mix depending on cohort; **ratio-balanced augmentation** is often safer.
- Trends **generalize to external cohorts**, indicating **usability** beyond the development data.

---

# Acknowledgements

We gratefully acknowledge the open-source projects that directly informed this repository: the [MAISI tutorial](https://github.com/Project-MONAI/tutorials/tree/main/generation/maisi) from the Project MONAI tutorials, the broader [Project MONAI ecosystem](https://github.com/Project-MONAI), our related benchmark repo [AI in Lung Health – Benchmarking](https://github.com/fitushar/AI-in-Lung-Health-Benchmarking-Detection-and-Diagnostic-Models-Across-Multiple-CT-Scan-Datasets), and our companion toolkits [PiNS – Point-driven Nodule Segmentation](https://github.com/fitushar/PiNS) and [CaNA – Context-Aware Nodule Augmentation](https://github.com/fitushar/CaNA). We thank these communities and contributors for their exceptional open-source efforts. If you use our models or code, please also consider citing these works (alongside this repository) to acknowledge their contributions.

# References

* [1] [MAISI-V2; Guo, Pengfei, et al. (2025)](https://arxiv.org/abs/2508.05772)
* [2] [AI in Lung Health – Benchmarking; Tushar et al. (2024)](https://arxiv.org/abs/2405.04605)
* [3] [SYN-LUNGS; Tushar et al. (2025)](https://arxiv.org/abs/2502.21187)