Unconditional Image Generation
latent_diffusion
medical-imaging
diffusion

Can-Zhao committed
Commit cff477a · verified · 1 Parent(s): 0e5f00f

Update README.md

Files changed (1)
  1. README.md +78 -65

README.md CHANGED
@@ -13,104 +13,99 @@ pipeline_tag: unconditional-image-generation
 # NV-Generate-MR-Brain Overview
 
 ## Description:
-NV-Generate-MR-Brain is a state-of-the-art three-dimensional (3D) latent diffusion model designed to generate high-quality synthetic magnetic resonance (MR) Brain images with including
-- whole brain
-- skull-stripped brain
 
-It can generate brain images with contrast of
-- T1w
-- T2w
-- Flair
-- SWI
 
 The model excels at data augmentation and at generating realistic medical imaging data to supplement datasets limited by privacy concerns or the rarity of certain conditions. It can also significantly enhance the performance of other medical imaging AI models by generating diverse, realistic training data.
 
-## Github Links:
-Training and inference code are in:
-[https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/tree/main](https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/tree/main).
 
 ### Deployment Geography:
 Global
 
 ### Use Case:
-Medical researchers, AI developers, and healthcare institutions would be expected to use this system for generating synthetic MR Brain training data, data augmentation for rare conditions, and advancing AI applications in healthcare research.
 
-## Download
-For example, to download the VAE, you can run:
-```
-pip install -U huggingface_hub
-huggingface-cli download nvidia/NV-Generate-MR-Brain \
-  models/diff_unet_3d_rflow-mr-brain_v0.pt \
-  --local-dir ./models
-```
 
 ### Release Date:
-Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA
 
 ## Reference(s):
-[1] Zhao, Can, et al. "Maisi-v2: Accelerated 3d high-resolution medical image synthesis with rectified flow and region-specific contrastive loss." arXiv preprint arXiv:2508.05772 (2025).
 
-[2] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
 
-[3] Lvmin Zhang, Anyi Rao, Maneesh Agrawala; "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf
 
-## Model Architecture:
-**Architecture Type:** Transformer
-**Network Architecture:** 3D UNet + attention blocks
 
-This model was developed from scratch using MONAI components.
 
 **Number of model parameters:** 240M
 
 ## Input:
-**Input Type(s):** Integer, List, Array
-**Input Format(s):** Integer values, String arrays, Float arrays
-**Input Parameters:** Number of Samples (1D), Output Size (1D), and Spacing (1D)
-**Other Properties Related to Input:** Supports controllable synthetic MR generation with whole brain or skull-stripped brain selection, customizable output dimensions, configurable voxel spacing (0.4-5.0mm).
 
 ### num_output_samples
 - **Type:** Integer
-- **Description:** Required input indicates the number of synthetic images the model will generate
 
-### modality class
-- **Type:** Integer
-- **Description:** Required input indicates the contrast of generated, defined in https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/blob/main/configs/modality_mapping.json
-  "mri": 8,
-  "mri_t1": 9,
-  "mri_t2": 10,
-  "mri_flair": 11,
-  "mri_swi": 20,
-  "mri_t1_skull_stripped": 29,
-  "mri_t2_skull_stripped": 30,
-  "mri_flair_skull_stripped": 31,
-  "mri_swi_skull_stripped": 32,
-- **Options:** [8, 9, 10, 11, 20, 29, 30, 31, 32]
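For reference, the modality-to-label mapping listed above (from `configs/modality_mapping.json` in the linked repository) can be sketched as a plain Python dict. The label values are those stated in this card; the dict and helper names below are illustrative, not identifiers from the repository.

```python
# Modality-to-label mapping as listed in the model card / modality_mapping.json.
# MODALITY_LABELS and modality_label are illustrative names, not repo code.
MODALITY_LABELS = {
    "mri": 8,
    "mri_t1": 9,
    "mri_t2": 10,
    "mri_flair": 11,
    "mri_swi": 20,
    "mri_t1_skull_stripped": 29,
    "mri_t2_skull_stripped": 30,
    "mri_flair_skull_stripped": 31,
    "mri_swi_skull_stripped": 32,
}

def modality_label(name: str) -> int:
    """Return the integer class label for a modality name, or raise KeyError."""
    try:
        return MODALITY_LABELS[name]
    except KeyError:
        valid = ", ".join(sorted(MODALITY_LABELS))
        raise KeyError(f"Unknown modality {name!r}; expected one of: {valid}") from None
```

For example, `modality_label("mri_flair")` yields the class label 11 that would be passed as the model's integer modality input.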
 
 ### output_size
 - **Type:** Array of 3 Integers
-- **Description:** Optional specification of x, y, and z dimensions of MR image
-- **Constraints:** Must be 128, 256, 384, or 512 for x- and y-axes; 128, 256 for z-axis
 
 ### spacing
 - **Type:** Array of 3 Floats
 - **Description:** Optional voxel spacing specification
-- **Range:** 0.4mm to 5.0mm per element
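The size and spacing constraints stated in this (pre-update) version can be checked client-side before requesting generation. The sketch below encodes only the ranges listed above; the function name is illustrative and not part of the repository.

```python
# Client-side validation sketch for the constraints stated in this card:
# x/y sizes in {128, 256, 384, 512}, z sizes in {128, 256},
# spacing between 0.4 mm and 5.0 mm per element.
VALID_XY = {128, 256, 384, 512}      # allowed x- and y-axis sizes
VALID_Z = {128, 256}                 # allowed z-axis sizes
SPACING_MIN, SPACING_MAX = 0.4, 5.0  # mm per element

def validate_request(output_size, spacing):
    """Raise ValueError if output_size or spacing violates the stated ranges."""
    x, y, z = output_size
    if x not in VALID_XY or y not in VALID_XY:
        raise ValueError(f"x/y sizes must be in {sorted(VALID_XY)}, got {x}x{y}")
    if z not in VALID_Z:
        raise ValueError(f"z size must be in {sorted(VALID_Z)}, got {z}")
    for s in spacing:
        if not SPACING_MIN <= s <= SPACING_MAX:
            raise ValueError(f"spacing {s}mm outside [{SPACING_MIN}, {SPACING_MAX}]mm")
    return True
```

For instance, `validate_request((512, 512, 256), (0.4, 0.4, 0.6))` passes, while a 512-voxel z-axis is rejected.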
 
 ## Output:
-**Output Type(s):** Image
-**Output Format:** Neuroimaging Informatics Technology Initiative (NIfTI), Digital Imaging and Communications in Medicine (DICOM), Nearly Raw Raster Data (Nrrd)
-**Output Parameters:** Three-Dimensional (3D)
-**Other Properties Related to Output:** Synthetic MR brain images with dimensions up to 512×512×256 voxels and spacing between 0.4mm and 5.0mm.
 
-Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
 
 ## Software Integration:
 **Runtime Engine(s):**
-* MONAI Core v.1.5.0
 
 **Supported Hardware Microarchitecture Compatibility:**
 * NVIDIA Ampere
 * NVIDIA Hopper
 
 **Supported Operating System(s):**
 * Linux
@@ -118,19 +113,16 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
 
 ## Model Version(s):
-0.1 - Initial release version for synthetic MR brain image generation
 
 ## Training, Testing, and Evaluation Datasets:
 
 ### Dataset Overview:
-**Total Size:** ~38k subjects, including 265k whole brain and 265k skull-stripped brain
-**Total Number of Datasets:** 1 datasets
-
-Public datasets from multiple scanner types were processed to create high-quality 3D MR brain volumes. The data processing pipeline ensured consistent voxel spacing, standardized orientations.
 
 ## Training Dataset:
 **Data Modality:**
-* Image
 
 **Image Training Data Size:**
 * Less than a Million Images
@@ -141,14 +133,29 @@ Public datasets from multiple scanner types were processed to create high-qualit
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 ## Testing Dataset:
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 ## Evaluation Dataset:
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
@@ -156,13 +163,19 @@ Public datasets from multiple scanner types were processed to create high-qualit
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
-## Inference:
-**Acceleration Engine:** PyTorch
-**Test Hardware:**
 * A100
 * H100
 
 ## Ethical Considerations:
-NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
 
 Please report model quality, risk, security vulnerabilities or concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 # NV-Generate-MR-Brain Overview
 
 ## Description:
+NV-Generate-MR-Brain is a three-dimensional (3D) latent diffusion model designed to generate high-quality synthetic brain magnetic resonance imaging (MRI) images, achieving the highest resolution and best FID scores among comparable models. This model is specialized for brain MRI with support for multiple modalities: T1, Fluid Attenuated Inversion Recovery (FLAIR), T2, and Susceptibility Weighted Imaging (SWI).
 
+Compared to the previous NV-Generate-MR release, the key differences are:
+- **Resolution and image size:** Brain images are typically smaller than full-body MR; the maximum image size is 512x512x256 voxels at a resolution of 0.4x0.4x0.6mm
+- **Supported modalities:** T1, FLAIR, T2, and SWI, selected via an integer label input
+- **Cross-modality synthesis:** The primary use case is cross-modality synthesis (e.g., T1 → FLAIR, FLAIR → T1), enabling generation of complementary MRI modalities from existing data
 
 The model excels at data augmentation and at generating realistic medical imaging data to supplement datasets limited by privacy concerns or the rarity of certain conditions. It can also significantly enhance the performance of other medical imaging AI models by generating diverse, realistic training data.
 
+This model is ready for commercial use.
 
+### License/Terms of Use:
+Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Additional Information: [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
 ### Deployment Geography:
 Global
 
 ### Use Case:
+Medical researchers, AI developers, and healthcare institutions would be expected to use this model for:
+- **Cross-modality synthesis:** Generating complementary brain MRI modalities (e.g., synthesizing FLAIR from T1 or T1 from FLAIR)
+- **Synthetic training data generation:** Producing synthetic brain MRI images for data augmentation and AI model training
+- **Data augmentation for rare conditions:** Supplementing limited datasets where privacy or rarity restricts data availability
 
+It is not a clinically validated medical device and should not be used for clinical diagnostic purposes.
 
 ### Release Date:
+Huggingface: 03/16/2026 (GTC San Jose 2026) via https://huggingface.co/nvidia/NV-Generate-MR-brain
 
 ## Reference(s):
 
+[1] Zhao, Can, et al. "MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-Specific Contrastive Loss." arXiv preprint arXiv:2508.05772 (2025). https://arxiv.org/abs/2508.05772
 
+[2] Guo, Pengfei, et al. "MAISI: Medical AI for Synthetic Imaging." arXiv preprint arXiv:2409.11169 (2024). https://arxiv.org/abs/2409.11169
 
+[3] Rombach, Robin, et al. "High-Resolution Image Synthesis with Latent Diffusion Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
+
+[4] Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf
 
+## Model Architecture:
+**Architecture Type:** Diffusion Model<br>
+**Network Architecture:** 3D UNet + attention blocks (latent diffusion)<br>
+**Task:** Generation (Synthetic MRI Image)<br>
 **Number of model parameters:** 240M
 
+## Computational Load (Internal Only: For NVIDIA Models Only)
+**Cumulative Compute:** 4.375 x 10^22 <br>
+**Estimated Energy and Emissions for Model Training:** 24,085 kWh <br>
+
 ## Input:
+**Input Type(s):** Integer, Array<br>
+**Input Format(s):** Integer values, Float arrays<br>
+**Input Parameters:** Number of Samples (1D), Modality (1D), Output Size (1D), and Spacing (1D)<br>
+**Other Properties Related to Input:** Supports controllable synthetic brain MRI generation with modality selection, customizable output dimensions, and configurable voxel spacing.
 
 ### num_output_samples
 - **Type:** Integer
+- **Description:** Required input indicating the number of synthetic brain MRI images the model will generate
 
+### modality
+- **Type:** Integer label
+- **Description:** Required input specifying the MRI modality to generate
+- **Options:**
+  - T1
+  - FLAIR
+  - T2
+  - SWI
 
 ### output_size
 - **Type:** Array of 3 Integers
+- **Description:** Optional specification of the x, y, and z dimensions of the brain MRI image
+- **Constraints:** Maximum dimensions are 512x512x256
 
 ### spacing
 - **Type:** Array of 3 Floats
 - **Description:** Optional voxel spacing specification
+- **Range:** 0.4x0.4x0.6mm
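Putting the updated inputs together, a generation request can be assembled and sanity-checked as below. The dict keys mirror the documented parameter names, but the exact config schema is an assumption here; consult the repository configs for the authoritative format.

```python
# Illustrative request payload for the inputs documented in this card.
# MAX_SIZE and the modality names come from the card; the dict schema and
# build_request name are assumptions for illustration only.
MAX_SIZE = (512, 512, 256)
MODALITIES = ("T1", "FLAIR", "T2", "SWI")

def build_request(num_output_samples, modality, output_size, spacing):
    """Assemble a generation request, rejecting values outside the stated limits."""
    if modality not in MODALITIES:
        raise ValueError(f"modality must be one of {MODALITIES}, got {modality!r}")
    if any(dim > cap for dim, cap in zip(output_size, MAX_SIZE)):
        raise ValueError(f"output_size {output_size} exceeds maximum {MAX_SIZE}")
    return {
        "num_output_samples": num_output_samples,
        "modality": modality,
        "output_size": list(output_size),
        "spacing": list(spacing),
    }
```

For example, `build_request(2, "FLAIR", (256, 256, 128), (0.4, 0.4, 0.6))` returns a payload for two FLAIR volumes, while an unsupported modality or an oversized volume raises ValueError.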
 
 ## Output:
+**Output Type(s):** Image<br>
+**Output Format:** Neuroimaging Informatics Technology Initiative (NIfTI)<br>
+**Output Parameters:** Three-Dimensional (3D)<br>
+**Other Properties Related to Output:** Synthetic brain MRI images in the specified modality (T1, FLAIR, T2, or SWI). Output dimensions and spacing are configurable within supported ranges.
 
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
 
 ## Software Integration:
 **Runtime Engine(s):**
+* MONAI Core v.1.5
 
 **Supported Hardware Microarchitecture Compatibility:**
 * NVIDIA Ampere
 * NVIDIA Hopper
+* NVIDIA Blackwell
 
 **Supported Operating System(s):**
 * Linux
 
 The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
 
 ## Model Version(s):
+0.1 - Initial release version for synthetic brain MRI image generation
 
 ## Training, Testing, and Evaluation Datasets:
 
 ### Dataset Overview:
+The model was trained on the MR-Rate brain MRI dataset covering the supported modalities (T1, FLAIR, T2, and SWI). Data from multiple scanner types were processed to create high-quality 3D MRI volumes with corresponding anatomical annotations. The data processing pipeline ensured consistent voxel spacing, standardized orientations, and validated anatomical segmentations.
 
 ## Training Dataset:
 **Data Modality:**
+* Image (Brain MRI: T1, FLAIR, T2, and SWI)
 
 **Image Training Data Size:**
 * Less than a Million Images
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
+**Properties:** Approximately 28,000 MRI scans of various types (T1, FLAIR, T2, and SWI).
+
 ## Testing Dataset:
+**Data Modality:**
+* Image (Brain MRI: T1, FLAIR, T2, and SWI)
+
+**Image Testing Data Size:**
+* Less than a Million Images
+
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
+**Properties:** Approximately 8,000 MRI scans of various types (T1, FLAIR, T2, and SWI).
+
 ## Evaluation Dataset:
+**Data Modality:**
+* Image (Brain MRI: T1, FLAIR, T2, and SWI)
+
+**Image Evaluation Data Size:**
+* Less than a Million Images
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
+**Properties:** Approximately 4,000 MRI scans of various types (T1, FLAIR, T2, and SWI).
+
+## Inference:
+**Acceleration Engine:** PyTorch<br>
+**Test Hardware:**<br>
 * A100
 * H100
 
 ## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+
+For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
+
+Please make sure you have proper rights and permissions for all input image and video content; if the image or video includes people, personal health information, or intellectual property, the generated image or video will not blur or maintain the proportions of the image subjects included.
 
 Please report model quality, risk, security vulnerabilities or concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).