Unconditional Image Generation
latent_diffusion
medical-imaging
diffusion
Commit bbcbbf6 by Can-Zhao · 2 parents: d95a4c8, 7455a2d

Merge branch 'main' of https://huggingface.co/nvidia/NV-Generate-MR

Files changed (1): README.md +166 -3
@@ -1,3 +1,166 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: other
3
+ ---
4
+ # NV-Generate-MR Overview
5
+
6
+ ## Description:
7
+ NV-Generate-MR is a state-of-the-art three-dimensional (3D) latent diffusion model designed to generate high-quality synthetic magnetic resonance (MR) images with or without anatomical annotations. The model excels at data augmentation and at generating realistic medical imaging data to supplement datasets limited by privacy concerns or the rarity of certain conditions. It can also significantly enhance the performance of other medical imaging AI models by generating diverse, realistic training data.
8
+
9
+ This model is for research and development only.
10
+
11
+ ### License/Terms of Use:
12
+ NVIDIA OneWay Non-Commercial License for academic research purposes
13
+
14
+ ### Deployment Geography:
15
+ Global
16
+
17
+ ### Use Case:
18
+ Medical researchers, AI developers, and healthcare institutions are expected to use this model to generate synthetic MR training data, augment data for rare conditions, and advance AI applications in healthcare research.
19
+
20
+ ### Release Date:
21
+ Hugging Face: 10/27/2025 via https://huggingface.co/NVIDIA
22
+
23
+ ## Reference(s):
24
+ [1] Guo, Pengfei, et al. "MAISI: Medical AI for Synthetic Imaging." arXiv preprint arXiv:2409.11169. 2024. https://arxiv.org/abs/2409.11169
25
+
26
+ [2] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
27
+
28
+ [3] Lvmin Zhang, Anyi Rao, Maneesh Agrawala; "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf
29
+
30
+ ## Model Architecture:
31
+ **Architecture Type:** Transformer
32
+ **Network Architecture:** 3D UNet + attention blocks
33
+
34
+ This model was developed from scratch using MONAI components.
35
+ **Number of model parameters:** 240M
36
+
37
+ ## Input:
38
+ **Input Type(s):** Integer, List, Array
39
+ **Input Format(s):** Integer values, String arrays, Float arrays
40
+ **Input Parameters:** Number of Samples (1D), Body Region (1D), Anatomy List (1D), Output Size (1D), and Spacing (1D)
41
+ **Other Properties Related to Input:** Supports controllable synthetic MR generation with flexible body region selection, optional anatomical class specification (up to 127 classes), customizable output dimensions, configurable voxel spacing (0.5-5.0 mm), and controllable anatomy sizing.
42
+
43
+ ### num_output_samples
44
+ - **Type:** Integer
45
+ - **Description:** Required. The number of synthetic images the model will generate
46
+
47
+ ### body_region
48
+ - **Type:** List of Strings
49
+ - **Description:** Required. The body region(s) the generated MR image will cover
50
+ - **Options:** ["head", "chest", "thorax", "abdomen", "pelvis", "lower"]
51
+
52
+ ### anatomy_list
53
+ - **Type:** List of Strings
54
+ - **Description:** Optional list of up to 127 anatomical classes
55
+
56
+ ### output_size
57
+ - **Type:** Array of 3 Integers
58
+ - **Description:** Optional x, y, and z dimensions of the MR image, in voxels
59
+ - **Constraints:** Must be 128, 256, 384, or 512 for x- and y-axes; 128, 256, 384, 512, 640, or 768 for z-axis
60
+
61
+ ### spacing
62
+ - **Type:** Array of 3 Floats
63
+ - **Description:** Optional voxel spacing specification
64
+ - **Range:** 0.5 mm to 5.0 mm per element
65
+
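The input constraints listed above can be checked before a generation run. The following is a minimal sketch, not part of the committed README or of the model's API: the function name `validate_inputs` and the constant names are hypothetical, and the rules that `num_output_samples` must be positive and `body_region` non-empty are assumptions. Only the allowed values themselves come from the model card.

```python
from typing import List, Optional, Sequence

# Allowed values copied from the model card. All names here are hypothetical.
BODY_REGIONS = ("head", "chest", "thorax", "abdomen", "pelvis", "lower")
XY_SIZES = (128, 256, 384, 512)
Z_SIZES = (128, 256, 384, 512, 640, 768)
MAX_ANATOMY_CLASSES = 127
MIN_SPACING_MM, MAX_SPACING_MM = 0.5, 5.0


def validate_inputs(
    num_output_samples: int,
    body_region: Sequence[str],
    anatomy_list: Optional[Sequence[str]] = None,
    output_size: Optional[Sequence[int]] = None,
    spacing: Optional[Sequence[float]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []

    # Assumption: "Integer" in the card means a positive integer.
    if not isinstance(num_output_samples, int) or num_output_samples < 1:
        problems.append("num_output_samples must be a positive integer")

    # Assumption: a required list must contain at least one region.
    unknown = [r for r in body_region if r not in BODY_REGIONS]
    if not body_region or unknown:
        problems.append(f"body_region must be a non-empty subset of {BODY_REGIONS}")

    if anatomy_list is not None and len(anatomy_list) > MAX_ANATOMY_CLASSES:
        problems.append(f"anatomy_list may hold at most {MAX_ANATOMY_CLASSES} classes")

    if output_size is not None:
        if len(output_size) != 3:
            problems.append("output_size must have exactly 3 elements")
        else:
            x, y, z = output_size
            if x not in XY_SIZES or y not in XY_SIZES:
                problems.append(f"output_size x and y must be one of {XY_SIZES}")
            if z not in Z_SIZES:
                problems.append(f"output_size z must be one of {Z_SIZES}")

    if spacing is not None:
        if len(spacing) != 3:
            problems.append("spacing must have exactly 3 elements")
        elif any(not (MIN_SPACING_MM <= s <= MAX_SPACING_MM) for s in spacing):
            problems.append(
                f"spacing elements must lie in [{MIN_SPACING_MM}, {MAX_SPACING_MM}] mm"
            )

    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid field at once, for example `validate_inputs(2, ["abdomen"], output_size=[256, 256, 640])`.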
66
+ ## Output:
67
+ **Output Type(s):** Image
68
+ **Output Format:** Neuroimaging Informatics Technology Initiative (NIfTI), Digital Imaging and Communications in Medicine (DICOM), Nearly Raw Raster Data (NRRD)
69
+ **Output Parameters:** Three-Dimensional (3D)
70
+ **Other Properties Related to Output:** Synthetic MR images with dimensions up to 512×512×768 voxels and spacing between 0.5 mm and 5.0 mm, with controllable anatomy sizes as specified. When anatomy_list is provided, an additional NIfTI file containing the corresponding segmentation mask is generated.
71
+
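To get a feel for what a given `output_size` and `spacing` pair implies, the voxel count and physical extent can be worked out directly. This is a small illustrative sketch: the helper name `volume_summary` is hypothetical, and the extent is taken as voxels times spacing along each axis, which is the usual field-of-view convention rather than something the model card states.

```python
from math import prod
from typing import Dict, Sequence, Tuple, Union


def volume_summary(
    output_size: Sequence[int], spacing: Sequence[float]
) -> Dict[str, Union[int, Tuple[float, ...]]]:
    """Summarise a volume by voxel count and physical extent in millimetres."""
    if len(output_size) != 3 or len(spacing) != 3:
        raise ValueError("output_size and spacing must each have 3 elements")
    # Assumed convention: extent along an axis = number of voxels * voxel spacing.
    extent_mm = tuple(n * s for n, s in zip(output_size, spacing))
    return {"voxels": prod(output_size), "extent_mm": extent_mm}


# The largest supported grid at 1.0 mm isotropic spacing.
summary = volume_summary((512, 512, 768), (1.0, 1.0, 1.0))
print(summary["voxels"])     # 201326592
print(summary["extent_mm"])  # (512.0, 512.0, 768.0)
```

The same grid at 0.5 mm spacing covers half the physical distance on each axis, so size and spacing are best chosen together for the anatomy of interest.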
72
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
73
+
74
+ ## Software Integration:
75
+ **Runtime Engine(s):**
76
+ * MONAI Core v1.5.0
77
+
78
+ **Supported Hardware Microarchitecture Compatibility:**
79
+ * NVIDIA Ampere
80
+ * NVIDIA Hopper
81
+
82
+ **Supported Operating System(s):**
83
+ * Linux
84
+
85
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
86
+
87
+ ## Model Version(s):
88
+ 0.1 - Initial release version for synthetic MR image generation
89
+
90
+ ## Training, Testing, and Evaluation Datasets:
91
+
92
+ ### Dataset Overview:
93
+ **Total Size:** ~20,000 3D volumes
94
+ **Total Number of Datasets:** ~17 datasets
95
+
96
+ Public datasets from multiple scanner types were processed to create high-quality 3D MR volumes with corresponding anatomical annotations. The data processing pipeline ensured consistent voxel spacing, standardized orientations, and validated anatomical segmentations.
97
+
98
+ ## Training Dataset:
99
+ **Data Modality:**
100
+ * Image
101
+
102
+ **Image Training Data Size:**
103
+ * Less than a Million Images
104
+
105
+ **Data Collection Method by dataset:**
106
+ * Hybrid: Human, Automatic/Sensors
107
+
108
+ **Labeling Method by dataset:**
109
+ * Hybrid: Human, Automatic/Sensors
110
+
111
+ ## Testing Dataset:
112
+ **Data Collection Method by dataset:**
113
+ * Hybrid: Human, Automatic/Sensors
114
+
115
+ **Labeling Method by dataset:**
116
+ * Hybrid: Human, Automatic/Sensors
117
+
118
+ ## Evaluation Dataset:
119
+
120
+ **Data Collection Method by dataset:**
121
+ * Hybrid: Human, Automatic/Sensors
122
+
123
+ **Labeling Method by dataset:**
124
+ * Hybrid: Human, Automatic/Sensors
125
+
126
+ ## Inference:
127
+ **Acceleration Engine:** PyTorch
128
+ **Test Hardware:**
129
+ * A100
130
+ * H100
131
+
132
+ ## Additional Information:
133
+ ### Available Anatomical Classes (345+ total):
134
+ NV-Generate-MR supports comprehensive anatomical segmentation with the following categories:
135
+
136
+ **Core Organs and Systems:**
137
+ - **Abdominal organs:** liver (1), kidney (2), spleen (3), pancreas (4), gallbladder (10), stomach (12), bladder (15), colon (62)
138
+ - **Cardiovascular:** heart (115), aorta (6), inferior vena cava (7), superior vena cava (125), portal and splenic veins (17)
139
+ - **Respiratory:** lung (20), trachea (57), airway (132), individual lung lobes (28-32)
140
+ - **Neurological:** brain (22), spinal cord (121), complete brain structures (214-345)
141
+
142
+ **Skeletal System:**
143
+ - **Spine:** Complete vertebral column from C1-S1 (33-56, 127)
144
+ - **Thoracic:** Bilateral ribs 1-12 (63-86), sternum (122), costal cartilages (114)
145
+ - **Appendicular:** Bilateral long bones, joints, and extremities (87-96)
146
+
147
+ **Detailed Brain Segmentation:**
148
+ Comprehensive brain parcellation including ventricles, cortical regions, subcortical structures, and specialized brain areas (214-345) based on neuroanatomical atlases.
149
+
150
+ **Pathological Structures:**
151
+ - **Tumors:** lung tumor (23), pancreatic tumor (24), hepatic tumor (26), brain tumor (176)
152
+ - **Cancer:** colon cancer primaries (27)
153
+ - **Lesions:** bone lesion (128)
154
+ - **Cysts:** bilateral kidney cysts (116-117)
155
+
156
+ **Specialized Regions:**
157
+ - **Head and neck:** detailed facial structures, sensory organs, and cranial anatomy (172-213)
158
+ - **Cardiac:** heart chambers, major vessels, and cardiac-specific structures (108, 149-155)
159
+ - **Reproductive:** prostate zones (118, 147-148), uterocervix (161), gonads (160)
160
+
161
+ *Complete numerical mapping and deprecated classes available in model documentation.*
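The label indices listed above can be kept in a lookup table when working with the generated segmentation masks. The sketch below copies a subset of the single-valued indices from this section. The dictionary name, the helper `labels_for`, and the exact name strings are hypothetical; the identifiers that the model accepts in `anatomy_list` may be spelled differently, so check them against the complete mapping in the model documentation.

```python
from typing import Dict, Iterable, List

# Subset of label indices copied from the list above.
# Assumption: the key strings are illustrative, not the model's official class names.
ANATOMY_LABELS: Dict[str, int] = {
    "liver": 1, "kidney": 2, "spleen": 3, "pancreas": 4,
    "aorta": 6, "inferior vena cava": 7, "gallbladder": 10,
    "stomach": 12, "bladder": 15, "lung": 20, "brain": 22,
    "lung tumor": 23, "pancreatic tumor": 24, "hepatic tumor": 26,
    "trachea": 57, "colon": 62, "heart": 115, "spinal cord": 121,
    "sternum": 122, "superior vena cava": 125, "bone lesion": 128,
    "airway": 132, "brain tumor": 176,
}


def labels_for(names: Iterable[str]) -> List[int]:
    """Map anatomy names to sorted, de-duplicated label indices."""
    names = list(names)
    unknown = [n for n in names if n not in ANATOMY_LABELS]
    if unknown:
        raise ValueError(f"no label index known for: {unknown}")
    return sorted({ANATOMY_LABELS[n] for n in names})


print(labels_for(["spleen", "liver", "aorta"]))  # [1, 3, 6]
```

The returned indices can then be used to select the matching voxels in the segmentation mask that is written when `anatomy_list` is provided.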
162
+
163
+ ## Ethical Considerations:
164
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
165
+
166
+ Please report model quality, risk, security vulnerabilities or concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).