---
license: other
license_name: nvidia-open-model-license-agreement
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
- medical-imaging
- diffusion
arxiv: 2409.11169
pipeline_tag: unconditional-image-generation
---

# NV-Generate-MR-Brain Overview

## Description:
NV-Generate-MR-Brain is a state-of-the-art three-dimensional (3D) latent diffusion model designed to generate high-quality synthetic magnetic resonance (MR) brain images, covering both whole-brain and skull-stripped volumes. The model excels at data augmentation and at generating realistic medical imaging data to supplement datasets limited by privacy concerns or the rarity of certain conditions. It can also significantly improve the performance of other medical imaging AI models by providing diverse, realistic training data.

## GitHub Links:
Training and inference code are available at
[https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/tree/main](https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/tree/main).

### Deployment Geography:
Global

### Use Case:
Medical researchers, AI developers, and healthcare institutions would be expected to use this system for generating synthetic MR brain training data, augmenting data for rare conditions, and advancing AI applications in healthcare research.

## Download
For example, to download the VAE, you can run:

```shell
pip install -U huggingface_hub
huggingface-cli download nvidia/NV-Generate-MR-Brain \
  models/autoencoder_v1.pt \
  --local-dir ./models
```

### Release Date:
Hugging Face: 10/27/2025 via https://huggingface.co/NVIDIA

## Reference(s):
[1] Zhao, Can, et al. "MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-Specific Contrastive Loss." arXiv preprint arXiv:2508.05772 (2025).

[2] Rombach, Robin, et al. "High-Resolution Image Synthesis with Latent Diffusion Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf

[3] Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf

## Model Architecture:
**Architecture Type:** Transformer
**Network Architecture:** 3D UNet + attention blocks

This model was developed from scratch using MONAI components.
**Number of model parameters:** 240M

## Input:
**Input Type(s):** Integer, List, Array
**Input Format(s):** Integer values, String arrays, Float arrays
**Input Parameters:** Number of Samples (1D), Body Region (1D), Anatomy List (1D), Output Size (1D), and Spacing (1D)
**Other Properties Related to Input:** Supports controllable synthetic MR generation with flexible body region selection, optional anatomical class specification (up to 127 classes), customizable output dimensions, configurable voxel spacing (0.5-5.0mm), and controllable anatomy sizing.

### num_output_samples
- **Type:** Integer
- **Description:** Required input; specifies the number of synthetic images the model will generate

### modality_class
- **Type:** Integer
- **Description:** Required input; specifies the contrast of the generated image, as defined in https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/blob/main/configs/modality_mapping.json
  - "mri": 8
  - "mri_t1": 9
  - "mri_t2": 10
  - "mri_flair": 11
  - "mri_swi": 20
  - "mri_t1_skull_stripped": 29
  - "mri_t2_skull_stripped": 30
  - "mri_flair_skull_stripped": 31
  - "mri_swi_skull_stripped": 32
- **Options:** [8, 9, 10, 11, 20, 29, 30, 31, 32]
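
For programmatic use, the contrast codes above can be kept in a small lookup table. A minimal sketch; the dictionary is transcribed from the mapping listed above, and `modality_code` is an illustrative helper, not part of the repository's API:

```python
# Contrast-name -> modality class codes, transcribed from the
# modality_mapping.json entries listed above (illustrative copy,
# not loaded from the repository).
MODALITY_CLASSES = {
    "mri": 8,
    "mri_t1": 9,
    "mri_t2": 10,
    "mri_flair": 11,
    "mri_swi": 20,
    "mri_t1_skull_stripped": 29,
    "mri_t2_skull_stripped": 30,
    "mri_flair_skull_stripped": 31,
    "mri_swi_skull_stripped": 32,
}


def modality_code(name: str) -> int:
    """Return the integer modality class for a contrast name."""
    if name not in MODALITY_CLASSES:
        raise ValueError(f"unknown modality {name!r}; choose from {sorted(MODALITY_CLASSES)}")
    return MODALITY_CLASSES[name]
```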

### output_size
- **Type:** Array of 3 Integers
- **Description:** Optional specification of the x, y, and z dimensions of the MR image
- **Constraints:** Must be 128, 256, 384, or 512 for the x- and y-axes; 128, 256, 384, 512, 640, or 768 for the z-axis

### spacing
- **Type:** Array of 3 Floats
- **Description:** Optional voxel spacing specification
- **Range:** 0.5mm to 5.0mm per element
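
The size and spacing constraints above are easy to check before launching generation. A minimal sketch, assuming exactly the limits stated above (the function name is illustrative, not part of the model's API):

```python
# Validate output_size and spacing against the documented constraints.
VALID_XY_SIZES = {128, 256, 384, 512}
VALID_Z_SIZES = {128, 256, 384, 512, 640, 768}


def validate_generation_inputs(output_size, spacing):
    """Raise ValueError if output_size or spacing violates the documented limits."""
    x, y, z = output_size
    if x not in VALID_XY_SIZES or y not in VALID_XY_SIZES:
        raise ValueError(f"x and y must be in {sorted(VALID_XY_SIZES)}, got {x} and {y}")
    if z not in VALID_Z_SIZES:
        raise ValueError(f"z must be in {sorted(VALID_Z_SIZES)}, got {z}")
    if not all(0.5 <= s <= 5.0 for s in spacing):
        raise ValueError(f"each spacing element must lie in [0.5, 5.0] mm, got {spacing}")
```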

## Output:
**Output Type(s):** Image
**Output Format:** Neuroimaging Informatics Technology Initiative (NIfTI), Digital Imaging and Communications in Medicine (DICOM), Nearly Raw Raster Data (NRRD)
**Output Parameters:** Three-Dimensional (3D)
**Other Properties Related to Output:** Synthetic MR brain images with dimensions up to 512×512×256 voxels and spacing between 0.5mm and 5.0mm, with controllable anatomy sizes as specified. When anatomy_list is provided, an additional NIfTI file containing the corresponding segmentation mask is generated.
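
The physical extent of a generated volume follows directly from the voxel counts and spacing; a quick sanity check (pure arithmetic, independent of the model):

```python
# Physical field of view per axis (mm) = voxel count * voxel spacing (mm).
def field_of_view_mm(output_size, spacing):
    return tuple(n * s for n, s in zip(output_size, spacing))


# For example, a 256 x 256 x 256 voxel volume at 1.0 mm isotropic spacing
# spans 256 mm along each axis.
fov = field_of_view_mm((256, 256, 256), (1.0, 1.0, 1.0))
```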

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration:
**Runtime Engine(s):**
* MONAI Core v1.5.0

**Supported Hardware Microarchitecture Compatibility:**
* NVIDIA Ampere
* NVIDIA Hopper

**Supported Operating System(s):**
* Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

## Model Version(s):
0.1 - Initial release version for synthetic MR brain image generation

## Training, Testing, and Evaluation Datasets:

### Dataset Overview:
**Total Size:** ~38k subjects, comprising 265k whole-brain and 265k skull-stripped brain volumes
**Total Number of Datasets:** 1 dataset

Public datasets from multiple scanner types were processed to create high-quality 3D MR volumes with corresponding anatomical annotations. The data processing pipeline ensured consistent voxel spacing, standardized orientations, and validated anatomical segmentations.

## Training Dataset:
**Data Modality:**
* Image

**Image Training Data Size:**
* Less than a Million Images

**Data Collection Method by dataset:**
* Hybrid: Human, Automatic/Sensors

**Labeling Method by dataset:**
* Hybrid: Human, Automatic/Sensors

## Testing Dataset:
**Data Collection Method by dataset:**
* Hybrid: Human, Automatic/Sensors

**Labeling Method by dataset:**
* Hybrid: Human, Automatic/Sensors

## Evaluation Dataset:
**Data Collection Method by dataset:**
* Hybrid: Human, Automatic/Sensors

**Labeling Method by dataset:**
* Hybrid: Human, Automatic/Sensors

## Inference:
**Acceleration Engine:** PyTorch
**Test Hardware:**
* A100
* H100

## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have the proper rights and permissions for all input image and video content; if an image or video includes people, personal health information, or intellectual property, the generated image or video will not blur or maintain the proportions of the subjects included.

Please report model quality, risk, security vulnerabilities or concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).