Unconditional Image Generation
latent_diffusion
medical-imaging
diffusion

Can-Zhao committed
Commit cff477a · verified · 1 Parent(s): 0e5f00f

Update README.md

Files changed (1)
  1. README.md +78 -65

README.md CHANGED
@@ -13,104 +13,99 @@ pipeline_tag: unconditional-image-generation
 # NV-Generate-MR-Brain Overview
 
 ## Description:
-NV-Generate-MR-Brain is a state-of-the-art three-dimensional (3D) latent diffusion model designed to generate high-quality synthetic magnetic resonance (MR) Brain images with including
-- whole brain
-- skull-stripped brain
 
-It can generate brain images with contrast of
-- T1w
-- T2w
-- Flair
-- SWI
 
 The model excels at data augmentation and at generating realistic medical imaging data to supplement datasets limited by privacy concerns or the rarity of certain conditions. It can also significantly enhance the performance of other medical imaging AI models by generating diverse, realistic training data.
 
-## Github Links:
-Training and inference code are in:
-[https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/tree/main](https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/tree/main).
 
 ### Deployment Geography:
 Global
 
 ### Use Case:
-Medical researchers, AI developers, and healthcare institutions would be expected to use this system for generating synthetic MR Brain training data, data augmentation for rare conditions, and advancing AI applications in healthcare research.
 
-## Download
-For example, to download the VAE, you can run:
-```
-pip install -U huggingface_hub
-huggingface-cli download nvidia/NV-Generate-MR-Brain \
-  models/diff_unet_3d_rflow-mr-brain_v0.pt \
-  --local-dir ./models
-```
 
 ### Release Date:
-Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA
 
 ## Reference(s):
-[1] Zhao, Can, et al. "Maisi-v2: Accelerated 3d high-resolution medical image synthesis with rectified flow and region-specific contrastive loss." arXiv preprint arXiv:2508.05772 (2025).
 
-[2] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
 
-[3] Lvmin Zhang, Anyi Rao, Maneesh Agrawala; "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf
 
-## Model Architecture:
-**Architecture Type:** Transformer
-**Network Architecture:** 3D UNet + attention blocks
 
-This model was developed from scratch using MONAI components.
 
 **Number of model parameters:** 240M
 
 ## Input:
-**Input Type(s):** Integer, List, Array
-**Input Format(s):** Integer values, String arrays, Float arrays
-**Input Parameters:** Number of Samples (1D), Output Size (1D), and Spacing (1D)
-**Other Properties Related to Input:** Supports controllable synthetic MR generation with whole brain or skull-stripped brain selection, customizable output dimensions, configurable voxel spacing (0.4-5.0mm).
 
 ### num_output_samples
 - **Type:** Integer
-- **Description:** Required input indicates the number of synthetic images the model will generate
 
-### modality class
-- **Type:** Integer
-- **Description:** Required input indicates the contrast of generated, defined in https://github.com/NVIDIA-Medtech/NV-Generate-CTMR/blob/main/configs/modality_mapping.json
-  "mri": 8,
-  "mri_t1": 9,
-  "mri_t2": 10,
-  "mri_flair": 11,
-  "mri_swi": 20,
-  "mri_t1_skull_stripped": 29,
-  "mri_t2_skull_stripped": 30,
-  "mri_flair_skull_stripped": 31,
-  "mri_swi_skull_stripped": 32,
-- **Options:** [8, 9, 10, 11, 20, 29, 30, 31, 32]
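For reference, the modality-to-label mapping listed above (from `configs/modality_mapping.json` in the linked repository) can be sketched as a plain Python dict. The label values are those stated in this card; the dict and helper names below are illustrative, not identifiers from the repository.

```python
# Modality-to-label mapping as listed in the model card / modality_mapping.json.
# MODALITY_LABELS and modality_label are illustrative names, not repo code.
MODALITY_LABELS = {
    "mri": 8,
    "mri_t1": 9,
    "mri_t2": 10,
    "mri_flair": 11,
    "mri_swi": 20,
    "mri_t1_skull_stripped": 29,
    "mri_t2_skull_stripped": 30,
    "mri_flair_skull_stripped": 31,
    "mri_swi_skull_stripped": 32,
}

def modality_label(name: str) -> int:
    """Return the integer class label for a modality name, or raise KeyError."""
    try:
        return MODALITY_LABELS[name]
    except KeyError:
        valid = ", ".join(sorted(MODALITY_LABELS))
        raise KeyError(f"Unknown modality {name!r}; expected one of: {valid}") from None
```

For example, `modality_label("mri_flair")` yields the class label 11 that would be passed as the model's integer modality input.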
 
 ### output_size
 - **Type:** Array of 3 Integers
-- **Description:** Optional specification of x, y, and z dimensions of MR image
-- **Constraints:** Must be 128, 256, 384, or 512 for x- and y-axes; 128, 256 for z-axis
 
 ### spacing
 - **Type:** Array of 3 Floats
 - **Description:** Optional voxel spacing specification
-- **Range:** 0.4mm to 5.0mm per element
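The size and spacing constraints stated in this (pre-update) version can be checked client-side before requesting generation. The sketch below encodes only the ranges listed above; the function name is illustrative and not part of the repository.

```python
# Client-side validation sketch for the constraints stated in this card:
# x/y sizes in {128, 256, 384, 512}, z sizes in {128, 256},
# spacing between 0.4 mm and 5.0 mm per element.
VALID_XY = {128, 256, 384, 512}      # allowed x- and y-axis sizes
VALID_Z = {128, 256}                 # allowed z-axis sizes
SPACING_MIN, SPACING_MAX = 0.4, 5.0  # mm per element

def validate_request(output_size, spacing):
    """Raise ValueError if output_size or spacing violates the stated ranges."""
    x, y, z = output_size
    if x not in VALID_XY or y not in VALID_XY:
        raise ValueError(f"x/y sizes must be in {sorted(VALID_XY)}, got {x}x{y}")
    if z not in VALID_Z:
        raise ValueError(f"z size must be in {sorted(VALID_Z)}, got {z}")
    for s in spacing:
        if not SPACING_MIN <= s <= SPACING_MAX:
            raise ValueError(f"spacing {s}mm outside [{SPACING_MIN}, {SPACING_MAX}]mm")
    return True
```

For instance, `validate_request((512, 512, 256), (0.4, 0.4, 0.6))` passes, while a 512-voxel z-axis is rejected.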
 
 ## Output:
-**Output Type(s):** Image
-**Output Format:** Neuroimaging Informatics Technology Initiative (NIfTI), Digital Imaging and Communications in Medicine (DICOM), Nearly Raw Raster Data (Nrrd)
-**Output Parameters:** Three-Dimensional (3D)
-**Other Properties Related to Output:** Synthetic MR brain images with dimensions up to 512×512×256 voxels and spacing between 0.4mm and 5.0mm.
 
-Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
 
 ## Software Integration:
 **Runtime Engine(s):**
-* MONAI Core v.1.5.0
 
 **Supported Hardware Microarchitecture Compatibility:**
 * NVIDIA Ampere
 * NVIDIA Hopper
 
 **Supported Operating System(s):**
 * Linux
@@ -118,19 +113,16 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
 
 ## Model Version(s):
-0.1 - Initial release version for synthetic MR brain image generation
 
 ## Training, Testing, and Evaluation Datasets:
 
 ### Dataset Overview:
-**Total Size:** ~38k subjects, including 265k whole brain and 265k skull-stripped brain
-**Total Number of Datasets:** 1 datasets
-
-Public datasets from multiple scanner types were processed to create high-quality 3D MR brain volumes. The data processing pipeline ensured consistent voxel spacing, standardized orientations.
 
 ## Training Dataset:
 **Data Modality:**
-* Image
 
 **Image Training Data Size:**
 * Less than a Million Images
@@ -141,14 +133,29 @@ Public datasets from multiple scanner types were processed to create high-qualit
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 ## Testing Dataset:
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 ## Evaluation Dataset:
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
@@ -156,13 +163,19 @@ Public datasets from multiple scanner types were processed to create high-qualit
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
-## Inference:
-**Acceleration Engine:** PyTorch
-**Test Hardware:**
 * A100
 * H100
 
 ## Ethical Considerations:
-NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
 
 Please report model quality, risk, security vulnerabilities or concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 # NV-Generate-MR-Brain Overview
 
 ## Description:
+NV-Generate-MR-Brain is a three-dimensional (3D) latent diffusion model designed to generate high-quality synthetic brain magnetic resonance imaging (MRI) images, achieving the highest resolution and best FID scores among comparable models. This model is specialized for brain MRI with support for multiple modalities: T1, Fluid Attenuated Inversion Recovery (FLAIR), T2, and Susceptibility Weighted Imaging (SWI).
 
+Compared to the previous NV-Generate-MR release, the key differences are:
+- **Resolution and image size:** Brain images are typically smaller than full-body MR; the maximum image size is 512x512x256 voxels at a resolution of 0.4x0.4x0.6mm
+- **Supported modalities:** T1, FLAIR, T2, and SWI, selected via an integer label input
+- **Cross-modality synthesis:** The primary use case is cross-modality synthesis (e.g., T1 → FLAIR, FLAIR → T1), enabling generation of complementary MRI modalities from existing data
 
 The model excels at data augmentation and at generating realistic medical imaging data to supplement datasets limited by privacy concerns or the rarity of certain conditions. It can also significantly enhance the performance of other medical imaging AI models by generating diverse, realistic training data.
 
+This model is ready for commercial use.
 
+### License/Terms of Use:
+Use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Additional Information: [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
 
 ### Deployment Geography:
 Global
 
 ### Use Case:
+Medical researchers, AI developers, and healthcare institutions would be expected to use this model for:
+- **Cross-modality synthesis:** Generating complementary brain MRI modalities (e.g., synthesizing FLAIR from T1 or T1 from FLAIR)
+- **Synthetic training data generation:** Producing synthetic brain MRI images for data augmentation and AI model training
+- **Data augmentation for rare conditions:** Supplementing limited datasets where privacy or rarity restricts data availability
 
+It is not a clinically validated medical device and should not be used for clinical diagnostic purposes.
 
 ### Release Date:
+Huggingface: 03/16/2026 (GTC San Jose 2026) via https://huggingface.co/nvidia/NV-Generate-MR-brain
 
 ## Reference(s):
 
+[1] Zhao, Can, et al. "MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-Specific Contrastive Loss." arXiv preprint arXiv:2508.05772 (2025). https://arxiv.org/abs/2508.05772
 
+[2] Guo, Pengfei, et al. "MAISI: Medical AI for Synthetic Imaging." arXiv preprint arXiv:2409.11169 (2024). https://arxiv.org/abs/2409.11169
 
+[3] Rombach, Robin, et al. "High-Resolution Image Synthesis with Latent Diffusion Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
+
+[4] Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847. https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_Adding_Conditional_Control_to_Text-to-Image_Diffusion_Models_ICCV_2023_paper.pdf
 
+## Model Architecture:
+**Architecture Type:** Diffusion Model<br>
+**Network Architecture:** 3D UNet + attention blocks (latent diffusion)<br>
+**Task:** Generation (Synthetic MRI Image)<br>
 **Number of model parameters:** 240M
 
+## Computational Load (Internal Only: For NVIDIA Models Only)
+**Cumulative Compute:** 4.375 x 10^22 <br>
+**Estimated Energy and Emissions for Model Training:** 24,085 kWh <br>
+
 ## Input:
+**Input Type(s):** Integer, Array<br>
+**Input Format(s):** Integer values, Float arrays<br>
+**Input Parameters:** Number of Samples (1D), Modality (1D), Output Size (1D), and Spacing (1D)<br>
+**Other Properties Related to Input:** Supports controllable synthetic brain MRI generation with modality selection, customizable output dimensions, and configurable voxel spacing.
 
 ### num_output_samples
 - **Type:** Integer
+- **Description:** Required input indicating the number of synthetic brain MRI images the model will generate
 
+### modality
+- **Type:** Integer label
+- **Description:** Required input specifying the MRI modality to generate
+- **Options:**
+  - T1
+  - FLAIR
+  - T2
+  - SWI
 
 ### output_size
 - **Type:** Array of 3 Integers
+- **Description:** Optional specification of the x, y, and z dimensions of the brain MRI image
+- **Constraints:** Maximum dimensions are 512x512x256
 
 ### spacing
 - **Type:** Array of 3 Floats
 - **Description:** Optional voxel spacing specification
+- **Range:** 0.4x0.4x0.6mm
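Putting the updated inputs together, a generation request can be assembled and sanity-checked as below. The dict keys mirror the documented parameter names, but the exact config schema is an assumption here; consult the repository configs for the authoritative format.

```python
# Illustrative request payload for the inputs documented in this card.
# MAX_SIZE and the modality names come from the card; the dict schema and
# build_request name are assumptions for illustration only.
MAX_SIZE = (512, 512, 256)
MODALITIES = ("T1", "FLAIR", "T2", "SWI")

def build_request(num_output_samples, modality, output_size, spacing):
    """Assemble a generation request, rejecting values outside the stated limits."""
    if modality not in MODALITIES:
        raise ValueError(f"modality must be one of {MODALITIES}, got {modality!r}")
    if any(dim > cap for dim, cap in zip(output_size, MAX_SIZE)):
        raise ValueError(f"output_size {output_size} exceeds maximum {MAX_SIZE}")
    return {
        "num_output_samples": num_output_samples,
        "modality": modality,
        "output_size": list(output_size),
        "spacing": list(spacing),
    }
```

For example, `build_request(2, "FLAIR", (256, 256, 128), (0.4, 0.4, 0.6))` returns a payload for two FLAIR volumes, while an unsupported modality or an oversized volume raises ValueError.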
 
 ## Output:
+**Output Type(s):** Image<br>
+**Output Format:** Neuroimaging Informatics Technology Initiative (NIfTI)<br>
+**Output Parameters:** Three-Dimensional (3D)<br>
+**Other Properties Related to Output:** Synthetic brain MRI images in the specified modality (T1, FLAIR, T2, or SWI). Output dimensions and spacing are configurable within supported ranges.
 
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
 
 ## Software Integration:
 **Runtime Engine(s):**
+* MONAI Core v.1.5
 
 **Supported Hardware Microarchitecture Compatibility:**
 * NVIDIA Ampere
 * NVIDIA Hopper
+* NVIDIA Blackwell
 
 **Supported Operating System(s):**
 * Linux
 
 The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
 
 ## Model Version(s):
+0.1 - Initial release version for synthetic brain MRI image generation
 
 ## Training, Testing, and Evaluation Datasets:
 
 ### Dataset Overview:
+The model was trained on the MR-Rate brain MRI dataset covering the supported modalities (T1, FLAIR, T2, and SWI). Data from multiple scanner types were processed to create high-quality 3D MRI volumes with corresponding anatomical annotations. The data processing pipeline ensured consistent voxel spacing, standardized orientations, and validated anatomical segmentations.
 
 ## Training Dataset:
 **Data Modality:**
+* Image (Brain MRI: T1, FLAIR, T2, and SWI)
 
 **Image Training Data Size:**
 * Less than a Million Images
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
+**Properties:** Approximately 28,000 MRI scans of various types (T1, FLAIR, T2, and SWI).
+
 ## Testing Dataset:
+**Data Modality:**
+* Image (Brain MRI: T1, FLAIR, T2, and SWI)
+
+**Image Testing Data Size:**
+* Less than a Million Images
+
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
+**Properties:** Approximately 8,000 MRI scans of various types (T1, FLAIR, T2, and SWI).
+
 ## Evaluation Dataset:
+**Data Modality:**
+* Image (Brain MRI: T1, FLAIR, T2, and SWI)
+
+**Image Evaluation Data Size:**
+* Less than a Million Images
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
 **Labeling Method by dataset:**
 * Hybrid: Human, Automatic/Sensors
 
+**Properties:** Approximately 4,000 MRI scans of various types (T1, FLAIR, T2, and SWI).
+
+## Inference:
+**Acceleration Engine:** PyTorch<br>
+**Test Hardware:**<br>
 * A100
 * H100
 
 ## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+
+For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
+
+Please make sure you have proper rights and permissions for all input image and video content; if the image or video includes people, personal health information, or intellectual property, the generated image or video will not blur or maintain the proportions of the image subjects included.
 
 Please report model quality, risk, security vulnerabilities or concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).