---
license: mit
tags:
- pytorch
- diffusers
- unconditional-image-generation
- diffusion-models-class
- medical-imaging
- brain-mri
- multiple-sclerosis
---

# Brain MRI Synthesis with Latent Diffusion (from scratch)

This is an unconditional diffusion model that generates **latent representations of brain MRI FLAIR slices**. It synthesizes brain MRI images at 256x256 pixels through a latent diffusion process, using a U-Net built from ResNet and attention blocks.

## Training Details

- **Architecture:** Latent Diffusion Model (LDM)
- **Resolution:** 32x32 latents, corresponding to 256x256 final images
- **Dataset:** Lesion2D VH split (FLAIR MRI slices), 70% of the dataset
- **Channels:** 4 (latents are multi-channel representations of the original images)
- **Epochs:** 100
- **Batch size:** 16
- **Optimizer:** AdamW with:
  - Learning Rate: `1.0e-4`
  - Betas: (0.95, 0.999)
  - Weight Decay: `1.0e-6`
  - Epsilon: `1.0e-8`
- **Scheduler:** Cosine with 500 warm-up steps
- **Gradient Accumulation:** 1 step
- **Mixed Precision:** No
- **Gradient Clipping:** Max norm of 1.0
- **Noise Scheduler:** Linear schedule with:
  - `num_train_timesteps`: 1000
  - `beta_start`: 0.0001
  - `beta_end`: 0.02
- **Hardware:** Trained on **NVIDIA GPUs** with a distributed dataloader using 12 workers.
- **Memory Consumption:** Approx. **2.5 GB** during training.
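
The noise schedule and learning-rate schedule listed above can be sketched in plain Python to show what the numbers imply. This is a minimal illustration, not the training code: the card does not say which classes were used (in `diffusers`, `DDPMScheduler` with `beta_schedule="linear"` and `get_cosine_schedule_with_warmup` would be the usual counterparts), and `TOTAL_STEPS` is a hypothetical value because the number of optimizer steps is not stated.

```python
import math

# Noise schedule values taken from the card.
NUM_TRAIN_TIMESTEPS = 1000
BETA_START = 0.0001
BETA_END = 0.02

# Linear schedule: betas evenly spaced from BETA_START to BETA_END.
betas = [
    BETA_START + (BETA_END - BETA_START) * t / (NUM_TRAIN_TIMESTEPS - 1)
    for t in range(NUM_TRAIN_TIMESTEPS)
]

# Cumulative product of (1 - beta): share of signal variance left at step t.
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)


def add_noise(x0, eps, t):
    """Forward process for one scalar latent value at timestep t."""
    a = alpha_bars[t]
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps


# Learning-rate values taken from the card.
BASE_LR = 1.0e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 10_000  # hypothetical: the card does not give the step count


def lr_at(step):
    """Linear warm-up to BASE_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"signal variance left at the last timestep: {alpha_bars[-1]:.2e}")
print(f"learning rate at step 250: {lr_at(250):.2e}")
```

With these settings almost no signal remains at the final timestep, so sampling can start from pure Gaussian noise.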

## U-Net Architecture
- **Down Blocks:** [DownBlock2D, DownBlock2D, DownBlock2D, DownBlock2D, AttnDownBlock2D, DownBlock2D]
- **Up Blocks:** [UpBlock2D, AttnUpBlock2D, UpBlock2D, UpBlock2D, UpBlock2D, UpBlock2D]
- **Layers per Block:** 2
- **Block Channels:** [128, 128, 256, 256, 512, 512]
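
As a rough sketch, the settings above map onto keyword arguments of `diffusers.UNet2DModel` as follows. `sample_size=32` and `in_channels=4` follow the latent shape given in the training details; `out_channels=4` is an assumption (the network predicting a tensor of the same shape as its input), and the helper names `build_unet` and `down_path_resolutions` are hypothetical.

```python
# Hyperparameters from the card, expressed as UNet2DModel keyword arguments.
unet_config = dict(
    sample_size=32,   # latent resolution from the card
    in_channels=4,    # latent channels from the card
    out_channels=4,   # assumption: output has the same shape as the input
    layers_per_block=2,
    block_out_channels=(128, 128, 256, 256, 512, 512),
    down_block_types=(
        "DownBlock2D", "DownBlock2D", "DownBlock2D", "DownBlock2D",
        "AttnDownBlock2D", "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D", "AttnUpBlock2D", "UpBlock2D", "UpBlock2D",
        "UpBlock2D", "UpBlock2D",
    ),
)


def build_unet():
    """Instantiate the network; requires `diffusers` and `torch`."""
    from diffusers import UNet2DModel

    return UNet2DModel(**unet_config)


def down_path_resolutions(sample_size, num_blocks):
    """Input resolution of each down block, assuming every block except
    the last one halves the spatial size."""
    return [sample_size // (2 ** i) for i in range(num_blocks)]


resolutions = down_path_resolutions(
    unet_config["sample_size"], len(unet_config["down_block_types"])
)
for block, res in zip(unet_config["down_block_types"], resolutions):
    print(f"{block}: {res}x{res}")
```

Under that halving assumption, the single attention block in the down path operates on a very small feature map, which keeps the cost of self-attention low.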

Because the diffusion process runs on 32x32 latents rather than on 256x256 images, synthesis is more memory-efficient while the final images keep their full resolution.
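
The size of the saving can be estimated with simple arithmetic. The sketch below assumes single-channel (grayscale) FLAIR slices, which the card does not state explicitly; the other values come from the training details.

```python
# Values stated in the card.
IMAGE_SIZE = 256
LATENT_SIZE = 32
LATENT_CHANNELS = 4

# Assumption: FLAIR slices are stored as single-channel images.
IMAGE_CHANNELS = 1

downsample_factor = IMAGE_SIZE // LATENT_SIZE
pixel_values = IMAGE_SIZE * IMAGE_SIZE * IMAGE_CHANNELS
latent_values = LATENT_SIZE * LATENT_SIZE * LATENT_CHANNELS
compression_ratio = pixel_values / latent_values

print(downsample_factor)   # 8
print(pixel_values)        # 65536
print(latent_values)       # 4096
print(compression_ratio)   # 16.0
```

Under that assumption, each training sample the U-Net sees has 16 times fewer values than the image it represents.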

## Usage
You can use the model directly with the `diffusers` library:

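```python
import torch
from diffusers import DiffusionPipeline

# Load the model. DiffusionPipeline resolves the concrete pipeline class
# from the repository's model_index.json (this assumes the repository
# ships a standard diffusers pipeline config).
pipeline = DiffusionPipeline.from_pretrained("benetraco/latent_scratch")

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(device)

# Generate an image
image = pipeline(batch_size=1).images[0]

# Display the image
image.show()
```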