readme
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ library_name: diffusers
 
 ## VAE Training Process
 
+- Encoder: Frozen (to avoid retraining SDXL for the new VAE).
 - Dataset: 100,000 PNG images
 - Training Time: 4 days
 - Hardware: Single RTX 4090
@@ -28,17 +29,16 @@ library_name: diffusers
 
 ## Implementation
 
-Base Code: Used a simple diffusion model training script.
-
-Training Target: Only the decoder, focusing on image reconstruction.
+- Base Code: Used a simple diffusion model training script.
+- Training Target: Only the decoder, focusing on image reconstruction.
 
 ## Loss Functions
 
-Initially used LPIPS and MSE.
-Noticed FID score improving, but images becoming blurry (FID overfits to blurry images—improving FID is not always good).
-Switched to MAE
-Balanced LPIPS and MAE at 90/10 ratio.
-Used median perceptual_loss_weight for better balance.
+- Initially used LPIPS and MSE.
+- Noticed FID score improving, but images becoming blurry (FID overfits to blurry images—improving FID is not always good).
+- Switched to MAE.
+- Balanced LPIPS and MAE at 90/10 ratio.
+- Used median perceptual_loss_weight for better balance.
 
 ## Results
 
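
Note on the Loss Functions bullets in this diff: the README does not include the training script, so the following is only a minimal sketch of how a 90/10 LPIPS/MAE mix might be computed. It assumes the ratio means a weight of 0.9 on the perceptual term and 0.1 on the pixel term. The names `mae`, `mse`, `combined_loss`, and `perceptual_weight` are hypothetical, and the perceptual value is passed in as a plain number because a real LPIPS score needs a pretrained network. The "median perceptual_loss_weight" step is not reproduced here, since the README does not say how it was computed.

```python
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error (L1) over flattened pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error (L2) over flattened pixel values, for comparison."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(perceptual: float, pixel: float,
                  perceptual_weight: float = 0.9) -> float:
    """Weighted sum of a perceptual term and a pixel term.

    Assumption: "90/10" is read as 0.9 * perceptual + 0.1 * pixel.
    `perceptual` stands in for an LPIPS value computed elsewhere.
    """
    if not 0.0 <= perceptual_weight <= 1.0:
        raise ValueError("perceptual_weight must be in [0, 1]")
    return perceptual_weight * perceptual + (1.0 - perceptual_weight) * pixel


if __name__ == "__main__":
    pred = [0.0, 1.0]
    target = [0.0, 0.5]
    pixel_term = mae(pred, target)
    # 0.2 is an arbitrary placeholder for an LPIPS score.
    print(combined_loss(perceptual=0.2, pixel=pixel_term))
```

In this sketch, swapping `mae` for `mse` changes only the pixel term. Squared error penalizes large per-pixel deviations more heavily than absolute error, which is one common explanation for blurrier reconstructions under MSE, though the README itself does not state the reason for the switch.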