Update README.md
README.md
CHANGED
@@ -11,7 +11,8 @@ Trained using Everydream2 Trainer (https://github.com/victorchall/EveryDream2tra
 - Multi-aspect ratio trained with nominal size of <=768^2 pixels for each bucket
 - Batch size 12 with grad accum 10.
 - AdamW 8bit optimizer with standard betas of (0.9,0.999) and weight decay of 0.010.
--
+- Automatic mixed precision FP16 (note: grad scalar val was surprisingly identical on all runs)
+- TF32 matmul and SDP Attention
 - 3.0e-6 LR cosine schedule with a ~12 epoch target to decay, ending around 2.3e-6 at end of training
 - Pyramid noise using discount 0.03
 - Zero offset noise of 0.02
@@ -25,5 +26,5 @@ The following models were produced:
 - 768_ts0huber_ts999mse.safetensors - Huber loss at timestep 0 interpolated to MSE loss at timestep 999
 - 768_ts0mse_ts999huber.safetensors - MSE loss at timestep 0 interpolated to Huber loss at timestep 999
 
-Worth noting timestep 0 is the lowest-noise-added step and 999 is most noised timestep
+Worth noting timestep 0 is the lowest-noise-added step and 999 is most noised timestep
 
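
The two model descriptions in the diff say the loss is "interpolated" between Huber and MSE across timesteps 0 to 999, but the README does not say how. The sketch below shows one plausible reading, offered only as an illustration: a linear blend weighted by `timestep / 999`, applied to a single scalar residual. The linear weighting, the Huber `delta` of 1.0, and the function names `huber`, `mse`, and `interpolated_loss` are all assumptions and are not taken from the EveryDream2 trainer.

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic near zero, linear beyond delta.

    delta=1.0 is an assumed value; the README does not state one.
    """
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2
    return delta * (magnitude - 0.5 * delta)


def mse(residual: float) -> float:
    """Squared error for a single residual."""
    return residual ** 2


def interpolated_loss(residual, timestep, loss_at_t0, loss_at_t999, max_timestep=999):
    """Blend two loss functions by timestep.

    Assumption: the blend is linear in the timestep. At timestep 0
    (least noise added) only loss_at_t0 contributes; at timestep 999
    (most noise added) only loss_at_t999 contributes.
    """
    weight = timestep / max_timestep
    return (1.0 - weight) * loss_at_t0(residual) + weight * loss_at_t999(residual)


# Reading of 768_ts0huber_ts999mse: Huber at timestep 0, MSE at timestep 999.
low_noise = interpolated_loss(2.0, 0, huber, mse)
high_noise = interpolated_loss(2.0, 999, huber, mse)

# Reading of 768_ts0mse_ts999huber: the same blend with the endpoints swapped.
swapped_low_noise = interpolated_loss(2.0, 0, mse, huber)
```

Under this reading the two checkpoints differ only in which end of the noise schedule gets the outlier-tolerant Huber behaviour, which is why the note about timestep 0 being the lowest-noise step matters for interpreting the file names.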