Commit 1495b68 (parent: 68ef273): Update README.md

README.md (changed)
- [Sygil Diffusion v0.2](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.2.ckpt): Resumed from Sygil Diffusion v0.1 and trained for a total of 1.77 million steps.
- [Sygil Diffusion v0.3](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.3.ckpt): Resumed from Sygil Diffusion v0.2 and trained for a total of 2.01 million steps so far.
- #### Beta:
- [sygil-diffusion-v0.4_2318263_lora.ckpt](https://huggingface.co/Sygil/Sygil-Diffusion/blob/main/sygil-diffusion-v0.4_2318263_lora.ckpt): Resumed from Sygil Diffusion v0.3 and trained for a total of 2.31 million steps so far.
Note: Checkpoints under the Beta section are updated daily, or at least 3-4 times a week, usually the equivalent of 1-2 training sessions. This continues until they are stable enough to be moved into a proper release, usually every 1-2 weeks.
**Hardware and others**
- **Hardware:** 1 x Nvidia RTX 3050 8GB GPU
- **Hours Trained:** 840 hours approximately.
- **Optimizer:** AdamW
- **Adam Beta 1**: 0.9
- **Adam Beta 2**: 0.999
- **Adam Weight Decay**: 0.01
- **Adam Epsilon**: 1e-8
- **Gradient Checkpointing**: True
- **Gradient Accumulations**: 400
- **Batch:** 1
- **Learning Rate:** 1e-7
- **Learning Rate Scheduler:** cosine_with_restarts
- **LoRA UNet Learning Rate**: 1e-7
- **LoRA Text Encoder Learning Rate**: 1e-7
- **Resolution**: 512 pixels
- **Total Training Steps:** 2,318,263
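The hyperparameters above can be sketched in a minimal PyTorch training loop. This is not the actual training script: the model, loss, loop length, and the restart interval `T_0` are placeholders; only the optimizer settings, batch size, and accumulation count come from the list above.

```python
import torch

model = torch.nn.Linear(4, 4)  # placeholder for the diffusion model

# Optimizer settings from the list above
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-7,             # Learning Rate
    betas=(0.9, 0.999),  # Adam Beta 1 / Adam Beta 2
    weight_decay=0.01,   # Adam Weight Decay
    eps=1e-8,            # Adam Epsilon
)

# cosine_with_restarts: cosine annealing that restarts every T_0 steps
# (T_0 is a placeholder value, not taken from the README)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=1000)

grad_accum_steps = 400  # Gradient Accumulations: effective batch = 400 x 1
for step in range(1, 801):
    loss = model(torch.randn(1, 4)).pow(2).mean()  # Batch: 1 (dummy loss)
    (loss / grad_accum_steps).backward()           # accumulate scaled gradients
    if step % grad_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()
```

With a batch size of 1 and 400 accumulation steps, each optimizer step averages gradients over 400 samples, which is how a small effective batch on an 8GB GPU can stand in for a much larger one.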
Note: For the learning rate I'm testing something new. After switching from the `constant` scheduler to `cosine_with_restarts` once v0.3 was released, I noticed it practically uses the optimal learning rate while minimizing the loss value. So, when a training session finishes, I start the next session from the learning rate shown during the last few steps of the previous one; over time this decreases the learning rate at a roughly constant rate. When I add a lot of data to the training dataset at once, I move the learning rate back to 1e-7, and the scheduler then decreases it again as the model learns from the new data. This keeps the training from overfitting, and from using a learning rate so low that the model stops learning anything new for a while.
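The session-to-session carry-over described in the note above can be sketched as follows. The function names and the cycle length are hypothetical; only the reset value of 1e-7 and the cosine-with-restarts shape come from the text.

```python
import math

def cosine_with_restarts_lr(base_lr, step, cycle_len=1000):
    """LR at `step` of a cosine-with-restarts schedule (no warmup).

    cycle_len is a placeholder; the real restart interval is set by
    the training script, not stated in the README.
    """
    pos = (step % cycle_len) / cycle_len        # position within the current cycle
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * pos))

def next_session_base_lr(last_lr, added_lots_of_data, reset_lr=1e-7):
    # Resume from the LR the previous session ended on, or reset to
    # 1e-7 after a large dataset addition so the scheduler can anneal
    # down again on the new data.
    return reset_lr if added_lots_of_data else last_lr
```

For example, a session ending at 5e-8 would start the next one at 5e-8 under normal conditions, but back at 1e-7 after a big dataset change.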
Developed by: [ZeroCool94](https://github.com/ZeroCool940711) at [Sygil-Dev](https://github.com/Sygil-Dev/)