Update README.md
README.md
CHANGED
@@ -10,4 +10,4 @@ Impressive performance for its size, however due to the small size, the model is
### Things to note
-This model was trained at 12 epoch (which I thought 6 sufficient but I guess more is better for models < 3B) at 1e-4 learning rate with batch size of 8. One insight I found is that the model (raw safetensors, not quantised)
+This model was trained at 12 epoch (which I thought 6 was sufficient but I guess more is better for models < 3B) at 1e-4 learning rate with batch size of 8. One insight I found is that the model (raw safetensors, not quantised) performs very well at 12 epochs albeit having some flaws due to dataset limitation and model capacity leading to saturated quality and output, so Im going to stick to this settings in future training
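
For reference, the settings named in the changed line (12 epochs, 1e-4 learning rate, batch size 8) can be written down as a small config. This is only a sketch and not the author's training script: `TrainSettings`, `total_steps`, and the dataset size used in the example are hypothetical names and values, and the step count assumes no gradient accumulation and that the last partial batch of each epoch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSettings:
    """Hyperparameters quoted in the README change (container name is hypothetical)."""
    epochs: int = 12
    learning_rate: float = 1e-4
    batch_size: int = 8


def total_steps(settings: TrainSettings, num_examples: int) -> int:
    """Optimizer steps for a full run.

    Assumes one optimizer step per batch (no gradient accumulation) and
    that the final, possibly smaller, batch of each epoch is not dropped.
    """
    steps_per_epoch = math.ceil(num_examples / settings.batch_size)
    return steps_per_epoch * settings.epochs


if __name__ == "__main__":
    settings = TrainSettings()
    # 1000 is a placeholder dataset size; the README does not state one.
    print(total_steps(settings, num_examples=1000))
```

Going from 6 to 12 epochs doubles the step count for the same dataset, which is the trade-off the README line describes.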