Update README.md
README.md
CHANGED
@@ -18,4 +18,4 @@ tags:
 - rwkv-v5-stp2-N8.pth : 3B rocm-rwkv model starting with the previous one + two epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.94 for N8.
 - rwkv-v5-stp5-N8.pth : 3B rocm-rwkv model starting with the previous but now with 5 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.90 for N8.
 - rwkv-v5-stp18-N8.pth : 3B rocm-rwkv model starting with the previous but now with 18 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.827 for N8 and 13.377 GTokens.
--
+- rwkv-v5-stp32-N8.pth : 3B rocm-rwkv model starting with the previous but now with 32 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.810 for N8 and 22.46 GTokens.