typo
README.md
CHANGED
@@ -322,7 +322,7 @@ It is completely unknown if the architecture is beneficial for larger models (1B
 - 41 layers
 - 20 with shared W1 and W2
 - 1 unique
-- 1024
+- 1024 dims
 - 16 GQA heads, 4 KV heads (4:1)
 - vocab size 32k
 - RoPE + WoRPE + G²LU
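As a sanity check on the numbers in this list, here is a minimal sketch of the attention shape arithmetic. It assumes that "1024 dims" is the model width and that the per-head dimension is the width divided by the number of query heads; neither is stated in the README, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical container for the figures listed in the README."""
    d_model: int = 1024      # "1024 dims" (assumed to be the model width)
    n_q_heads: int = 16      # "16 GQA heads"
    n_kv_heads: int = 4      # "4 KV heads"

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = d_model / n_q_heads, the common convention.
        if self.d_model % self.n_q_heads != 0:
            raise ValueError("d_model must be divisible by n_q_heads")
        return self.d_model // self.n_q_heads

    @property
    def q_per_kv(self) -> int:
        # Number of query heads sharing each KV head (the "4:1" ratio).
        if self.n_q_heads % self.n_kv_heads != 0:
            raise ValueError("n_q_heads must be divisible by n_kv_heads")
        return self.n_q_heads // self.n_kv_heads

    @property
    def kv_proj_width(self) -> int:
        # Output width of each of the K and V projections under GQA.
        return self.n_kv_heads * self.head_dim


shape = AttentionShape()
print(shape.head_dim, shape.q_per_kv, shape.kv_proj_width)
```

Under these assumptions the 4:1 ratio in the list follows directly from 16 query heads over 4 KV heads, and the K and V projections are a quarter of the width of the Q projection.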