y3i12 committed · Commit d8ea6f4 · 1 Parent(s): bc6c880
Files changed (1): README.md +1 -1
README.md CHANGED
@@ -322,7 +322,7 @@ It is completely unknown if the architecture is beneficial for larger models (1B
 - 41 layers
 - 20 with shared W1 and W2
 - 1 unique
-- 1024 dimms
+- 1024 dims
 - 16 GQA heads, 4 KV heads (4:1)
 - vocab size 32k
 - RoPE + WoRPE + G²LU
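For reference, the hyperparameters listed in the hunk above could be collected into a small config sketch. The class and field names below are illustrative, not taken from the repository; only the values come from the README, and the 4:1 ratio means each KV head serves 4 query heads under grouped-query attention (GQA):

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical field names; values are from the README hunk above.
    n_layers: int = 41       # 20 with shared W1/W2, 1 unique (per the README)
    d_model: int = 1024      # "1024 dims"
    n_heads: int = 16        # GQA query heads
    n_kv_heads: int = 4      # 16:4 gives the stated 4:1 ratio
    vocab_size: int = 32_000

    @property
    def gqa_group_size(self) -> int:
        # Number of query heads sharing each KV head.
        return self.n_heads // self.n_kv_heads


cfg = ModelConfig()
print(cfg.gqa_group_size)  # → 4
```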