Commit 8ff702a (verified) by caijie12138 · 1 parent: da775c3

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ We trained six models with architectures designed for scaling. The detailed hype
 
 #### Table 1: Detailed Hyper-parameters of Models for Loss Estimation
 
-| Name | # Para | BS | \\(n_layer \\) | d | \\(d_ffn \\) | \\(n_head \\) | \\(n_kv \\) |
+| Name | # Para | BS | n_layer | d | d_ffn | n_head | n_kv |
 | :----- | :------------ | :-- | :------ | :---- | :---- | :----- | :--- |
 | 0.005B (S1) | 5,247,232 | 32 | 8 | 256 | 640 | 4 | 1 |
 | 0.03B (S2) | 31,470,080 | 32 | 12 | 512 | 1,280 | 8 | 2 |