Update README.md
README.md
CHANGED
@@ -24,7 +24,7 @@ We trained six models with architectures designed for scaling. The detailed hype
 
 #### Table 1: Detailed Hyper-parameters of Models for Loss Estimation
 
-| Name | # Para | BS |
+| Name | # Para | BS | n_layer | d | d_ffn | n_head | n_kv |
 | :----- | :------------ | :-- | :------ | :---- | :---- | :----- | :--- |
 | 0.005B (S1) | 5,247,232 | 32 | 8 | 256 | 640 | 4 | 1 |
 | 0.03B (S2) | 31,470,080 | 32 | 12 | 512 | 1,280 | 8 | 2 |
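
The restored header columns make it possible to sanity-check the `# Para` column against the other hyper-parameters. The sketch below is an assumption, not something the README states: it supposes a Llama-style decoder block (grouped-query attention with `n_kv` key/value heads, a three-matrix gated FFN, two norm weight vectors per layer plus one final norm, no biases) and counts non-embedding parameters only. The function name `non_embedding_params` is hypothetical. Under those assumptions the formula matches the two rows visible in this hunk.

```python
def non_embedding_params(n_layer: int, d: int, d_ffn: int, n_head: int, n_kv: int) -> int:
    """Count non-embedding parameters under an ASSUMED Llama-style layout.

    Assumptions (not stated in the README): no bias terms, grouped-query
    attention, a gated FFN with three weight matrices, and norm layers that
    carry one weight vector of size d each.
    """
    head_dim = d // n_head
    # Q and O projections are d x d; K and V projections are d x (n_kv * head_dim).
    attn = 2 * d * d + 2 * d * (n_kv * head_dim)
    # Gate, up, and down projections of the gated FFN.
    ffn = 3 * d * d_ffn
    # Two norm weight vectors per layer (pre-attention and pre-FFN).
    norms = 2 * d
    # One extra norm weight vector after the last layer.
    return n_layer * (attn + ffn + norms) + d


# Hyper-parameters copied from the two table rows shown in the diff:
# (n_layer, d, d_ffn, n_head, n_kv)
rows = {
    "0.005B (S1)": (8, 256, 640, 4, 1),
    "0.03B (S2)": (12, 512, 1280, 8, 2),
}

for name, hp in rows.items():
    print(name, f"{non_embedding_params(*hp):,}")
# 0.005B (S1) 5,247,232
# 0.03B (S2) 31,470,080
```

Rows of the table that fall outside this hunk are not checked here, so whether the same formula holds for the remaining models is untested.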