openbmb/DensingLaw-ScalingModels



caijie12138 committed on Jul 26, 2025

Commit 8ff702a · verified · 1 parent: da775c3

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -24,7 +24,7 @@ We trained six models with architectures designed for scaling. The detailed hype
 #### Table 1: Detailed Hyper-parameters of Models for Loss Estimation
-| Name   | # Para        | BS  | \\(n_layer \\） | d     | \\(d_ffn \\） | \\(n_head \\） | \\(n_kv \\） |
+| Name   | # Para        | BS  | n_layer  | d     | d_ffn  | n_head  | n_kv  |
 | :----- | :------------ | :-- | :------ | :---- | :---- | :----- | :--- |
 | 0.005B (S1) | 5,247,232     | 32  | 8       | 256   | 640   | 4      | 1    |
 | 0.03B (S2)  | 31,470,080    | 32  | 12      | 512   | 1,280 | 8      | 2    |
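The "# Para" column is not defined in this excerpt. The sketch below checks one possible reading of it. It assumes a decoder-only Transformer with grouped-query attention (n_kv key/value heads, head dimension d / n_head), a three-matrix gated FFN, bias-free projections, two RMSNorm weight vectors per layer plus a final norm, and embeddings excluded. The helper `non_embedding_params` is hypothetical. The architecture is inferred from the numbers and is not stated in this commit.

```python
def non_embedding_params(n_layer, d, d_ffn, n_head, n_kv):
    """Hypothetical non-embedding parameter count (assumed architecture, see lead-in)."""
    head_dim = d // n_head
    # Assumed: W_q and W_o are d x d; W_k and W_v project to n_kv heads of size head_dim.
    attn = 2 * d * d + 2 * d * head_dim * n_kv
    # Assumed gated FFN: gate, up and down projections, no biases.
    ffn = 3 * d * d_ffn
    # Assumed: two RMSNorm weight vectors of size d per layer.
    norms = 2 * d
    # Per-layer total times depth, plus one assumed final norm of size d.
    return n_layer * (attn + ffn + norms) + d


# Rows from Table 1: (name, # Para, n_layer, d, d_ffn, n_head, n_kv); BS is not used.
rows = [
    ("0.005B (S1)", 5_247_232, 8, 256, 640, 4, 1),
    ("0.03B (S2)", 31_470_080, 12, 512, 1_280, 8, 2),
]

for name, reported, *hp in rows:
    est = non_embedding_params(*hp)
    print(f"{name}: estimated {est:,} vs reported {reported:,}")
```

Under these assumptions the estimate matches both rows exactly (for S1: 8 × 655,872 + 256 = 5,247,232). That agreement suggests, but does not confirm, the intended definition of "# Para".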