Update README.md
README.md
CHANGED
@@ -29,9 +29,9 @@ We trained six models with architectures designed for scaling. The detailed hype
 | 0.005B (S1) | 5,247,232 | 32 | 8 | 256 | 640 | 4 | 1 |
 | 0.03B (S2) | 31,470,080 | 32 | 12 | 512 | 1,280 | 8 | 2 |
 | 0.1B (S3) | 106,196,736 | 64 | 18 | 768 | 1,920 | 12 | 3 |
-| 0.2B (S4) | 245,416,960 | 128 | 24 | 1,024 | 2,560 |
-| 0.4B (S5) | 476,852,480 | 256 | 30 | 1,280 | 3,200 |
-| 0.8B (S6) | 828,225,024 | 512 | 36 | 1,536 | 3,840 |
+| 0.2B (S4) | 245,416,960 | 128 | 24 | 1,024 | 2,560 | 16 | 2 |
+| 0.4B (S5) | 476,852,480 | 256 | 30 | 1,280 | 3,200 | 20 | 2 |
+| 0.8B (S6) | 828,225,024 | 512 | 36 | 1,536 | 3,840 | 24 | 3 |
 
 ### Training Data
 