Update README.md
README.md
CHANGED
@@ -96,11 +96,11 @@ Use the code below to get started with the model.
 
 #### Training Hyperparameters
 
-- **Training regime:**
+- **Training regime:** bf16 non-mixed precision, used own version of Muon with lr from 5e-3 to 1e-3. <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
 #### Speeds, Sizes, Times [optional]
 
-
+Throughput = infinite
 
 [More Information Needed]
 
@@ -130,7 +130,8 @@ Use the code below to get started with the model.
 
 ### Results
 
-
+Bits-per-byte: ~1
+HellaSwag Accuracy: 33.4% (removed Wikihow entries)
 
 #### Summary
 
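
The Results hunk reports bits-per-byte without saying how it was computed. As a minimal sketch of the usual conversion, assuming the evaluation sums the negative log-likelihood in nats over all tokens and divides by the UTF-8 byte length of the evaluated text: the function name `bits_per_byte` and the sample numbers are hypothetical and do not come from the commit.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by text length instead of tokenizer-dependent token count.
    return total_nll_nats / (math.log(2) * total_bytes)


# Illustrative values only (assumption, not measured on the model).
text = "hello world"
n_bytes = len(text.encode("utf-8"))
total_nll = n_bytes * math.log(2)  # a model spending exactly 1 bit per byte
example_bpb = bits_per_byte(total_nll, n_bytes)
```

Normalizing by bytes rather than tokens makes the figure comparable across models with different tokenizers, which is presumably why the card reports it this way.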