Text Generation
English
xTimeCrystal committed f54d104 (verified) · 1 parent: 0e11ff4

Update README.md

Files changed (1): README.md (+4 -3)
@@ -96,11 +96,11 @@ Use the code below to get started with the model.
 
  #### Training Hyperparameters
 
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+ - **Training regime:** bf16 non-mixed precision, used own version of Muon with lr from 5e-3 to 1e-3. <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
  #### Speeds, Sizes, Times [optional]
 
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+ Throughput = infinite
 
  [More Information Needed]
 
@@ -130,7 +130,8 @@ Use the code below to get started with the model.
 
  ### Results
 
- [More Information Needed]
+ Bits-per-byte: ~1
+ HellaSwag Accuracy: 33.4% (removed Wikihow entries)
 
  #### Summary
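The updated README reports bits-per-byte but does not say how it was computed. The Python below is a minimal sketch of the usual definition, not the author's evaluation code. It assumes the per-token losses are natural-log cross-entropy values (nats) covering the whole evaluated text, and that the denominator is that text's UTF-8 byte count. The function name `bits_per_byte` and the sample numbers are hypothetical.

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Convert summed per-token cross-entropy (in nats) into bits per UTF-8 byte.

    Assumption: `token_nll_nats` holds the model's negative log-likelihood
    (natural log) for every token of `text`, so their sum is the total
    number of nats the model needs to encode the whole text.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    total_nats = sum(token_nll_nats)
    # Divide by ln(2) to convert nats to bits, then normalize by byte count.
    return total_nats / (math.log(2) * n_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: 250 tokens at a mean loss of 3.0 nats over 1000 ASCII bytes.
    losses = [3.0] * 250
    sample = "a" * 1000
    print(f"{bits_per_byte(losses, sample):.3f}")  # 1.082
```

Because the metric is normalized by bytes rather than tokens, it can be compared across models that use different tokenizers. A value near 1 means the model spends roughly one bit to encode each byte of text.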