bfuzzy1 committed on
Commit
46e510c
·
verified ·
1 Parent(s): 743f8c2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -166,8 +166,7 @@ v9 confirmed the ~11M ceiling and that PLE was dead weight, but since it didn't
  base. From here the work moves to the capability stages (chat, reasoning).
 
  What the model is actually like: it holds up well for 11M on commonsense and science multiple-choice. SciQ
- (67.5) beats GPT-2-124M, and PIQA (56.0), ARC-Easy (35.6), HellaSwag (31.8), and COPA (55.0) are all clearly
- above random. Arithmetic has crept off the random floor (ArithMark 26.4) thanks to the folded-in computation
+ (67.5), PIQA (56.0), ARC-Easy (35.6), HellaSwag (31.8), and COPA (55.0) are all clearly above random. Arithmetic has crept off the random floor (ArithMark 26.4) thanks to the folded-in computation
  data, though it's a modest lift and actually generating arithmetic is still weak. On the harder abstract
  reasoning tasks (Winogrande, CommonsenseQA, ARC-Challenge, OpenBookQA) and on open-ended generation it's near
  chance, partly a capacity ceiling at this size and partly loglikelihood length-bias. It's a solid base for