nanochat-cache / base_eval /base_model_021400.csv
ttj's picture
Add files using upload-large-folder tool
bf80d19 verified
Task , Accuracy , Centered
hellaswag_zeroshot , 0.451504 , 0.268672
jeopardy , 0.121398 , 0.121398
bigbench_qa_wikidata , 0.527828 , 0.527828
arc_easy , 0.648569 , 0.531425
arc_challenge , 0.343857 , 0.125142
copa , 0.680000 , 0.360000
commonsense_qa , 0.291564 , 0.114455
piqa , 0.695865 , 0.391730
openbook_qa , 0.352000 , 0.136000
lambada_openai , 0.354939 , 0.354939
hellaswag , 0.447520 , 0.263361
winograd , 0.630037 , 0.260073
winogrande , 0.550908 , 0.101815
bigbench_dyck_languages , 0.108000 , 0.108000
agi_eval_lsat_ar , 0.308696 , 0.135870
bigbench_cs_algorithms , 0.371970 , 0.371970
bigbench_operators , 0.142857 , 0.142857
bigbench_repeat_copy_logic , 0.000000 , 0.000000
squad , 0.252791 , 0.252791
coqa , 0.193160 , 0.193160
boolq , 0.529969 , -0.236923
bigbench_language_identification , 0.251200 , 0.176238
CORE , , 0.213673