
Base model evaluation

timestamp: 2025-11-03 09:13:28

  • Model: base_model (step 21400)
  • CORE metric: 0.2137 (equal to the unweighted mean of the 22 task scores listed below)
  • hellaswag_zeroshot: 0.2687
  • jeopardy: 0.1214
  • bigbench_qa_wikidata: 0.5278
  • arc_easy: 0.5314
  • arc_challenge: 0.1251
  • copa: 0.3600
  • commonsense_qa: 0.1145
  • piqa: 0.3917
  • openbook_qa: 0.1360
  • lambada_openai: 0.3549
  • hellaswag: 0.2634
  • winograd: 0.2601
  • winogrande: 0.1018
  • bigbench_dyck_languages: 0.1080
  • agi_eval_lsat_ar: 0.1359
  • bigbench_cs_algorithms: 0.3720
  • bigbench_operators: 0.1429
  • bigbench_repeat_copy_logic: 0.0000
  • squad: 0.2528
  • coqa: 0.1932
  • boolq: -0.2369
  • bigbench_language_identification: 0.1762
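
As a sanity check on the numbers above, the following sketch recomputes the aggregate from the per-task scores. The scores are copied from this report. The `centered` helper is an assumption, not something the report states: it shows one common way a score can be rescaled against a random-guess baseline, which would explain how a task such as boolq can come out negative. The baseline values passed to it are illustrative only.

```python
from statistics import mean

# Per-task scores copied verbatim from the report above.
scores = {
    "hellaswag_zeroshot": 0.2687,
    "jeopardy": 0.1214,
    "bigbench_qa_wikidata": 0.5278,
    "arc_easy": 0.5314,
    "arc_challenge": 0.1251,
    "copa": 0.3600,
    "commonsense_qa": 0.1145,
    "piqa": 0.3917,
    "openbook_qa": 0.1360,
    "lambada_openai": 0.3549,
    "hellaswag": 0.2634,
    "winograd": 0.2601,
    "winogrande": 0.1018,
    "bigbench_dyck_languages": 0.1080,
    "agi_eval_lsat_ar": 0.1359,
    "bigbench_cs_algorithms": 0.3720,
    "bigbench_operators": 0.1429,
    "bigbench_repeat_copy_logic": 0.0000,
    "squad": 0.2528,
    "coqa": 0.1932,
    "boolq": -0.2369,
    "bigbench_language_identification": 0.1762,
}


def centered(accuracy: float, baseline: float) -> float:
    """Hypothetical rescaling (an assumption, not taken from the report):
    0.0 means chance-level accuracy, 1.0 means perfect accuracy, and
    negative values mean worse than random guessing."""
    return (accuracy - baseline) / (1.0 - baseline)


# Unweighted mean over all listed tasks.
core = mean(list(scores.values()))
print(f"{len(scores)} tasks, mean = {core:.4f}")  # 22 tasks, mean = 0.2137
```

The recomputed mean matches the reported CORE metric of 0.2137. Under the assumed rescaling, a binary task answered below 50% accuracy, for example `centered(0.4, 0.5)`, yields a negative score.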