Chat evaluation mid

timestamp: 2025-10-15 13:24:50

  • source: mid
  • task_name: None
  • dtype: bfloat16
  • temperature: 0.0000
  • max_new_tokens: 512
  • num_samples: 1
  • top_k: 50
  • batch_size: 8
  • model_tag: None
  • step: None
  • max_problems: None
  • ARC-Easy: 0.3906
  • ARC-Challenge: 0.2739
  • MMLU: 0.3094
  • GSM8K: 0.0273
  • HumanEval: 0.0671
  • ChatCORE metric: 0.0786
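
The report does not say how the ChatCORE metric is derived from the per-task scores. One reading that reproduces the listed value is a mean of baseline-centered accuracies. The sketch below assumes a chance baseline of 0.25 for the multiple-choice tasks (ARC-Easy, ARC-Challenge, MMLU) and 0 for the generative tasks (GSM8K, HumanEval). These baselines and the helper names `centered` and `mean_centered_accuracy` are illustrative assumptions, not taken from the report.

```python
# Scores copied from the list above.
scores = {
    "ARC-Easy": 0.3906,
    "ARC-Challenge": 0.2739,
    "MMLU": 0.3094,
    "GSM8K": 0.0273,
    "HumanEval": 0.0671,
}

# Assumed chance-level baselines (not stated in the report):
# 4-way multiple choice -> 0.25, open-ended generation -> 0.0.
baselines = {
    "ARC-Easy": 0.25,
    "ARC-Challenge": 0.25,
    "MMLU": 0.25,
    "GSM8K": 0.0,
    "HumanEval": 0.0,
}


def centered(acc: float, baseline: float) -> float:
    """Rescale accuracy so that chance maps to 0 and a perfect score to 1."""
    return (acc - baseline) / (1.0 - baseline)


def mean_centered_accuracy(scores: dict, baselines: dict) -> float:
    """Average the baseline-centered accuracy over all tasks."""
    values = [centered(scores[task], baselines[task]) for task in scores]
    return sum(values) / len(values)


chat_core = mean_centered_accuracy(scores, baselines)
print(f"{chat_core:.4f}")  # 0.0786
```

Under these assumptions the result matches the reported 0.0786. Centering also shows that ARC-Challenge (0.2739) and MMLU (0.3094) sit only slightly above the assumed 0.25 chance level, so they contribute little to the aggregate.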