Reproduce the reported benchmark score using LM Harness

by SimonX - opened Mar 13, 2025

Mar 13, 2025 • edited Apr 11, 2025

Has anyone been able to reproduce the reported benchmark scores using LM Harness?
I am pulling the model from HuggingFace and running LM Harness with its default settings, keeping the number of shots the same as for the reported scores. However, the accuracies I get differ significantly from the reported ones.
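
For anyone comparing setups, here is a minimal sketch of how such a run might be pinned down so that the shot count is explicit. The model ID `your-org/your-model`, the task `mmlu`, and the shot count `5` are placeholders, not values from this thread, and `build_lm_eval_command` is a hypothetical helper; only the `lm_eval` flags themselves come from the harness CLI.

```python
import shlex


def build_lm_eval_command(model_id, tasks, num_fewshot, batch_size="auto"):
    """Build an lm_eval CLI invocation with the few-shot count set explicitly.

    Hypothetical helper for illustration; it only assembles the argument list.
    """
    return [
        "lm_eval",
        "--model", "hf",                           # HuggingFace transformers backend
        "--model_args", f"pretrained={model_id}",  # which checkpoint to load
        "--tasks", ",".join(tasks),                # comma-separated task names
        "--num_fewshot", str(num_fewshot),         # match the reported number of shots
        "--batch_size", str(batch_size),
    ]


# Placeholder values: substitute the real model ID, task, and shot count.
cmd = build_lm_eval_command("your-org/your-model", ["mmlu"], 5)
print(shlex.join(cmd))
```

Posting the exact command alongside the scores makes it easier for others to spot where a run differs from the reported setup.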

peterzsj6

Jan 15

I have a similar issue: pass@1 on HumanEval is around 0.06. I tried different top_p, top_k, and temperature values.
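
For context on what that number measures, here is a small sketch of the standard unbiased pass@k estimator introduced with HumanEval, assuming `n` generated samples per problem of which `c` pass the unit tests. The function names and the example counts are illustrative only, and this is not necessarily how any particular harness computes the metric.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative counts only (not data from this thread).
example = [(10, 1), (10, 0), (10, 10)]
score = mean_pass_at_k(example, k=1)
```

For k = 1 the estimator reduces to `c / n`, so a pass@1 of 0.06 means roughly 6% of generated solutions pass their tests.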
