Some traditional benchmarks?

#11

by pj-ml - opened Nov 7, 2023

Discussion

pj-ml

Nov 7, 2023

Could you add some well-known benchmarks?

PlanetDOGE

Nov 7, 2023 • edited Nov 7, 2023

Yeah, I agree. There are no common benchmarks.

yinsong1986

Amazon org Nov 14, 2023 • edited Nov 14, 2023 by qsvga2

Yes, @pj-ml and @PlanetDOGE, we ran the traditional benchmarks using the same methodology as the Open LLM Leaderboard. The results are below:

| Average | hellaswag | arc_challenge | truthful_qa (mc2) | MMLU (acc) |
|---------|-----------|---------------|-------------------|------------|
| 0.57221 | 0.81617   | 0.58874       | 0.38275           | 0.5012     |
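
For anyone who wants to sanity-check the Average column, here is a minimal sketch. It assumes the average is the unweighted mean of the four task scores; the thread does not say how it was computed, and the dictionary keys and the `mean_score` helper are names made up for illustration.

```python
# Scores copied from the results posted in this thread.
# Key names are illustrative, not official task identifiers.
scores = {
    "hellaswag": 0.81617,
    "arc_challenge": 0.58874,
    "truthful_qa_mc2": 0.38275,
    "mmlu_acc": 0.5012,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of a list of scores."""
    return sum(values) / len(values)


# Assumption: "Average" is the plain mean over the four tasks.
average = mean_score(list(scores.values()))
print(f"{average:.6f}")
```

Under that assumption the mean comes out at about 0.572215, which agrees with the reported 0.57221 to five decimal places.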

Cheers!

pj-ml

Nov 14, 2023

Thanks! I would recommend adding these results to the model card for visibility. Once they are there, I can close this discussion, since it would no longer be needed to surface the results you kindly shared.

yinsong1986

Amazon org Nov 15, 2023

Hi @pj-ml, I've updated the model card here: https://huggingface.co/amazon/MistralLite/blob/main/README.md#mistrallite-lm-eval-results

Thank you!

pj-ml changed discussion status to closed Nov 17, 2023
