Performance on MMLU Astronomy

by meni12345 - opened Oct 29, 2023

Oct 29, 2023 • edited Nov 8, 2023

Based on zero-shot testing with the LM Evaluation Harness, this model appears to be outperformed by the base version of Llama2 7B on MMLU Astronomy (task "hendrycksTest-astronomy"). Is there a bug in the uploaded model?

hf-causal-experimental (pretrained=universeTBD/astrollama), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4

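| Task                    | Version | Metric   | Value  |   | Stderr |
|-------------------------|---------|----------|--------|---|--------|
| hendrycksTest-astronomy | 1       | acc      | 0.3816 | ± | 0.0395 |
|                         |         | acc_norm | 0.3816 | ± | 0.0395 |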

hf-causal-experimental (pretrained=meta-llama/Llama-2-7b-hf), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4

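| Task                    | Version | Metric   | Value  |   | Stderr |
|-------------------------|---------|----------|--------|---|--------|
| hendrycksTest-astronomy | 1       | acc      | 0.4211 | ± | 0.0402 |
|                         |         | acc_norm | 0.4211 | ± | 0.0402 |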
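
To put the gap between the two runs above in context, here is a rough sketch that compares the reported accuracies against their reported standard errors. It is not part of the harness output. It assumes the two runs can be treated as independent samples, which is a simplification because both models answer the same questions, and the helper name `z_score_of_gap` is made up for illustration.

```python
import math


def z_score_of_gap(acc_a, se_a, acc_b, se_b):
    """Return the z-score of (acc_b - acc_a).

    Assumption: the two runs are treated as independent, so their
    standard errors combine in quadrature. A paired test on per-question
    results would be more appropriate if those were available.
    """
    gap = acc_b - acc_a
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return gap / combined_se


# Values copied from the two result tables above (accuracy, stderr).
astrollama = (0.3816, 0.0395)
llama2_base = (0.4211, 0.0402)

z = z_score_of_gap(*astrollama, *llama2_base)
print(f"gap = {llama2_base[0] - astrollama[0]:.4f}, z = {z:.2f}")
```

Under that assumption the z-score comes out at roughly 0.7, well below the usual 1.96 threshold, so the difference on this subset alone may not be distinguishable from noise.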

errai34 (UniverseTBD org), Nov 8, 2023

Hi @meni12345, we haven't fine-tuned a chat version of the model yet, so it has not been trained on any QA instructions. We are in the process of doing so and will provide a chat version very soon. Thank you for testing our model!
