
TODO: model card. This model is a diff merge: ((GPT-J-Dolly - GPT-J-6b) * 0.7 + Pygmalion-6b * 0.9).

- Pygmalion-6b: https://huggingface.co/PygmalionAI/pygmalion-6b/
- Dolly-GPT-J: https://huggingface.co/TehVenom/Dolly_GPT-J-6b
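
The merge formula can be illustrated with a minimal sketch. It reads the formula literally: per parameter, take the difference between the Dolly fine-tune and the GPT-J-6b base, scale it by 0.7, and add the Pygmalion-6b value scaled by 0.9. The function name `diff_merge` and its argument names are hypothetical, and this is not necessarily the script used to build this model. The values only need to support `-`, `*` and `+`, so the same code would presumably work on tensors taken from model state dicts; plain floats are used here to keep the example self-contained.

```python
from typing import Any, Dict


def diff_merge(
    base: Dict[str, Any],
    tuned: Dict[str, Any],
    target: Dict[str, Any],
    diff_weight: float = 0.7,
    target_weight: float = 0.9,
) -> Dict[str, Any]:
    """Hypothetical helper: (tuned - base) * diff_weight + target * target_weight.

    Each argument maps a parameter name to a value supporting -, * and +
    (floats here; tensors or arrays would be an assumption about real use).
    """
    # All three checkpoints must expose exactly the same parameter names.
    if not (base.keys() == tuned.keys() == target.keys()):
        raise ValueError("all three checkpoints must share the same parameter names")

    return {
        name: (tuned[name] - base[name]) * diff_weight + target[name] * target_weight
        for name in base
    }


if __name__ == "__main__":
    # Toy one-parameter "checkpoints" standing in for GPT-J-6b, Dolly-GPT-J and Pygmalion-6b.
    gptj = {"layer.weight": 1.0}
    dolly = {"layer.weight": 2.0}
    pygmalion = {"layer.weight": 10.0}
    print(diff_merge(gptj, dolly, pygmalion))
```

Note that the two weights do not sum to 1, so under this literal reading the result is not a convex blend of the source models.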

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 27.01 |
| ARC (25-shot)       | 23.63 |
| HellaSwag (10-shot) | 34.38 |
| MMLU (5-shot)       | 24.41 |
| TruthfulQA (0-shot) | 46.48 |
| Winogrande (5-shot) | 53.83 |
| GSM8K (5-shot)      | 0.0   |
| DROP (3-shot)       | 6.33  |
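
As a quick sanity check, the reported average is consistent with an unweighted mean of the seven benchmark scores listed here. The snippet below is a small sketch of that arithmetic; the dictionary name `scores` is hypothetical, and the assumption that the leaderboard uses a plain mean is inferred from the numbers rather than stated in this card.

```python
# Benchmark scores copied from the results above.
scores = {
    "ARC (25-shot)": 23.63,
    "HellaSwag (10-shot)": 34.38,
    "MMLU (5-shot)": 24.41,
    "TruthfulQA (0-shot)": 46.48,
    "Winogrande (5-shot)": 53.83,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 6.33,
}

# Unweighted mean over all seven benchmarks, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)
```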