---
datasets:
  - camel-ai/physics
---

# Llama-2-7B-physics

This model was trained on a sample of the camel-ai/physics dataset.

**Base model:** NousResearch/Llama-2-7b-chat-hf

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric               | Value |
|----------------------|-------|
| Avg.                 | 45.44 |
| ARC (25-shot)        | 52.9  |
| HellaSwag (10-shot)  | 77.71 |
| MMLU (5-shot)        | 48.83 |
| TruthfulQA (0-shot)  | 48.93 |
| Winogrande (5-shot)  | 71.9  |
| GSM8K (5-shot)       | 7.05  |
| DROP (3-shot)        | 10.78 |
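The reported average is the unweighted mean of the seven benchmark scores above. This can be verified in a few lines of Python (a quick sanity check on the table, not part of the evaluation harness itself):

```python
# Benchmark scores copied from the results table
scores = {
    "ARC (25-shot)": 52.9,
    "HellaSwag (10-shot)": 77.71,
    "MMLU (5-shot)": 48.83,
    "TruthfulQA (0-shot)": 48.93,
    "Winogrande (5-shot)": 71.9,
    "GSM8K (5-shot)": 7.05,
    "DROP (3-shot)": 10.78,
}

# Unweighted mean, rounded to two decimals as reported
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 45.44
```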