LLM360/K2

victormiller committed on Jul 29, 2024

Commit 400af6c · verified · 1 parent: 4857ea8

Update README.md

Files changed (1)

README.md +10 -1

@@ -34,9 +34,18 @@ Evaluations include standard best practice benchmarks, medical, math, and coding
 <center><img src="k2_table_of_tables.png" alt="k2 big eval table"/></center>
 Detailed analysis can be found on the K2 Weights and Biases project [here](https://wandb.ai/llm360/K2?nw=29mu6l0zzqq)
+## Open LLM Leaderboard
+| Evaluation      | Score      | Raw Score      |
+| ----------- | ----------- | ----------- |
+| IFEval   | 22.52        | 23       |
+| BBH   | 28.22        | 50       |
+| Math Lvl 5   | 2.04        | 2       |
+| GPQA   | 3.58        | 28       |
+| MUSR   | 8.55        | 40       |
+| MMLU-PRO   | 22.27        | 30       |
+| Average   | 14.53        | 35.17       |
 ## K2 Gallery
 The K2 gallery allows one to browse the output of various prompts on intermediate K2 checkpoints, which provides an intuitive understanding of how the model develops and improves over time. This is inspired by The Bloom Book.
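
As a quick sanity check on the Open LLM Leaderboard table added in this commit, the `Average` entry in the `Score` column can be reproduced as the plain arithmetic mean of the six per-benchmark scores. The sketch below is illustrative only: the dictionary `scores` and the helper `mean_score` are hypothetical names, and the assumption that the average is an unweighted mean is inferred from the numbers, not stated in the README. The `Raw Score` column is left out because its values appear rounded in the table.

```python
# Scores copied from the "Score" column of the table added in this commit.
scores = {
    "IFEval": 22.52,
    "BBH": 28.22,
    "Math Lvl 5": 2.04,
    "GPQA": 3.58,
    "MUSR": 8.55,
    "MMLU-PRO": 22.27,
}


def mean_score(values):
    """Return the unweighted arithmetic mean, rounded to two decimals.

    Assumption: the table's Average row is a simple mean of the rows above it.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    print(f"Average score: {mean_score(scores.values())}")
```

Under that assumption, the computed mean matches the 14.53 reported in the `Average` row.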