Ludwig Stumpp committed
Commit · f3a8621
1 Parent(s): 1d376a9
Remove links in table headers
README.md CHANGED
@@ -1,6 +1,7 @@
 # llm-leaderboard
 
 A joint community effort to create one central leaderboard for LLMs. Contributions and corrections welcome!
+Sources for the numbers are
 
 ## Interactive Dashboard
 
@@ -20,28 +21,28 @@ We are always happy for contributions! You can contribute by the following:
 
 ## Leaderboard
 
-| Model Name |
-| -------------------------------------------------------------------------------------- |
-| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/)
-| [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) |
-| [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) |
-| [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/)
-| [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/)
-| [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) |
-| [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) |
-| [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/)
-| [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) |
-| [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) |
-| [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/)
-| [llama-7b](https://arxiv.org/abs/2302.13971) |
-| [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/)
-| [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) |
-| [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/)
-| [opt-7b](https://huggingface.co/facebook/opt-6.7b) |
-| [opt-13b](https://huggingface.co/facebook/opt-13b) |
-| [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) |
-| [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/)
-| [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/)
+| Model Name | Chatbot Arena Elo | LAMBADA (zero-shot) | TriviaQA (zero-shot) |
+| -------------------------------------------------------------------------------------- | ------------------------------------------------ | --------------------------------------------- | --------------------------------------------- |
+| [alpaca-13b](https://crfm.stanford.edu/2023/03/13/alpaca.html) | [1008](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [cerebras-gpt-7b](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) | | [0.636](https://www.mosaicml.com/blog/mpt-7b) | [0.141](https://www.mosaicml.com/blog/mpt-7b) |
+| [cerebras-gpt-13b](https://huggingface.co/cerebras/Cerebras-GPT-13B) | | [0.635](https://www.mosaicml.com/blog/mpt-7b) | [0.146](https://www.mosaicml.com/blog/mpt-7b) |
+| [chatglm-6b](https://chatglm.cn/blog) | [985](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) | [944](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [eleuther-pythia-7b](https://huggingface.co/EleutherAI/pythia-6.9b) | | [0.667](https://www.mosaicml.com/blog/mpt-7b) | [0.198](https://www.mosaicml.com/blog/mpt-7b) |
+| [eleuther-pythia-12b](https://huggingface.co/EleutherAI/pythia-12b) | | [0.704](https://www.mosaicml.com/blog/mpt-7b) | [0.233](https://www.mosaicml.com/blog/mpt-7b) |
+| [fastchat-t5-3b](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | [951](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) | | [0.719](https://www.mosaicml.com/blog/mpt-7b) | [0.347](https://www.mosaicml.com/blog/mpt-7b) |
+| [gptj-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | | [0.683](https://www.mosaicml.com/blog/mpt-7b) | [0.234](https://www.mosaicml.com/blog/mpt-7b) |
+| [koala-13b](https://bair.berkeley.edu/blog/2023/04/03/koala/) | [1082](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [llama-7b](https://arxiv.org/abs/2302.13971) | | [0.738](https://www.mosaicml.com/blog/mpt-7b) | [0.443](https://www.mosaicml.com/blog/mpt-7b) |
+| [llama-13b](https://arxiv.org/abs/2302.13971) | [932](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | | [0.702](https://www.mosaicml.com/blog/mpt-7b) | [0.343](https://www.mosaicml.com/blog/mpt-7b) |
+| [oasst-pythia-12b](https://huggingface.co/OpenAssistant/pythia-12b-pre-v8-12.5k-steps) | [1065](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [opt-7b](https://huggingface.co/facebook/opt-6.7b) | | [0.677](https://www.mosaicml.com/blog/mpt-7b) | [0.227](https://www.mosaicml.com/blog/mpt-7b) |
+| [opt-13b](https://huggingface.co/facebook/opt-13b) | | [0.692](https://www.mosaicml.com/blog/mpt-7b) | [0.282](https://www.mosaicml.com/blog/mpt-7b) |
+| [stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) | | [0.533](https://www.mosaicml.com/blog/mpt-7b) | [0.049](https://www.mosaicml.com/blog/mpt-7b) |
+| [stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | [858](https://lmsys.org/blog/2023-05-03-arena/) | | |
+| [vicuna-13b](https://huggingface.co/lmsys/vicuna-13b-delta-v0) | [1169](https://lmsys.org/blog/2023-05-03-arena/) | | |
 
 ## Benchmarks
 