Update src/display/about.py
src/display/about.py CHANGED (+1 -1)
@@ -15,7 +15,7 @@ With the plethora of large language models (LLMs) and chatbots being released we

 ## How it works

-π We evaluate models on
+π We evaluate models on 6 key benchmarks using the <a href="https://github.com/EleutherAI/lm-evaluation-harness" target="_blank"> Eleuther AI Language Model Evaluation Harness </a>, a unified framework to test generative language models on a large number of different evaluation tasks.

 - <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot) - a set of grade-school science questions.
 - <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
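
The few-shot settings named in the changed text (25-shot for AI2 Reasoning Challenge, 10-shot for HellaSwag) could be reproduced locally along these lines. This is a minimal sketch, not the leaderboard's actual pipeline: the task identifiers `arc_challenge` and `hellaswag`, the `lm_eval` entry point, and its flags are assumptions based on recent releases of the Eleuther AI harness (older releases used a `main.py` script with different model names), and `build_command` is a hypothetical helper that only assembles a command string.

```python
import shlex

# Few-shot counts taken from the about.py text in the diff.
# The keys are ASSUMED harness task identifiers, not confirmed by the diff.
FEWSHOT = {
    "arc_challenge": 25,  # AI2 Reasoning Challenge (25-shot)
    "hellaswag": 10,      # HellaSwag (10-shot)
}


def build_command(model_id: str, task: str) -> str:
    """Assemble (but do not run) a harness command line for one benchmark.

    The flag names are assumptions about the harness CLI; check them
    against the installed version before running the result.
    """
    args = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(FEWSHOT[task]),
    ]
    # shlex.join quotes each argument so the string is safe to paste into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model id.
    for task in FEWSHOT:
        print(build_command("my-org/my-model", task))
```

Keeping the shot counts in one dictionary means the description in `about.py` and any local reproduction script can be checked against a single source of truth.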