xeon27 committed
Commit 159e996 · Parent: fcd47ae
[WIP] Add task link in description

src/about.py (+2 −2)
@@ -44,14 +44,14 @@ TITLE = """<h1 align="center" id="space-title">LLM Evaluation Leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are: [ARC-Easy](
+This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are: [ARC-Easy](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc), ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond. The agentic tasks are GAIA and GDM-InterCode-CTF.
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
 ## How it works
 The following benchmarks are included:
-Base: ARC-Easy, ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond
+Base: [ARC-Easy](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc), ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond
 Agentic: GAIA, GDM-InterCode-CTF
 
 ## Reproducibility
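The commit inserts the same inspect_evals link into two separate strings (`INTRODUCTION_TEXT` and `LLM_BENCHMARKS_TEXT`). A minimal sketch of one way to avoid that duplication, by building the benchmark lists once — note this is a hypothetical refactor, not code from the repository; the `linked` helper and `INSPECT_EVALS_BASE` name are assumptions:

```python
# Hypothetical refactor sketch (not part of the commit): define each
# benchmark link once and reuse it in every description string.
INSPECT_EVALS_BASE = "https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals"

def linked(name: str, path: str) -> str:
    """Render a benchmark name as a Markdown link into the inspect_evals repo."""
    return f"[{name}]({INSPECT_EVALS_BASE}/{path})"

BASE_TASKS = [
    linked("ARC-Easy", "arc"), "ARC-Challenge", "DROP", "WinoGrande",
    "GSM8K", "HellaSwag", "HumanEval", "IFEval", "MATH", "MMLU",
    "MMLU-Pro", "GPQA-Diamond",
]
AGENTIC_TASKS = ["GAIA", "GDM-InterCode-CTF"]

INTRODUCTION_TEXT = f"""
This leaderboard presents the performance of selected LLM models on a set of tasks.
The tasks are divided into two categories: base and agentic.
The base tasks are: {", ".join(BASE_TASKS)}. The agentic tasks are {" and ".join(AGENTIC_TASKS)}.
"""
```

With this layout, adding a link for another benchmark (as this WIP commit does for ARC-Easy) is a one-place change rather than an edit in each string.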