Commit ba2f546 · xeon27 committed · Parent: e004342

Add title and required text

Files changed: src/about.py (+6 -3)
src/about.py CHANGED

@@ -38,20 +38,23 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">
+TITLE = """<h1 align="center" id="space-title">LLM Evaluation Leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-
+This leaderboard presents the performance of selected LLM models on a set of tasks. The tasks are divided into two categories: base and agentic. The base tasks are ARC-Easy, ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond. The agentic tasks are GAIA and GDM-InterCode-CTF.
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
 ## How it works
+The following benchmarks are included:
+Base: ARC-Easy, ARC-Challenge, DROP, WinoGrande, GSM8K, HellaSwag, HumanEval, IFEval, MATH, MMLU, MMLU-Pro, GPQA-Diamond
+Agentic: GAIA, GDM-InterCode-CTF
 
 ## Reproducibility
 To reproduce our results, here is the commands you can run:
-
+TBD
 """
 
 EVALUATION_QUEUE_TEXT = """
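Note that the commit declares `LLM_BENCHMARKS_TEXT` as an f-string even though it contains no placeholders yet. A minimal sketch of how that f-string could interpolate the benchmark lists instead of hard-coding them (the `BASE_TASKS` and `AGENTIC_TASKS` names are hypothetical, not part of the commit):

```python
# Sketch only; mirrors the constants in src/about.py, it is not the file itself.

TITLE = """<h1 align="center" id="space-title">LLM Evaluation Leaderboard</h1>"""

# Hypothetical task lists so the benchmark names live in one place.
BASE_TASKS = [
    "ARC-Easy", "ARC-Challenge", "DROP", "WinoGrande", "GSM8K", "HellaSwag",
    "HumanEval", "IFEval", "MATH", "MMLU", "MMLU-Pro", "GPQA-Diamond",
]
AGENTIC_TASKS = ["GAIA", "GDM-InterCode-CTF"]

# Because the source declares this as an f-string, the {...} expressions are
# evaluated once, at import time, producing a plain str.
LLM_BENCHMARKS_TEXT = f"""
## How it works
The following benchmarks are included:
Base: {", ".join(BASE_TASKS)}
Agentic: {", ".join(AGENTIC_TASKS)}
"""

print(LLM_BENCHMARKS_TEXT)
```

Keeping the lists in Python objects rather than prose means the leaderboard table columns and this description text cannot drift apart.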