Spaces: Running on CPU Upgrade
Gregor Betz committed "description"

- src/display/about.py  +5 -2

src/display/about.py CHANGED
@@ -51,10 +51,13 @@ Each `regime` has a different _accuracy gain Δ_, and the leaderboard reports (f
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.
 
+Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
+
 |🤗 Open LLM Leaderboard |`/\/` Open CoT Leaderboard |
 |---|---|
-|Can `model` solve task
-|Measures
+|Can `model` solve `task`?|Can `model` do CoT to improve in `task`?|
+|Measures `task` performance.|Measures ability to reason (about `task`).|
+|Metric: absolute accuracy.|Metric: relative accuracy gain.|
 |Covers broad spectrum of `tasks`.|Focuses on critical thinking `tasks`.|
 
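The added table rows name the leaderboard's metric: a relative accuracy gain Δ of chain-of-thought answering over direct answering, rather than absolute accuracy. A minimal sketch of that idea is below; the function name and the simple-difference definition are assumptions for illustration only — the leaderboard's exact formula for Δ is defined in the repo, not in this diff.

```python
def accuracy_gain(acc_baseline: float, acc_cot: float) -> float:
    """Illustrative accuracy gain Δ: how much CoT improves over direct answering.

    acc_baseline: accuracy when the model answers the task directly.
    acc_cot:      accuracy when the model first generates a chain of thought.

    Hypothetical definition (plain difference); the actual leaderboard
    formula may differ (e.g. normalization per regime).
    """
    return acc_cot - acc_baseline

# Example: direct accuracy 0.60, CoT accuracy 0.72 -> Δ = 0.12
delta = accuracy_gain(0.60, 0.72)
```

The point of the metric is that two models with the same absolute accuracy can differ sharply in Δ: only the one whose accuracy rises with CoT demonstrates an ability to reason about the `task`.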