Spaces: Running on CPU Upgrade
Gregor Betz committed "description"

- src/display/about.py  +5 -2

src/display/about.py CHANGED
@@ -51,10 +51,13 @@ Each `regime` has a different _accuracy gain Δ_, and the leaderboard reports (f
 
 Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) or [YALL](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard) do a great job in ranking models according to task performance.
 
+Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
+
 |🤗 Open LLM Leaderboard |`/\/` Open CoT Leaderboard |
 |---|---|
-|Can `model` solve task
-|Measures
+|Can `model` solve `task`?|Can `model` do CoT to improve in `task`?|
+|Measures `task` performance.|Measures ability to reason (about `task`).|
+|Metric: absolute accuracy.|Metric: relative accuracy gain.|
 |Covers broad spectrum of `tasks`.|Focuses on critical thinking `tasks`.|
 
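The added table rows name the leaderboard's metric: a relative accuracy gain Δ of chain-of-thought answering over direct answering, rather than absolute accuracy. A minimal sketch of that idea is below; the function name and the simple-difference definition are assumptions for illustration only — the leaderboard's exact formula for Δ is defined in the repo, not in this diff.

```python
def accuracy_gain(acc_baseline: float, acc_cot: float) -> float:
    """Illustrative accuracy gain Δ: how much CoT improves over direct answering.

    acc_baseline: accuracy when the model answers the task directly.
    acc_cot:      accuracy when the model first generates a chain of thought.

    Hypothetical definition (plain difference); the actual leaderboard
    formula may differ (e.g. normalization per regime).
    """
    return acc_cot - acc_baseline

# Example: direct accuracy 0.60, CoT accuracy 0.72 -> Δ = 0.12
delta = accuracy_gain(0.60, 0.72)
```

The point of the metric is that two models with the same absolute accuracy can differ sharply in Δ: only the one whose accuracy rises with CoT demonstrates an ability to reason about the `task`.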