Spaces: logikon/open_cot_leaderboard

Gregor Betz committed on Mar 25, 2024

Commit c91d7f4 (unverified) · 1 Parent(s): 5b98e6a

update readme and about

Files changed (2)

README.md CHANGED

@@ -8,7 +8,15 @@ sdk_version: 4.4.0
 app_file: app.py
 pinned: true
 license: apache-2.0
----
+duplicated_from: logikon/open_cot_leaderboard
+fullWidth: true
+space_ci:
+  private: true
+  secrets:
+    - HF_TOKEN
+tags:
+  - leaderboard
+short_description: Track, rank and evaluate open LLMs' CoT quality---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

src/display/about.py CHANGED

@@ -53,18 +53,6 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
-### 🤗 Open LLM Leaderboard vs. `/\/` Open CoT Leaderboard
-*   🤗: Can `model` solve `task`?
-    `/\/`: Can `model` do CoT to improve in `task`?
-*   🤗: Metric: absolute accuracy.
-    `/\/`: Metric: relative accuracy gain.
-*   🤗: Measures `task` performance.
-    `/\/`: Measures ability to reason (about `task`).
-*   🤗: Covers broad spectrum of `tasks`.
-    `/\/`: Focuses on critical thinking `tasks`.
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
 * b. Metric: absolute accuracy.
@@ -84,7 +72,10 @@ The test dataset problems in the CoT Leaderboard can be solved through clear thi
 ## Reproducibility
-To reproduce our results, check out the repository [cot-eval](https://github.com/logikon-ai/cot-eval).
+To learn more about the evaluation pipeline and reproduce our results, check out the repository [cot-eval](https://github.com/logikon-ai/cot-eval).
+## Acknowledgements
+We're grateful to community members for running evaluations and reporting results. To contribute, join us at the [`cot-leaderboard`](https://huggingface.co/cot-leaderboard) organization.
 """
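
The about text in this diff contrasts two metrics: absolute accuracy (🤗 Open LLM Leaderboard) and relative accuracy gain (`/\/` Open CoT Leaderboard). The diff does not spell out how the gain is computed, so the following is only a minimal sketch under an assumed definition: the gain is the CoT accuracy minus the baseline (no-CoT) accuracy, divided by the baseline accuracy. The function names are hypothetical and are not taken from `cot-eval`, which remains the authoritative source for the actual definition.

```python
def absolute_accuracy(num_correct: int, num_total: int) -> float:
    """Fraction of problems answered correctly."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_correct / num_total


def relative_accuracy_gain(acc_base: float, acc_cot: float) -> float:
    """Assumed definition: (acc_cot - acc_base) / acc_base.

    acc_base: accuracy without chain-of-thought reasoning.
    acc_cot:  accuracy when the model reasons before answering.
    """
    if acc_base <= 0:
        raise ValueError("acc_base must be positive")
    return (acc_cot - acc_base) / acc_base


# Illustrative numbers only, not leaderboard results.
base = absolute_accuracy(50, 100)
with_cot = absolute_accuracy(60, 100)
gain = relative_accuracy_gain(base, with_cot)
```

Under this assumed definition, a model with high baseline accuracy but no benefit from CoT has a gain near zero, while a weaker model that improves through reasoning has a positive gain, which matches the distinction the about text draws between task performance and the ability to reason about a task.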