Spaces: logikon/open_cot_leaderboard

Gregor Betz committed on Jan 30, 2024

Commit 992caee (unverified) · 1 Parent(s): 0841987

description

Files changed (1)

src/display/about.py CHANGED

@@ -44,7 +44,7 @@ To assess the reasoning skill of a given `model`, we carry out the following ste
 3. `model` answers the test dataset problems _with the reasoning traces appended_ to the prompt, we record the resulting _CoT accuracy_.
 4. We compute the _accuracy gain Δ_ = _CoT accuracy_ — _baseline accuracy_ for the given `model`, `task`, and `regime`.
-Each `regime` has a different _accuracy gain Δ_, and the leaderboard reports (for every `model`/`task`) the best Δ achieved by any regime. All models are evaluated with the same set of regimes.
+Each `regime` yields a different _accuracy gain Δ_, and the leaderboard reports (for every `model`/`task`) the best Δ achieved by any regime. All models are evaluated against the same set of regimes.
 ## How is it different from other leaderboards?
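
Review note: the arithmetic behind steps 3 and 4 and the sentence changed in this hunk can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code: the function names, regime names, and accuracy values are all hypothetical, and it assumes a single baseline accuracy per `model`/`task` that is shared across regimes.

```python
from typing import Dict, Tuple


def accuracy_gain(cot_accuracy: float, baseline_accuracy: float) -> float:
    """Accuracy gain Δ = CoT accuracy - baseline accuracy (step 4)."""
    return cot_accuracy - baseline_accuracy


def best_gain(gains_by_regime: Dict[str, float]) -> Tuple[str, float]:
    """Return the regime with the largest Δ, together with that Δ."""
    best_regime = max(gains_by_regime, key=gains_by_regime.get)
    return best_regime, gains_by_regime[best_regime]


# Hypothetical numbers for one model/task pair.
baseline_accuracy = 0.5
cot_accuracy_by_regime = {
    "regime_a": 0.625,  # hypothetical regime names
    "regime_b": 0.75,
    "regime_c": 0.5,
}

# One Δ per regime, all measured against the same baseline.
gains = {
    regime: accuracy_gain(cot_acc, baseline_accuracy)
    for regime, cot_acc in cot_accuracy_by_regime.items()
}

# The value reported for this model/task is the best Δ over all regimes.
reported_regime, reported_delta = best_gain(gains)
print(reported_regime, reported_delta)
```

Because every model is evaluated against the same set of regimes, taking the maximum Δ per `model`/`task` keeps the reported values comparable across models.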