Gregor Betz committed "update about"
src/display/about.py  CHANGED  (+11 -0)
@@ -54,6 +54,17 @@ Performance leaderboards like the [🤗 Open LLM Leaderboard](https://huggingfac
 Unlike these leaderboards, the `/\/` Open CoT Leaderboard assesses a model's ability to effectively reason about a `task`:
 
 
+### 🤗 Open LLM Leaderboard vs. `/\/` Open CoT Leaderboard
+* 🤗: Can `model` solve `task`?
+`/\/`: Can `model` do CoT to improve in `task`?
+* 🤗: Metric: absolute accuracy.
+`/\/`: Metric: relative accuracy gain.
+* 🤗: Measures `task` performance.
+`/\/`: Measures ability to reason (about `task`).
+* 🤗: Covers broad spectrum of `tasks`.
+`/\/`: Focuses on critical thinking `tasks`.
+
+
 ### 🤗 Open LLM Leaderboard
 * a. Can `model` solve `task`?
 * b. Metric: absolute accuracy.
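The diff contrasts the Open LLM Leaderboard's absolute accuracy with the Open CoT Leaderboard's relative accuracy gain. As a sketch of what that metric could mean (an assumption for illustration, not the leaderboard's actual scoring code), one natural reading is: a model's accuracy when answering after a chain-of-thought trace minus its accuracy when answering directly.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    assert len(predictions) == len(labels) and labels
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


def relative_accuracy_gain(cot_predictions, base_predictions, labels):
    """Hypothetical metric: accuracy with CoT minus accuracy without CoT."""
    return accuracy(cot_predictions, labels) - accuracy(base_predictions, labels)


# Toy example: four multiple-choice items with gold answers A/B/C/D.
labels = ["A", "B", "C", "D"]
base = ["A", "C", "C", "A"]  # 2/4 correct when answering directly
cot = ["A", "B", "C", "A"]   # 3/4 correct after chain-of-thought
print(relative_accuracy_gain(cot, base, labels))  # 0.25
```

Under this reading, a model that is strong on a `task` but does not improve through CoT scores high on the Open LLM Leaderboard yet low on the Open CoT Leaderboard, which is exactly the distinction the bullet list draws.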