# LM-Harmony
*Which model would you rather have: the weaker student who crammed for the test, or the stronger student who walked in underprepared? Existing leaderboards mostly reward the former.*
**LM-Harmony** is a multi-task leaderboard for **model potential**. Instead of judging deployment-ready performance out of the box, we use a **train-before-test** paradigm: every model is fine-tuned on the same benchmark-specific training set before evaluation.
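The train-before-test loop can be sketched as follows. This is a minimal illustration, not LM-Harmony's actual harness: the helper names `fine_tune` and `evaluate` are hypothetical stand-ins for whatever training and scoring code a benchmark provides.

```python
def train_before_test(models, tasks, fine_tune, evaluate):
    """Rank models by mean score *after* fine-tuning.

    Every model is fine-tuned on the same benchmark-specific training
    split, then scored on the held-out test split. `fine_tune` and
    `evaluate` are hypothetical callables supplied by the benchmark.
    """
    scores = {}
    for name, model in models.items():
        task_scores = []
        for task in tasks:
            # Same training set for every model, so rankings reflect
            # potential after adaptation, not out-of-the-box polish.
            tuned = fine_tune(model, task["train"])
            task_scores.append(evaluate(tuned, task["test"]))
        scores[name] = sum(task_scores) / len(task_scores)
    # Best post-fine-tuning average first.
    return sorted(scores, key=scores.get, reverse=True)
```

The key design choice is that ranking happens only on post-fine-tuning scores, so a model that starts weaker but adapts better can outrank one that merely ships with the test distribution memorized.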
Across 24 diverse tasks, LM-Harmony yields far more stable and consistent rankings than standard direct-evaluation leaderboards. If you care about which model will perform better after you fine-tune it on your own data, the ranking you see here is much more likely to generalize to your workload.