---
title: README
emoji: 🔥
colorFrom: yellow
colorTo: purple
sdk: static
pinned: false
---

*Which model would you rather have: the weaker student who crammed for the test, or the stronger student who walked in underprepared? Existing leaderboards mostly reward the former.*

**LM-Harmony** is a multi-task leaderboard for **model potential**. Instead of judging deployment-ready performance out of the box, we use a **train-before-test** paradigm: every model is fine-tuned on the same benchmark-specific training set before evaluation.

Across diverse tasks, LM-Harmony yields far more stable and consistent rankings than standard direct-evaluation leaderboards. If you care about which model will perform better after you fine-tune it on your own data, the ranking you see here is much more likely to generalize to your workload.