---
title: README
emoji: 🔥
colorFrom: yellow
colorTo: purple
sdk: static
pinned: false
---
*Which model would you rather have: the weaker student who crammed for the test, or the stronger student who walked in underprepared? Existing leaderboards mostly reward the former.*

**LM-Harmony** is a multi-task leaderboard for **model potential**. Instead of judging deployment-ready performance out of the box, we use a **train-before-test** paradigm: every model is fine-tuned on the same benchmark-specific training set before evaluation.

Across diverse tasks, LM-Harmony yields far more stable and consistent rankings than standard direct-evaluation leaderboards. If you care about which model will perform better after you fine-tune it on your own data, the ranking you see here is much more likely to generalize to your workload.
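The train-before-test protocol can be sketched as a short loop. This is an illustrative outline only, not LM-Harmony's actual code; the `fine_tune` and `evaluate` callables are hypothetical stand-ins for whatever training and scoring pipeline a benchmark uses.

```python
# Hypothetical sketch of the train-before-test protocol.
# Names (`fine_tune`, `evaluate`) are illustrative, not LM-Harmony's API.

def train_before_test(models, train_split, test_split, fine_tune, evaluate):
    """Rank models by their score AFTER fine-tuning, not out of the box.

    Every model sees the same benchmark-specific training split, so the
    ranking reflects potential under adaptation rather than prior exposure.
    """
    scores = {}
    for name, model in models.items():
        tuned = fine_tune(model, train_split)      # same training data for all
        scores[name] = evaluate(tuned, test_split)  # score on held-out test split
    # Best post-fine-tuning score ranks first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy stand-ins: a "model" is just a number, fine-tuning adds a fixed lift.
    models = {"model-a": 0.2, "model-b": 0.5}
    ranking = train_before_test(
        models,
        train_split=None,
        test_split=None,
        fine_tune=lambda m, data: m + 0.3,
        evaluate=lambda m, data: m,
    )
    print(ranking)  # stronger base model still ranks first here
```

The key design point is that direct evaluation and train-before-test only differ in the `fine_tune` step; holding the training split fixed across models is what makes the resulting ranking comparable.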