| TITLE = "# Mewsic Bench Leaderboard" | |
| INTRODUCTION_TEXT = """ | |
| #### I run model evaluations via API on request. | |
| To request a model evaluation, open the **Request Evaluation** tab and enter the model ID. | |
| """ | |
| CITATION_BUTTON_TEXT = """ | |
| ## What is this anyway? | |
| I test the lyrical capabilities of LLMs by telling them to write songs entirely as meowing. | |
| ## WHY is this? | |
| There's been a lot of grandstanding over the past few years about models learning to write poetry, but I've found that generating actual poetry is a poor measure of poetic competency: a model can "fake it" by using common phrases and line structures that are close enough to fool a human reader. | |
| Forcing the model to generate nonsensical content condenses the test down to the essential characteristics of meter and line. | |
| Also, it's funny. | |
| ## Citation | |
| If you use this benchmark, please cite: | |
| ```bibtex | |
| @misc{mewsicbench2026, | |
| title={Mewsic Bench}, | |
| author={CatsCanWrite}, | |
| year={2026}, | |
| } | |
| ``` | |
| ## Contact | |
| For questions or issues, please open a discussion on the Hugging Face community tab. | |
| """ | |
| METRIC_INFO_TEXT = """ | |
| ## About the Metrics | |
| - **Meter** - How closely the model sticks to the meter of the lines. | |
| - **Verse** - How closely the model's lines follow the verse and chorus structure. | |
| - **Focus** - How much of the response is extraneous commentary instead of the song. *(Focus makes only a very minor contribution to the final score.)* | |
| - **Thinking** - The **estimated average** number of thinking tokens per response. Zero means it's not a reasoning model (or is a hybrid model with reasoning off). | |
| """ |