TITLE = "# Mewsic Bench Leaderboard"

INTRODUCTION_TEXT = """
#### I run model evaluations via API on request. To request a model evaluation, click the **Request Evaluation** tab and enter the model ID.
"""

CITATION_BUTTON_TEXT = """
## What is this anyway?

I test the lyrical capabilities of LLMs by telling them to write songs entirely as meowing.

## WHY is this?

There's been a lot of grandstanding over the past few years about models learning how to write poetry, but I've found that generating actual poetry is a poor measure of their poetic competency: models can "fake it" by using common phrases and line structures that are close enough to fool a human reader. Forcing the model to generate nonsensical content condenses the test down to the essential characteristics of meter and line.

Also, it's funny.

## Citation

If you use this benchmark, please cite:

```bibtex
@misc{mewsicbench2026,
  title={Mewsic Bench},
  author={CatsCanWrite},
  year={2026},
}
```

## Contact

For questions or issues, please open a discussion on the Hugging Face community tab.
"""

METRIC_INFO_TEXT = """
## About the Metrics

- **Meter** - How closely the model sticks to the meter of the lines.
- **Verse** - How closely the model aligns the lines to the verse and chorus structure.
- **Focus** - How much of the response is extraneous commentary instead of the song. *(Focus in particular has a very minor contribution to the final score)*
- **Thinking** - The **estimated average** number of thinking tokens per response. Zero means it's not a reasoning model (or is a hybrid model with reasoning off).
"""