TITLE = "# Mewsic Bench Leaderboard"
INTRODUCTION_TEXT = """
#### I run model evaluations via API on request.
To request a model evaluation, click the **Request Evaluation** tab and enter the model ID.
"""
CITATION_BUTTON_TEXT = """
## What is this anyway?
I test the lyrical capabilities of LLMs by telling them to write songs entirely as meowing.
## WHY is this?
There's been a lot of grandstanding over the past few years about models learning to write poetry, but I've found that generating actual poetry is a poor measure of poetic competency: a model can "fake it" by using common phrases and line structures that are close enough to fool a human reader.
Forcing the model to generate nonsensical content condenses the test down to the essential characteristics of meter and line.
Also, it's funny.
## Citation
If you use this benchmark, please cite:
```bibtex
@misc{mewsicbench2026,
title={Mewsic Bench},
author={CatsCanWrite},
year={2026},
}
```
## Contact
For questions or issues, please open a discussion on the Hugging Face community tab.
"""
METRIC_INFO_TEXT = """
## About the Metrics
- **Meter** - How closely the model sticks to the meter of the lines.
- **Verse** - How closely the model's lines follow the verse and chorus structure.
- **Focus** - Whether the response is just the song, without extraneous commentary. *(Focus contributes only slightly to the final score.)*
- **Thinking** - The **estimated average** number of thinking tokens per response. Zero means it's not a reasoning model (or is a hybrid model with reasoning off).
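## Illustrative Scoring Sketch
The snippet below is a rough illustration of how metrics like Meter and Focus could be computed. It is not the leaderboard's scoring code. Everything in it is an assumption made for the example: the names `meter_score`, `focus_score`, and `is_meow`, the pattern for what counts as a "meow word", and the idea that one meow word equals one beat.

```python
# Hypothetical sketch only; the real benchmark may score very differently.
import re

# Assumption: a "meow word" looks like meow, mew, mrow, meooow, and so on.
MEOW_WORD = re.compile("^m+r*[eo]+w+$", re.IGNORECASE)


def is_meow(word):
    # Ignore trailing or leading punctuation such as commas and exclamation marks.
    return bool(MEOW_WORD.match(word.strip(".,!?~")))


def meter_score(target_beats, song_lines):
    # Assumption: each meow word is one beat, and every target is positive.
    # Each line earns 1.0 for an exact match and loses credit in proportion
    # to how far its beat count is from the target. Missing lines earn 0.
    if not target_beats:
        return 0.0
    total = 0.0
    for i, target in enumerate(target_beats):
        words = song_lines[i].split() if i < len(song_lines) else []
        beats = sum(1 for w in words if is_meow(w))
        total += max(0.0, 1.0 - abs(beats - target) / target)
    return total / len(target_beats)


def focus_score(response_lines):
    # Fraction of non-empty lines that consist purely of meow words.
    non_empty = [line for line in response_lines if line.strip()]
    if not non_empty:
        return 0.0
    song = [line for line in non_empty if all(is_meow(w) for w in line.split())]
    return len(song) / len(non_empty)


response = ["Here is your song:", "Meow meow mew meow", "Mrow meow, meow!"]
song_only = [line for line in response if all(is_meow(w) for w in line.split())]
print(meter_score([4, 4], song_only))
print(focus_score(response))
```

In this example the second song line has three beats against a target of four, so it earns partial credit, and one of the three response lines is commentary, which lowers the Focus value.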
"""