---
title: BeigificationBench
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.10.0
app_file: app.py
pinned: false
license: mit
---

# BeigificationBench

An anonymous benchmark evaluating how large language models flatten and homogenize text during rewriting, a phenomenon we call **beigification**.

## What is Beigification?

Beigification describes the tendency of LLMs to produce safe, bland, stylistically uniform rewrites that strip out the distinctive voice, specificity, and informational density of source texts.

## Metrics

- **Lossiness**: NLI-weighted information loss (proposition loss + semantic distance + word deletion)
- **Drift**: Model collapse indicator combining spiciness loss and centroid pull
- **Spiciness**: 6-component measure of textual vividness (perplexity, lexical richness, rare word density, word specificity, vivid modifier ratio, voice score)
- **NLI Retention**: Proportion of source propositions preserved in the rewrite
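
As a rough illustration of the NLI Retention definition above, the sketch below computes the share of source propositions that a rewrite still entails. The function names (`nli_retention`, `toy_entails`) and the handling of an empty proposition list are assumptions made for this example, not the benchmark's actual code. The entailment check is a plain substring match standing in for a real NLI model.

```python
from typing import Callable, Sequence


def nli_retention(
    source_props: Sequence[str],
    rewrite: str,
    entails: Callable[[str, str], bool],
) -> float:
    """Fraction of source propositions that the rewrite still entails.

    `entails(premise, hypothesis)` is a pluggable judge; in practice it
    would wrap an NLI model. Hypothetical interface, for illustration only.
    """
    if not source_props:
        # Assumption: with no propositions to lose, retention is perfect.
        return 1.0
    kept = sum(1 for prop in source_props if entails(rewrite, prop))
    return kept / len(source_props)


def toy_entails(premise: str, hypothesis: str) -> bool:
    # Stand-in for an NLI model: case-insensitive substring match only.
    return hypothesis.lower() in premise.lower()


props = ["the cat sat", "on a red mat"]
rewrite = "The cat sat on a mat."
retention = nli_retention(props, rewrite, toy_entails)
print(retention)
```

Here the rewrite keeps the first proposition but drops the detail "red", so half of the source propositions survive. Swapping `toy_entails` for a model-backed judge leaves the ratio computation unchanged.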

## Benchmark Design

Single-hop results are averaged across 3 independent replicates to reduce variance.
Multi-hop results show degradation trajectories over 8 successive rewrites.

Submitted for anonymous peer review.
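
The single-hop and multi-hop designs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's implementation. The helper names, the toy rewriter (which drops the last word), and the toy score (word-count ratio) are all hypothetical. Scoring each hop against the original source, rather than against the previous hop, is also an assumption.

```python
from statistics import mean
from typing import Callable, List, Sequence


def average_replicates(scores: Sequence[float]) -> float:
    """Single-hop: average one metric over independent replicates."""
    return mean(scores)


def multi_hop_trajectory(
    source: str,
    rewrite: Callable[[str], str],
    score: Callable[[str, str], float],
    hops: int = 8,
) -> List[float]:
    """Multi-hop: feed each rewrite back into the rewriter and record
    the score after every hop (scored against the original source,
    which is an assumption of this sketch)."""
    trajectory = []
    current = source
    for _ in range(hops):
        current = rewrite(current)
        trajectory.append(score(source, current))
    return trajectory


def toy_rewrite(text: str) -> str:
    # Hypothetical stand-in for an LLM rewrite: drop the last word.
    return " ".join(text.split()[:-1])


def toy_score(source: str, rewritten: str) -> float:
    # Hypothetical retention proxy: share of the source word count kept.
    return len(rewritten.split()) / len(source.split())


single_hop = average_replicates([0.5, 0.25, 0.75])  # 3 replicates
trajectory = multi_hop_trajectory("a b c d e f g h i j", toy_rewrite, toy_score)
print(single_hop, trajectory)
```

With the toy rewriter the score falls by one tenth per hop, giving a monotonically degrading trajectory of eight values. A real run would replace `toy_rewrite` with a model call and `toy_score` with one of the metrics listed under Metrics.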