---
title: RefusalBench
emoji: 🧬
colorFrom: red
colorTo: indigo
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: true
license: mit
tags:
  - benchmark
  - llm-evaluation
  - ai-safety
  - biosecurity
  - refusal
  - leaderboard
datasets:
  - appliedscientific/refusalbench
---
# RefusalBench

Interactive leaderboard for the RefusalBench benchmark: a reproducible, evergreen evaluation of how frontier LLMs refuse biological research prompts.

**Paper:** [arXiv:2605.21545](https://arxiv.org/abs/2605.21545)

**GitHub:** [AppliedScientific/refusalbench](https://github.com/AppliedScientific/refusalbench)