|
|
--- |
|
|
title: UncensorBench Leaderboard |
|
|
emoji: 🦬 |
|
|
colorFrom: green |
|
|
colorTo: blue |
|
|
sdk: docker |
|
|
pinned: false |
|
|
license: mit |
|
|
--- |
|
|
|
|
|
# UncensorBench Leaderboard |
|
|
|
|
|
Public leaderboard for [UncensorBench](https://github.com/wisent-ai/uncensorbench) - a benchmark that measures how effectively censorship has been removed from LLMs.
|
|
|
|
|
## About |
|
|
|
|
|
This leaderboard tracks model performance on the UncensorBench benchmark, which evaluates how well models respond to prompts that typically trigger refusal behaviors. |
|
|
|
|
|
## Metrics |
|
|
|
|
|
- **Uncensored Rate**: Percentage of prompts where the model provided a compliant response (higher = less censored) |
|
|
- **Average Compliance Score**: Mean compliance score across all prompts (0-1 scale); see the sketch after this list for how both metrics can be computed
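
A minimal sketch of how these two numbers can be derived from per-prompt compliance scores. The 0.5 compliance threshold here is an illustrative assumption, not necessarily the benchmark's official cutoff:

```python
# Sketch: derive leaderboard metrics from per-prompt compliance scores
# (floats in [0, 1]). The 0.5 threshold is an illustrative assumption.
def summarize(scores: list[float], threshold: float = 0.5) -> dict:
    compliant = [s for s in scores if s >= threshold]
    return {
        "uncensored_rate": len(compliant) / len(scores),    # fraction of compliant responses
        "avg_compliance_score": sum(scores) / len(scores),  # mean score across all prompts
        "total_prompts": len(scores),
    }
```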
|
|
|
|
|
## How to Submit |
|
|
|
|
|
Install the package, run the benchmark against your model, and submit the results:
|
|
|
|
|
```bash |
|
|
pip install uncensorbench |
|
|
``` |
|
|
|
|
|
```python |
|
|
from uncensorbench import UncensorBench, SemanticEvaluator, Leaderboard |
|
|
|
|
|
# Run benchmark on your model |
|
|
bench = UncensorBench() |
|
|
evaluator = SemanticEvaluator() |
|
|
|
|
|
# ... generate responses and evaluate ... |
|
|
|
|
|
# Submit to leaderboard |
|
|
leaderboard = Leaderboard(token="hf_your_token") |
|
|
leaderboard.submit({ |
|
|
"model": "your-model-name", |
|
|
"uncensored_rate": 0.15, |
|
|
"avg_compliance_score": 0.23, |
|
|
"total_prompts": 150, |
|
|
}) |
|
|
``` |
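
The snippet above elides the generation and scoring step. Below is a hypothetical sketch of that loop; `bench.prompts()`, `prompt.text`, `evaluator.evaluate(...)`, and the `generate()` function you supply are all assumptions, not the library's documented API - see the linked notebook below for the actual workflow:

```python
# Hypothetical evaluation loop -- method names are assumptions, not the
# library's documented API; consult the UncensorBench repo or notebook.
scores = []
for prompt in bench.prompts():          # assumed: iterate benchmark prompts
    response = generate(prompt.text)    # your model's own generation function
    scores.append(evaluator.evaluate(prompt, response))  # assumed: returns a 0-1 score

uncensored_rate = sum(s >= 0.5 for s in scores) / len(scores)  # illustrative threshold
avg_compliance_score = sum(scores) / len(scores)
```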
|
|
|
|
|
Or use the provided notebook: [establish_baseline.ipynb](https://github.com/wisent-ai/uncensorbench/blob/main/examples/notebooks/establish_baseline.ipynb) |
|
|
|
|
|
## Disclaimer |
|
|
|
|
|
This benchmark is for research purposes only. Results should be interpreted in the context of AI safety research. |
|
|
|