---
title: UncensorBench Leaderboard
emoji: 🦬
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---
# UncensorBench Leaderboard
Public leaderboard for [UncensorBench](https://github.com/wisent-ai/uncensorbench), a benchmark for measuring the effectiveness of censorship removal in LLMs.
## About
This leaderboard tracks model performance on the UncensorBench benchmark, which evaluates how well models respond to prompts that typically trigger refusal behaviors.
## Metrics
- **Uncensored Rate**: percentage of prompts for which the model gave a compliant (non-refusing) response (higher = less censored)
- **Average Compliance Score**: mean compliance score across all prompts, on a 0-1 scale (see the sketch below for how the two metrics relate)
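As a rough illustration, here is a minimal sketch of computing both metrics from per-prompt scores. It assumes you already have one evaluator compliance score per prompt; the 0.5 threshold for counting a response as compliant is an illustrative assumption, not something the package mandates.
```python
# Minimal sketch: deriving both leaderboard metrics from per-prompt scores.
# Assumes `scores` holds one compliance score per prompt, each in [0, 1].
# The 0.5 "compliant" threshold is illustrative, not mandated by the package.
scores = [0.9, 0.1, 0.4, 0.7, 0.05]  # example values

avg_compliance_score = sum(scores) / len(scores)
uncensored_rate = sum(s >= 0.5 for s in scores) / len(scores)

print(f"Average compliance score: {avg_compliance_score:.2f}")
print(f"Uncensored rate: {uncensored_rate:.0%}")
```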
## How to Submit
Install the package and run the benchmark:
```bash
pip install uncensorbench
```
```python
from uncensorbench import UncensorBench, SemanticEvaluator, Leaderboard

# Set up the benchmark and the semantic evaluator
bench = UncensorBench()
evaluator = SemanticEvaluator()

# ... generate responses and evaluate ...

# Submit your aggregated results to the public leaderboard
leaderboard = Leaderboard(token="hf_your_token")
leaderboard.submit({
    "model": "your-model-name",
    "uncensored_rate": 0.15,
    "avg_compliance_score": 0.23,
    "total_prompts": 150,
})
```
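The elided generation-and-evaluation step depends on your inference setup. Purely as a sketch of its shape, and with hypothetical names (`bench.prompts`, `prompt.text`, and `evaluator.evaluate` are placeholders, not the confirmed UncensorBench API; `generate` is your own inference function), it might look like:
```python
# Hypothetical sketch of the elided step above. `bench.prompts`, `prompt.text`,
# and `evaluator.evaluate` are placeholder names; check the uncensorbench docs
# for the actual API before using this.
def run_benchmark(bench, evaluator, generate):
    """`generate` is your own function mapping a prompt string to a model response."""
    scores = []
    for prompt in bench.prompts:          # placeholder: iterate benchmark prompts
        response = generate(prompt.text)  # your model inference call
        scores.append(evaluator.evaluate(prompt, response))  # placeholder scoring call
    return scores
```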
Or use the provided notebook: [establish_baseline.ipynb](https://github.com/wisent-ai/uncensorbench/blob/main/examples/notebooks/establish_baseline.ipynb)
## Disclaimer
This benchmark is for research purposes only. Results should be interpreted in the context of AI safety research.