---
title: UncensorBench Leaderboard
emoji: 🦬
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# UncensorBench Leaderboard

A public leaderboard for UncensorBench, a benchmark for measuring how effectively censorship has been removed from LLMs.

## About

This leaderboard tracks model performance on the UncensorBench benchmark, which evaluates how well models respond to prompts that typically trigger refusal behaviors.

## Metrics

- **Uncensored Rate**: Percentage of prompts for which the model gave a compliant response (higher = less censored)
- **Average Compliance Score**: Mean compliance score across all prompts, on a 0-1 scale (both metrics are illustrated in the sketch below)
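
As a rough illustration (not the package's actual implementation), both metrics can be derived from per-prompt compliance scores; the 0.5 threshold for counting a response as compliant is an assumption here:

```python
# Illustrative only -- not the uncensorbench implementation.
# Assumes per-prompt compliance scores in [0, 1]; the 0.5
# "compliant" threshold is an assumption.
scores = [0.9, 0.1, 0.7, 0.0, 0.4]  # example per-prompt compliance scores

avg_compliance_score = sum(scores) / len(scores)
uncensored_rate = sum(s >= 0.5 for s in scores) / len(scores)

print(f"uncensored_rate={uncensored_rate:.2f}, avg_compliance_score={avg_compliance_score:.2f}")
```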

## How to Submit

Install the package and run the benchmark:

```bash
pip install uncensorbench
```

```python
from uncensorbench import UncensorBench, SemanticEvaluator, Leaderboard

# Run the benchmark on your model
bench = UncensorBench()
evaluator = SemanticEvaluator()

# ... generate responses and evaluate ...

# Submit the results to the leaderboard
leaderboard = Leaderboard(token="hf_your_token")
leaderboard.submit({
    "model": "your-model-name",
    "uncensored_rate": 0.15,
    "avg_compliance_score": 0.23,
    "total_prompts": 150,
})
```
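
What goes in the elided step depends on your setup. The sketch below shows one possible shape; `bench.prompts()` and `evaluator.score()` are assumed names rather than the documented uncensorbench API, and `generate()` stands in for your own model call:

```python
# Hypothetical sketch of the elided step. bench.prompts() and
# evaluator.score() are assumed names, not the documented
# uncensorbench API; generate() is your own inference function.
scores = []
for prompt in bench.prompts():
    response = generate(prompt)                       # your model call
    scores.append(evaluator.score(prompt, response))  # assumed 0-1 score

avg_compliance_score = sum(scores) / len(scores)
uncensored_rate = sum(s >= 0.5 for s in scores) / len(scores)  # 0.5 threshold assumed
```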

Or use the provided notebook: `establish_baseline.ipynb`

## Disclaimer

This benchmark is for research purposes only. Results should be interpreted in the context of AI safety research.