|
|
--- |
|
|
title: UncensorBench Leaderboard |
|
|
emoji: 🦬 |
|
|
colorFrom: green |
|
|
colorTo: blue |
|
|
sdk: docker |
|
|
pinned: false |
|
|
license: mit |
|
|
--- |
|
|
|
|
|
# UncensorBench Leaderboard |
|
|
|
|
|
Public leaderboard for [UncensorBench](https://github.com/wisent-ai/uncensorbench) - a benchmark that measures how effectively censorship has been removed from LLMs.
|
|
|
|
|
## About |
|
|
|
|
|
This leaderboard tracks model performance on the UncensorBench benchmark, which evaluates how well models respond to prompts that typically trigger refusal behaviors. |
|
|
|
|
|
## Metrics |
|
|
|
|
|
- **Uncensored Rate**: Percentage of prompts where the model provided a compliant response (higher = less censored) |
|
|
- **Average Compliance Score**: Mean compliance score across all prompts (0-1 scale); see the sketch after this list for how both metrics can be computed
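
A minimal sketch of how these two numbers can be derived from per-prompt compliance scores. The 0.5 compliance threshold here is an illustrative assumption, not necessarily the benchmark's official cutoff:

```python
# Sketch: derive leaderboard metrics from per-prompt compliance scores
# (floats in [0, 1]). The 0.5 threshold is an illustrative assumption.
def summarize(scores: list[float], threshold: float = 0.5) -> dict:
    compliant = [s for s in scores if s >= threshold]
    return {
        "uncensored_rate": len(compliant) / len(scores),    # fraction of compliant responses
        "avg_compliance_score": sum(scores) / len(scores),  # mean score across all prompts
        "total_prompts": len(scores),
    }
```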
|
|
|
|
|
## How to Submit |
|
|
|
|
|
Install the package, run the benchmark against your model, and submit the results:
|
|
|
|
|
```bash |
|
|
pip install uncensorbench |
|
|
``` |
|
|
|
|
|
```python |
|
|
from uncensorbench import UncensorBench, SemanticEvaluator, Leaderboard |
|
|
|
|
|
# Run benchmark on your model |
|
|
bench = UncensorBench() |
|
|
evaluator = SemanticEvaluator() |
|
|
|
|
|
# ... generate responses and evaluate ... |
|
|
|
|
|
# Submit to leaderboard |
|
|
leaderboard = Leaderboard(token="hf_your_token") |
|
|
leaderboard.submit({ |
|
|
"model": "your-model-name", |
|
|
"uncensored_rate": 0.15, |
|
|
"avg_compliance_score": 0.23, |
|
|
"total_prompts": 150, |
|
|
}) |
|
|
``` |
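
The snippet above elides the generation and scoring step. Below is a hypothetical sketch of that loop; `bench.prompts()`, `prompt.text`, `evaluator.evaluate(...)`, and the `generate()` function you supply are all assumptions, not the library's documented API - see the linked notebook below for the actual workflow:

```python
# Hypothetical evaluation loop -- method names are assumptions, not the
# library's documented API; consult the UncensorBench repo or notebook.
scores = []
for prompt in bench.prompts():          # assumed: iterate benchmark prompts
    response = generate(prompt.text)    # your model's own generation function
    scores.append(evaluator.evaluate(prompt, response))  # assumed: returns a 0-1 score

uncensored_rate = sum(s >= 0.5 for s in scores) / len(scores)  # illustrative threshold
avg_compliance_score = sum(scores) / len(scores)
```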
|
|
|
|
|
Or use the provided notebook: [establish_baseline.ipynb](https://github.com/wisent-ai/uncensorbench/blob/main/examples/notebooks/establish_baseline.ipynb) |
|
|
|
|
|
## Disclaimer |
|
|
|
|
|
This benchmark is for research purposes only. Results should be interpreted in the context of AI safety research. |
|
|
|