SORRY-Bench (2025/03) Collection In this iteration, we removed the category "Impersonation" due to its ambiguous definition and the fact that most models more or less fulfill such requests. • 3 items • Updated Feb 28, 2025
sorry-bench/ft-mistral-7b-instruct-v0.2-sorry-bench-202406 Text Generation • 7B • Updated Jul 2, 2024 • 3.32k • 9
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors Paper • 2406.14598 • Published Jun 20, 2024