S-Eval

Yuanxh committed on Oct 10, 2025

Commit 74a692f (verified) · 1 parent: b1a9635

Update constants.py

constants.py +2 -0

@@ -32,6 +32,8 @@ XLSX_DIR = "./file//results.xlsx"
 LEADERBOARD_INTRODUCTION = """# 🏆 S-Eval Leaderboard
     ## 🔔 Updates
+    📣 [2025/10/09]: 🎉 We release [**Octopus**](https://github.com/Alibaba-AAIG/Octopus), an automated LLM safety evaluator, to meet the community’s need for accurate and reproducible safety assessment tools. You can download the model from [HuggingFace](https://huggingface.co/Alibaba-AAIG/Octopus-14B) or [ModelScope](https://modelscope.cn/models/Alibaba-AAIG/Octopus-14B/summary).
     📣 [2025/03/30]: 🎉 Our [paper](https://dl.acm.org/doi/abs/10.1145/3728971) has been accepted by ISSTA 2025. To meet evaluation needs under different budgets, we partition the benchmark into four scales: [Small](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/small) (1,000 Base and 10,000 Attack in each language), [Medium](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/medium) (3,000 Base and 30,000 Attack in each language), [Large](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/large) (5,000 Base and 50,000 Attack in each language) and [Full](https://github.com/IS2Lab/S-Eval/tree/main/s_eval/full) (10,000 Base and 100,000 Attack in each language), comprehensively considering the balance and harmfulness of data.
     📣 [2024/10/25]: We release all 20,000 base risk prompts and 200,000 corresponding attack prompts ([Version-0.1.2](https://github.com/IS2Lab/S-Eval)). We also update [🏆 LeaderBoard](https://huggingface.co/spaces/IS2Lab/S-Eval) with new evaluation results including GPT-4 and other models.