Jay committed · Commit 58e363c · Parent(s): 16a4f17
update text

Browse files: assets/text.py (+3 -3)

assets/text.py CHANGED
@@ -7,25 +7,25 @@
 # Dataset
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
 To evaluate the safety risks of large language models (LLMs), we present ChineseSafe, a Chinese safety benchmark to facilitate research
-on the content safety of
+on the content safety of LLMs for Chinese (Mandarin).
 To align with the regulations for Chinese Internet content moderation, our ChineseSafe contains 205,034 examples
 across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography,
 and variant/homophonic words. In particular, the benchmark is constructed as a balanced dataset, containing safe and unsafe data collected from internet resources and public datasets [1,2,3].
 We hope the evaluation can provide a guideline for developers and researchers to facilitate the safety of LLMs. <br>
 
 The leaderboard is under construction and maintained by <a href="https://hongxin001.github.io/" target="_blank">Hongxin Wei's</a> research group at SUSTech.
-We will release the technical report in the near future.
 Comments, issues, contributions, and collaborations are all welcome!
 Email: weihx@sustech.edu.cn
 </span>
 """ # noqa
+# We will release the technical report in the near future.
 
 METRICS_TEXT = """
 # Metrics
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
 We report the results with five metrics: overall accuracy, and precision/recall for safe/unsafe content.
 In particular, the results are shown in <b>metric/std</b> format in the table,
-where <b>std</b> indicates the standard deviation of the results
+where <b>std</b> indicates the standard deviation of the results with various random seeds.
 </span>
 """ # noqa
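As a rough illustration of the metric/std reporting described in the METRICS_TEXT above, the sketch below computes overall accuracy plus per-class precision/recall for safe (0) and unsafe (1) labels, then formats one metric across several random-seed runs as mean/std. The function names `binary_metrics` and `metric_std` are hypothetical and not part of the leaderboard code.

```python
# Hypothetical sketch of the metric/std reporting; labels: 1 = unsafe, 0 = safe.
from statistics import mean, stdev


def binary_metrics(y_true, y_pred):
    """Overall accuracy plus precision/recall for the safe and unsafe classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision_unsafe": tp / (tp + fp) if tp + fp else 0.0,
        "recall_unsafe": tp / (tp + fn) if tp + fn else 0.0,
        "precision_safe": tn / (tn + fn) if tn + fn else 0.0,
        "recall_safe": tn / (tn + fp) if tn + fp else 0.0,
    }


def metric_std(per_seed_results, key):
    """Format one metric across seeds as 'mean/std', as shown in the table."""
    vals = [r[key] for r in per_seed_results]
    return f"{mean(vals):.2f}/{stdev(vals):.2f}"


# Example: two runs with different seeds on the same four labeled examples.
runs = [
    binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]),
    binary_metrics([1, 1, 0, 0], [1, 1, 0, 1]),
]
print(metric_std(runs, "accuracy"))
```

The five dictionary keys correspond to the five reported metrics; a real evaluation would aggregate per-seed predictions over the full benchmark rather than four toy examples.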