Jay committed
Commit · c5184ef
1 parent: c66aadd
fix: update text
- app.py +1 -1
- assets/text.py +19 -12
app.py CHANGED
@@ -194,7 +194,7 @@ with gr.Blocks() as demo:
                 elem_id="leaderboard-table",
             )
 
-        with gr.TabItem("π
+        with gr.TabItem("π Perplexity", elem_id="od-benchmark-tab-table", id=5):
             dataframe_all_per = gr.components.Dataframe(
                 elem_id="leaderboard-table",
             )
assets/text.py CHANGED
@@ -6,10 +6,12 @@ On this leaderboard, we share the evaluation results of LLMs obtained by develop
 
 # Dataset
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
-To evaluate the
-
-
-
+To evaluate the safety risks of large language models (LLMs), we present ChineseSafe, a Chinese safety benchmark to facilitate research
+on the content safety of large language models for Chinese (Mandarin).
+To align with the regulations for Chinese Internet content moderation, ChineseSafe contains 205,034 examples
+across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography,
+and variant/homophonic words. In particular, the benchmark is constructed as a balanced dataset, containing safe and unsafe data collected from internet resources and public datasets [1,2,3].
+We hope the evaluation can provide a guideline for developers and researchers to improve the safety of LLMs. <br>
 
 The leaderboard is under construction and maintained by <a href="https://hongxin001.github.io/" target="_blank">Hongxin Wei's</a> research group at SUSTech.
 We will release the technical report in the near future.
@@ -30,7 +32,7 @@ where <b>std</b> indicates the standard deviation of the results obtained from d
 EVALUTION_TEXT= """
 # Evaluation
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
-We evaluate the models using two methods: multiple choice
+We evaluate the models using two methods: perplexity (multiple choice) and generation.
 For perplexity, we select the label with the lowest perplexity as the predicted result.
 For generation, we use the content generated by the model to make predictions.
 The following are the results of the evaluation.
@@ -48,16 +50,21 @@ REFERENCE_TEXT = """
 
 """
 
+# ACKNOWLEDGEMENTS_TEXT = """
+# # Acknowledgements
+# <span style="font-size:16px; font-family: 'Times New Roman', serif">
+# This research is supported by "Data+AI" Data Intelligent Laboratory,
+# a joint lab constructed by Deepexi and Department of Statistics and Data Science at SUSTech.
+# We gratefully acknowledge the contributions of Prof. Bingyi Jing, Prof. Lili Yang,
+# and Asst. Prof. Guanhua Chen for their support throughout this project.
+# """
+
 ACKNOWLEDGEMENTS_TEXT = """
-
-
-
-a joint lab constructed by Deepexi and Department of Statistics and Data Science at SUSTech.
-We gratefully acknowledge the contributions of Prof. Bingyi Jing, Prof. Lili Yang,
-and Asst. Prof. Guanhua Chen for their support throughout this project.
+This research is supported by the Shenzhen Fundamental Research Program (Grant No.
+JCYJ20230807091809020). We gratefully acknowledge the support of "Data+AI" Data Intelligent Laboratory, a joint lab constructed by Deepexi and the Department of Statistics and Data Science
+at Southern University of Science and Technology.
 """
 
-
 CONTACT_TEXT = """
 # Contact
 <span style="font-size:16px; font-family: 'Times New Roman', serif">
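The evaluation text states that, for the perplexity method, the label with the lowest perplexity is taken as the prediction. The following is a minimal sketch of that selection rule. It assumes that per-token log-probabilities for each candidate label have already been obtained from the model. The function names `perplexity` and `predict_label`, the `safe`/`unsafe` labels, and the log-probability values are all hypothetical. The prompt format and scoring details the leaderboard actually uses are not specified in this commit.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of a token sequence: exp of the negative mean natural-log probability."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def predict_label(label_logprobs: Dict[str, List[float]]) -> str:
    """Return the candidate label whose scored tokens have the lowest perplexity.

    label_logprobs maps each hypothetical label (e.g. "safe", "unsafe") to the
    per-token log-probabilities the model assigned when that label was appended
    to the prompt. Ties go to the label that appears first in the dict.
    """
    if not label_logprobs:
        raise ValueError("at least one candidate label is required")
    return min(label_logprobs, key=lambda label: perplexity(label_logprobs[label]))


if __name__ == "__main__":
    # Hypothetical scores: the model finds "safe" far more likely than "unsafe".
    scores = {"safe": [-0.1, -0.2], "unsafe": [-1.5, -2.0]}
    print(predict_label(scores))  # prints "safe"
```

Because perplexity averages over tokens, this comparison is length-normalized. A label that tokenizes into more pieces is therefore not penalized merely for being longer, as it would be under a raw summed log-probability. Whether the leaderboard normalizes scores this way is an assumption.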