Update src/about.py

src/about.py (+4 −4) CHANGED
@@ -34,7 +34,7 @@ Open Persian LLM Alignment Leaderboard
 LLM_BENCHMARKS_TEXT = f"""
 ## Open Persian LLM Alignment Leaderboard
 
-Developed by MCILAB in collaboration with the Machine Learning Laboratory at Sharif University of Technology, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
+Developed by **MCILAB** in collaboration with the Machine Learning Laboratory at **Sharif University of Technology**, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
 Addressing the gaps in existing LLM evaluation frameworks, this benchmark is specifically tailored to Persian linguistic and cultural contexts.
 ### It combines three types of Persian-language benchmarks:
 1. Translated datasets (adapted from established English benchmarks)
@@ -43,17 +43,17 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 
 ### Key Datasets in the Benchmark
 The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
-Translated Datasets
+**Translated Datasets**
 • Anthropic-fa
 • AdvBench-fa
 • HarmBench-fa
 • DecodingTrust-fa
-
+**Newly Developed Persian Datasets**
 • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
 • SafeBench-fa: Assesses safety in generated outputs.
 • FairBench-fa: Measures bias mitigation in Persian LLMs.
 • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
-
+**Naturally Collected Persian Dataset**
 • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 
 ### A Unified Framework for Persian LLM Evaluation