Update src/about.py
src/about.py +12 -12
@@ -45,25 +45,25 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 > The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
 >
 > **Translated Datasets**
->
->
->
->
+> - Anthropic-fa
+> - AdvBench-fa
+> - HarmBench-fa
+> - DecodingTrust-fa
 >
 > **Newly Developed Persian Datasets**
->
->
->
->
+> - ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
+> - SafeBench-fa: Assesses safety in generated outputs.
+> - FairBench-fa: Measures bias mitigation in Persian LLMs.
+> - SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
 >
 > **Naturally Collected Persian Dataset**
->
+> - GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 
 ### A Unified Framework for Persian LLM Evaluation
 By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
-
-
-
+- **Safety**: Avoiding harmful or toxic content.
+- **Fairness**: Mitigating biases in model outputs.
+- **Social Norms**: Ensuring culturally appropriate behavior.
 
 
 This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.