Commit 90a84e7 · verified · 1 parent: abf9d9d
Committed by MCILAB

Update src/about.py

Files changed (1):
  src/about.py (+4 −4)
src/about.py CHANGED
@@ -34,7 +34,7 @@
 LLM_BENCHMARKS_TEXT = f"""
 ## Open Persian LLM Alignment Leaderboard
 
-Developed by MCILAB in collaboration with the Machine Learning Laboratory at Sharif University of Technology, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
+Developed by **MCILAB** in collaboration with the Machine Learning Laboratory at **Sharif University of Technology**, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
 Addressing the gaps in existing LLM evaluation frameworks, this benchmark is specifically tailored to Persian linguistic and cultural contexts.
 ### It combines three types of Persian-language benchmarks:
 1. Translated datasets (adapted from established English benchmarks)
@@ -43,17 +43,17 @@
 
 ### Key Datasets in the Benchmark
 The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
-Translated Datasets
+**Translated Datasets**
 • Anthropic-fa
 • AdvBench-fa
 • HarmBench-fa
 • DecodingTrust-fa
-### Newly Developed Persian Datasets
+**Newly Developed Persian Datasets**
 • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
 • SafeBench-fa: Assesses safety in generated outputs.
 • FairBench-fa: Measures bias mitigation in Persian LLMs.
 • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
-### Naturally Collected Persian Dataset
+**Naturally Collected Persian Dataset**
 • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 
 ### A Unified Framework for Persian LLM Evaluation
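The commit's effect can be sketched as a minimal, self-contained excerpt of how the constant in `src/about.py` is structured after the change. The shortened text below is illustrative, not the full string from the file; note that the `f` prefix is kept only for parity with the source, since this text contains no interpolation placeholders and a plain triple-quoted string would behave identically.

```python
# Sketch of the updated constant in src/about.py (abridged for illustration).
# The f-string prefix mirrors the source file even though nothing is interpolated.
LLM_BENCHMARKS_TEXT = f"""
## Open Persian LLM Alignment Leaderboard

Developed by **MCILAB** in collaboration with the Machine Learning Laboratory at **Sharif University of Technology**.

**Translated Datasets**
• Anthropic-fa
• AdvBench-fa
• HarmBench-fa
• DecodingTrust-fa
"""

# What the commit changes: dataset group headings now use bold (**...**)
# instead of "### " sub-headings, so they render as inline emphasis rather
# than full document sections in the leaderboard's markdown view.
assert "**Translated Datasets**" in LLM_BENCHMARKS_TEXT
assert "### Translated Datasets" not in LLM_BENCHMARKS_TEXT
```

Using bold labels keeps the three dataset groups visually subordinate to the `### Key Datasets in the Benchmark` section instead of introducing same-level headings for each group.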