Update src/about.py

src/about.py (+4 −4) CHANGED
@@ -34,7 +34,7 @@ Open Persian LLM Alignment Leaderboard
 LLM_BENCHMARKS_TEXT = f"""
 ## Open Persian LLM Alignment Leaderboard
 
-Developed by MCILAB in collaboration with the Machine Learning Laboratory at Sharif University of Technology, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
+Developed by **MCILAB** in collaboration with the Machine Learning Laboratory at **Sharif University of Technology**, this benchmark presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms.
 Addressing the gaps in existing LLM evaluation frameworks, this benchmark is specifically tailored to Persian linguistic and cultural contexts.
 ### It combines three types of Persian-language benchmarks:
 1. Translated datasets (adapted from established English benchmarks)
@@ -43,17 +43,17 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
 
 ### Key Datasets in the Benchmark
 The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
-Translated Datasets
+**Translated Datasets**
 • Anthropic-fa
 • AdvBench-fa
 • HarmBench-fa
 • DecodingTrust-fa
-
+**Newly Developed Persian Datasets**
 • ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
 • SafeBench-fa: Assesses safety in generated outputs.
 • FairBench-fa: Measures bias mitigation in Persian LLMs.
 • SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
-
+**Naturally Collected Persian Dataset**
 • GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
 
 ### A Unified Framework for Persian LLM Evaluation