MCILAB committed on
Commit 6d3b53b · verified · 1 Parent(s): 046d108

Update src/about.py

Files changed (1)
  1. src/about.py +12 -12
src/about.py CHANGED
@@ -45,25 +45,25 @@ Addressing the gaps in existing LLM evaluation frameworks, this benchmark is spe
45
  > The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
46
  >
47
  > **Translated Datasets**
48
- > Anthropic-fa
49
- > AdvBench-fa
50
- > HarmBench-fa
51
- > DecodingTrust-fa
52
  >
53
  > **Newly Developed Persian Datasets**
54
- > ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
55
- > SafeBench-fa: Assesses safety in generated outputs.
56
- > FairBench-fa: Measures bias mitigation in Persian LLMs.
57
- > SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
58
  >
59
  > **Naturally Collected Persian Dataset**
60
- > GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
61
 
62
  ### A Unified Framework for Persian LLM Evaluation
63
  By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
64
- **Safety**: Avoiding harmful or toxic content.
65
- **Fairness**: Mitigating biases in model outputs.
66
- **Social Norms**: Ensuring culturally appropriate behavior.
67
 
68
 
69
  This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.
 
45
  > The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
46
  >
47
  > **Translated Datasets**
48
+ > - Anthropic-fa
49
+ > - AdvBench-fa
50
+ > - HarmBench-fa
51
+ > - DecodingTrust-fa
52
  >
53
  > **Newly Developed Persian Datasets**
54
+ > - ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
55
+ > - SafeBench-fa: Assesses safety in generated outputs.
56
+ > - FairBench-fa: Measures bias mitigation in Persian LLMs.
57
+ > - SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
58
  >
59
  > **Naturally Collected Persian Dataset**
60
+ > - GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
61
 
62
  ### A Unified Framework for Persian LLM Evaluation
63
  By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
64
+ - **Safety**: Avoiding harmful or toxic content.
65
+ - **Fairness**: Mitigating biases in model outputs.
66
+ - **Social Norms**: Ensuring culturally appropriate behavior.
67
 
68
 
69
  This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.
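The edited section groups the benchmark's datasets under three aspects (safety, fairness, social norms). As a purely illustrative sketch of how a leaderboard could roll per-dataset results up into those three aspect scores — the mapping, function names, and score values below are assumptions, not this project's actual API — one might write:

```python
# Hypothetical sketch: aggregating per-dataset scores into the three
# evaluation aspects named in about.py. Dataset names come from the
# benchmark; the aspect mapping and the scores are illustrative only.
from statistics import mean

# Assumed dataset-to-aspect grouping (for illustration; the real
# benchmark may assign datasets differently).
ASPECT_DATASETS = {
    "safety": ["AdvBench-fa", "HarmBench-fa", "SafeBench-fa", "ProhibiBench-fa"],
    "fairness": ["FairBench-fa", "DecodingTrust-fa"],
    "social_norms": ["SocialBench-fa", "GuardBench-fa"],
}

def aspect_scores(per_dataset: dict[str, float]) -> dict[str, float]:
    """Average the available per-dataset scores within each aspect."""
    return {
        aspect: mean(per_dataset[d] for d in datasets if d in per_dataset)
        for aspect, datasets in ASPECT_DATASETS.items()
        if any(d in per_dataset for d in datasets)
    }

# Example with made-up scores covering each aspect:
scores = aspect_scores({
    "AdvBench-fa": 0.80, "HarmBench-fa": 0.70,
    "FairBench-fa": 0.90, "SocialBench-fa": 0.60,
})
```

Averaging is just one plausible aggregation; a real leaderboard might weight datasets by size or difficulty instead.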