---
model-index:
- name: GenAI G11n Assessment
  results:
  - task:
      type: text-classification
      name: G11n Evaluation
    dataset:
      name: G11nGenAIAssessmentModel
      type: custom
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.82
      verified: false
---

# GenAI G11n Model Assessment

## Overview

This repository contains the manual evaluation framework for assessing multilingual and culturally adaptive capabilities in GenAI models. The current implementation focuses on evaluating AI systems using success criteria across instruction-following, translation fidelity, linguistic accuracy, and multimodal consistency.

## Evaluation Scope

The model is evaluated across the following main categories:

- Language & Grammar
- Cultural Adaptation
- Instruction & Response Coherence
- Multimodal Consistency

Each category includes granular subcategories. See the latest version of `Model Template.xlsx` for a detailed breakdown.
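
As a rough illustration of how results could be summarized per main category, the sketch below computes a pass rate for each of the four categories listed above. The `(category, passed)` record layout, the function name `pass_rate_by_category`, and the sample data are all hypothetical; they are not taken from `Model Template.xlsx` or from the results files in this repository.

```python
from collections import defaultdict

# The four main categories named in the Evaluation Scope section.
CATEGORIES = (
    "Language & Grammar",
    "Cultural Adaptation",
    "Instruction & Response Coherence",
    "Multimodal Consistency",
)


def pass_rate_by_category(records):
    """Return {category: fraction of records marked as passed}.

    `records` is an iterable of (category, passed) pairs. This layout is an
    assumption made for the sketch, not the repository's actual file format.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        totals[category] += 1
        passes[category] += int(passed)
    # Categories without any records are left out rather than reported as 0.
    return {c: passes[c] / totals[c] for c in CATEGORIES if totals[c]}


# Made-up sample data, for illustration only.
sample = [
    ("Language & Grammar", True),
    ("Language & Grammar", False),
    ("Cultural Adaptation", True),
    ("Multimodal Consistency", True),
]
rates = pass_rate_by_category(sample)
```

Categories with no evaluated prompts are omitted from the result so that an empty category is not mistaken for a failing one.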

## Documents

| File | Description |
|------|-------------|
| `Model Template` | Lists all evaluation criteria and subcategories |
| `Locale` | Contains all documentation related to a specific locale |
| [Prompts Datasets](https://huggingface.co/datasets/DilatoMX/G11n_GenAI_Prompt_Datasets) | Lists all localized prompts to be used during assessment |
| `Evaluation Results` | Contains the assessment results for each evaluated GenAI |

## Models Evaluated

- ChatGPT (4o)
- Gemini (2.0 Flash)
- Copilot (4o)
- DeepSeek (V3)

## Evaluation Process

Each prompt is evaluated by at least three reviewers. Their verdicts are then consolidated into a final agreement using a shared sheet. See the test results CSV for the result structure and the evaluation instructions for the steps.
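
The consolidation step can be pictured with a small sketch. It assumes a strict majority vote over reviewer verdicts, which is only one possible rule; the source says that agreement is consolidated in a shared sheet and does not specify how. The function name `consolidate` and the verdict labels are hypothetical.

```python
from collections import Counter

# From the process description: each prompt needs at least 3 reviewers.
MIN_REVIEWERS = 3


def consolidate(verdicts):
    """Return the majority verdict, or None if no verdict has a strict majority.

    Majority voting is an assumption of this sketch; the actual consolidation
    is done by reviewers in a shared sheet.
    """
    if len(verdicts) < MIN_REVIEWERS:
        raise ValueError(
            f"need at least {MIN_REVIEWERS} reviewers, got {len(verdicts)}"
        )
    label, count = Counter(verdicts).most_common(1)[0]
    # A strict majority means more than half of all reviewers agree.
    return label if count > len(verdicts) / 2 else None


# Made-up verdicts for two prompts, for illustration only.
agreed = consolidate(["pass", "pass", "fail"])
unresolved = consolidate(["pass", "fail", "unclear"])
```

Returning `None` for a split vote leaves the prompt flagged for discussion instead of forcing a result, which seems closer to a manual review workflow than breaking ties automatically.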

## Licensing

This evaluation framework is released under the MIT License. You are free to use, adapt, and extend it with attribution.

## Maintainers

- Andres Castillo – G11n QA
- Edgar Castillo – G11n QA
- Patricia Oceguera – Linguistic Advisor
- Marcela Salgado – Review Support