model-index:
  - name: GenAI G11n Assessment
    results:
      - task:
          type: text-classification
          name: G11n Evaluation
        dataset:
          name: G11nGenAIAssessmentModel
          type: custom
        metrics:
          - name: pass@1
            type: pass@1
            value: 0.82
            verified: false

GenAI G11n Model Assessment

Overview

This repository contains a manual evaluation framework for assessing the multilingual and culturally adaptive capabilities of GenAI models, with a focus on globalization (G11n). The current implementation evaluates AI systems against success criteria covering instruction following, translation fidelity, linguistic accuracy, and multimodal consistency.

Evaluation Scope

Each model is evaluated across the following main categories:

Language & Grammar
Cultural Adaptation
Instruction & Response Coherence
Multimodal Consistency

Each category includes granular subcategories. See the latest version of Model Template.xlsx for a detailed breakdown.

Documents

| File | Description |
| --- | --- |
| `Model Template` | Lists all evaluation criteria and subcategories |
| `Locale` | Contains all documentation related to a specific locale |
| `Prompts Datasets` | Lists all localized prompts to be used during assessment |
| `Evaluation Results` | Contains the results for each evaluated GenAI model |

Models Evaluated

ChatGPT (4o)
Gemini (2.0 Flash)
Copilot (4o)
DeepSeek (V3)

Evaluation Process

Each prompt is evaluated by at least three reviewers. Their individual ratings are then consolidated into a final agreed result in a shared sheet. See the test results CSV for the result structure and the evaluation instructions for the step-by-step process.
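
The consolidation step could be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the `pass`/`fail` labels, the strict-majority rule, and the `needs_review` fallback are all hypothetical, since the actual agreement is reached manually in the shared sheet and the real CSV structure is not described here.

```python
from collections import Counter, defaultdict

# Hypothetical review records: (prompt_id, category, reviewer, verdict).
# The field names and verdict labels are assumptions, not the real CSV layout.
REVIEWS = [
    ("p1", "Language & Grammar", "r1", "pass"),
    ("p1", "Language & Grammar", "r2", "pass"),
    ("p1", "Language & Grammar", "r3", "fail"),
    ("p2", "Cultural Adaptation", "r1", "fail"),
    ("p2", "Cultural Adaptation", "r2", "fail"),
    ("p2", "Cultural Adaptation", "r3", "pass"),
    ("p3", "Multimodal Consistency", "r1", "pass"),
    ("p3", "Multimodal Consistency", "r2", "pass"),
]

MIN_REVIEWERS = 3  # "at least three reviewers" per prompt


def consolidate(reviews, min_reviewers=MIN_REVIEWERS):
    """Return {prompt_id: verdict} using a strict majority vote (an assumed rule).

    Prompts with too few reviews, or without a strict majority,
    are flagged as "needs_review" instead of being decided.
    """
    votes = defaultdict(list)
    for prompt_id, _category, _reviewer, verdict in reviews:
        votes[prompt_id].append(verdict)

    results = {}
    for prompt_id, verdicts in votes.items():
        if len(verdicts) < min_reviewers:
            results[prompt_id] = "needs_review"
            continue
        top, count = Counter(verdicts).most_common(1)[0]
        results[prompt_id] = top if count > len(verdicts) / 2 else "needs_review"
    return results


def pass_rate(results):
    """Share of decided prompts whose consolidated verdict is "pass"."""
    decided = [v for v in results.values() if v in ("pass", "fail")]
    if not decided:
        return None
    return sum(v == "pass" for v in decided) / len(decided)


if __name__ == "__main__":
    consolidated = consolidate(REVIEWS)
    print(consolidated)
    print(pass_rate(consolidated))
```

In this sketch, a prompt reviewed by only two people is flagged rather than decided, which mirrors the three-reviewer minimum. Any other agreement rule (for example, unanimous agreement or reviewer discussion) would replace the majority check.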

Licensing

This evaluation framework is released under the MIT License. You are free to use, adapt, and extend it with attribution.

Maintainers

Andres Castillo – G11n QA
Edgar Castillo – G11n QA
Patricia Oceguera – Linguistic Advisor
Marcela Salgado – Review Support