---
model-index:
- name: GenAI G11n Assessment
  results:
  - task:
      type: text-classification
      name: G11n Evaluation
    dataset:
      name: G11nGenAIAssessmentModel
      type: custom
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.82
      verified: false
---

# GenAI G11n Model Assessment

## Overview
This repository contains a manual evaluation framework for assessing the multilingual and cultural-adaptation capabilities of GenAI models. The current implementation evaluates AI systems against success criteria for instruction following, translation fidelity, linguistic accuracy, and multimodal consistency.

## Evaluation Scope
The models are evaluated across four main categories:
- Language & Grammar
- Cultural Adaptation
- Instruction & Response Coherence
- Multimodal Consistency

Each category is divided into granular subcategories. See the latest version of `Model Template.xlsx` for a detailed breakdown.

## Documents

| File | Description |
|------|-------------|
| `Model Template` | Lists all evaluation criteria and subcategories |
| `Locale` | Contains all documentation related to a specific locale |
| [Prompts Datasets](https://huggingface.co/datasets/DilatoMX/G11n_GenAI_Prompt_Datasets) | Lists all localized prompts used during the assessment |
| `Evaluation Results` | Contains the assessment results for each evaluated GenAI model |

## Models Evaluated
- ChatGPT (4o)
- Gemini (2.0 Flash)
- Copilot (4o)
- DeepSeek (V3)

## Evaluation Process
Each prompt is evaluated by at least 3 reviewers, and their verdicts are consolidated into a final agreement in a shared sheet. See the test results CSV for the data structure and the evaluation instructions document for the step-by-step procedure.
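The consolidation step above can be sketched in Python. This is a minimal illustration, not the maintainers' tooling: it assumes each reviewer records a pass/fail verdict per prompt and model, and that "final agreement" means a simple majority vote. The `Review` record, its field names, the `consolidate` and `pass_rates` helpers, and the sample reviews are all hypothetical; the actual column layout is defined in the test results CSV.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    """One reviewer's verdict on one model response (hypothetical schema)."""
    prompt_id: str
    model: str
    category: str  # one of the four evaluation categories
    reviewer: str
    passed: bool


def consolidate(reviews, min_reviewers=3):
    """Majority vote per (prompt, model); pairs with too few reviewers are skipped."""
    votes = defaultdict(list)
    category = {}
    for r in reviews:
        key = (r.prompt_id, r.model)
        votes[key].append(r.passed)
        category[key] = r.category
    verdicts = {}
    for key, ballots in votes.items():
        if len(ballots) < min_reviewers:
            continue  # the README requires at least 3 reviewers per prompt
        counts = Counter(ballots)
        # Strict majority: a tie (even reviewer count) counts as a failure here.
        verdicts[key] = counts[True] > counts[False]
    return verdicts, category


def pass_rates(verdicts, category):
    """Fraction of consolidated prompts that passed, per model and per (model, category)."""
    overall = defaultdict(lambda: [0, 0])      # model -> [passed, total]
    by_category = defaultdict(lambda: [0, 0])  # (model, category) -> [passed, total]
    for (prompt_id, model), ok in verdicts.items():
        cat = category[(prompt_id, model)]
        for bucket in (overall[model], by_category[(model, cat)]):
            bucket[0] += ok
            bucket[1] += 1
    return ({m: p / t for m, (p, t) in overall.items()},
            {k: p / t for k, (p, t) in by_category.items()})


# Illustrative input only; not real assessment data.
reviews = [
    Review("p1", "model-a", "Language & Grammar", "r1", True),
    Review("p1", "model-a", "Language & Grammar", "r2", True),
    Review("p1", "model-a", "Language & Grammar", "r3", False),
    Review("p2", "model-a", "Cultural Adaptation", "r1", False),
    Review("p2", "model-a", "Cultural Adaptation", "r2", False),
    Review("p2", "model-a", "Cultural Adaptation", "r3", True),
    Review("p3", "model-a", "Multimodal Consistency", "r1", True),
    Review("p3", "model-a", "Multimodal Consistency", "r2", True),  # only 2 reviewers
]

verdicts, category = consolidate(reviews)
overall, by_cat = pass_rates(verdicts, category)
print(overall)  # -> {'model-a': 0.5}
```

Prompt `p3` is excluded because it has only two reviews. Treating ties as failures is a conservative choice; a real process might escalate them for discussion instead. Reading the front-matter `pass@1` value as a majority pass rate like this is only one plausible interpretation, since the README does not state how 0.82 was derived.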

## Licensing
This evaluation framework is released under the MIT License. You are free to use, adapt, and extend it with attribution.

## Maintainers
- Andres Castillo – G11n QA
- Edgar Castillo – G11n QA
- Patricia Oceguera – Linguistic Advisor
- Marcela Salgado – Review Support