G11N GenAI Evaluation Model: 2026 Release Strategy Update
As our project continues to grow, we have updated the release strategy for the G11N GenAI Evaluation Model to support a more scalable, sustainable, and quality-focused evaluation process.
This new roadmap is the result of our latest retrospective, where we identified that the biggest challenge is no longer creating new benchmark prompts, but scaling localization, evaluation, and analysis while maintaining consistent quality.
Why a New Release Strategy?
As we expand the project to additional languages and continuously improve the benchmark, we need a release cadence that balances:
- Continuous dataset growth
- Multilingual expansion
- Reliable benchmark comparisons
- Sustainable community contributions
- High-quality evaluation results
Instead of releasing everything at once, we are now separating evaluation releases from dataset expansion.
Release Model
Major Releases
Major releases are dedicated to official benchmark evaluations and include:
- Evaluation results
- Stable Benchmark updates
- Stable Dataset releases for new languages
- Benchmark comparison across supported GenAI models
These releases become the official reference point for model quality.
Minor Releases
Minor releases focus on continuous benchmark evolution by adding:
- New experimental prompts
- New categories
- Prompt improvements
- Candidate prompts for future Stable Benchmark versions
Minor releases do not include evaluation results.
2026 Release Schedule
| Month | Release | Contents |
|---|---|---|
| June | Major | EN-US & ES-MX Stable Dataset + Evaluation Results |
| July | Minor | EN-US, ES-MX & JP Prompt Dataset (Beta) |
| August | Minor | EN-US, ES-MX & JP Prompt Dataset (Beta) |
| September | Major | EN-US, ES-MX & JP Stable Dataset Release + Evaluation Results |
| October | Minor | EN-US, ES-MX & FR Prompt Dataset (Beta) |
| November | Major | EN-US, ES-MX & FR Stable Dataset Release + Evaluation Results |
| December | Minor | EN-US & ES-MX Dataset + Yearly Summary & Model Updates |
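For anyone who wants to work with the schedule programmatically, here is a minimal sketch that encodes the table as data. The `Release` class, the `has_evaluation` flag, and both helper functions are illustrative names, not part of the project's tooling; the language labels are copied from the table as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One row of the 2026 schedule (hypothetical structure for illustration)."""
    month: str
    kind: str                   # "major" or "minor"
    languages: tuple[str, ...]  # labels as they appear in the table
    has_evaluation: bool        # True when the row lists Evaluation Results


SCHEDULE_2026 = (
    Release("June", "major", ("EN-US", "ES-MX"), True),
    Release("July", "minor", ("EN-US", "ES-MX", "JP"), False),
    Release("August", "minor", ("EN-US", "ES-MX", "JP"), False),
    Release("September", "major", ("EN-US", "ES-MX", "JP"), True),
    Release("October", "minor", ("EN-US", "ES-MX", "FR"), False),
    Release("November", "major", ("EN-US", "ES-MX", "FR"), True),
    Release("December", "minor", ("EN-US", "ES-MX"), False),
)


def first_stable_month(language, schedule=SCHEDULE_2026):
    """Return the first month in which a language appears in a major release."""
    for release in schedule:
        if release.kind == "major" and language in release.languages:
            return release.month
    return None


def release_model_violations(schedule=SCHEDULE_2026):
    """List months that break the rule that only major releases carry evaluation results."""
    return [r.month for r in schedule if (r.kind == "major") != r.has_evaluation]
```

Calling `first_stable_month("JP")` walks the schedule in order and stops at the first major release that lists that label, while `release_model_violations()` gives a quick consistency check against the Release Model described earlier.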
Language Expansion
Our roadmap gradually introduces new languages while maintaining evaluation quality.
Current Stable Datasets:
- 🇺🇸 EN-US
- 🇲🇽 ES-MX
Upcoming releases:
- 🇯🇵 JA-JP
- 🇫🇷 FR-FR
Each language follows the same lifecycle:
Prompt Dataset (Beta) → Stable Dataset → Official Evaluation
This phased approach allows us to validate localization quality before incorporating new languages into official benchmark evaluations.
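The lifecycle can be thought of as a small ordered state machine. The sketch below is only an illustration of that idea: the `Stage` enum and the `next_stage` helper are hypothetical names, and the stage labels are taken from the lifecycle line above.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages for a language, in the order given in the roadmap."""
    BETA = "Prompt Dataset (Beta)"
    STABLE = "Stable Dataset"
    OFFICIAL = "Official Evaluation"


# Enum members iterate in definition order, which matches the lifecycle order.
_ORDER = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None if it is the last one."""
    position = _ORDER.index(stage)
    if position + 1 < len(_ORDER):
        return _ORDER[position + 1]
    return None
```

Because each stage has exactly one successor, a language cannot jump from Beta straight to Official Evaluation in this model, which mirrors the phased validation described above.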
Stable vs Experimental Benchmark
Stable Benchmark
- Fixed evaluation dataset
- Used for benchmark scoring
- Supports long-term model comparison
- Updated during Major Releases
Experimental Benchmark
- Continuously growing dataset
- New prompts and categories
- Future Stable Benchmark candidates
- Updated during Minor Releases
This separation allows innovation without affecting benchmark consistency.
Call for Volunteers
As the project expands, we're looking for community contributors!
How can I add my contribution for review?
1. Create a Pull Request.
2. Upload your contribution. It can be a folder named in the format [Username][Language][Contribution Type], e.g. [Andalum][es-ES][Prompt Localization].
3. Select the Collaboration community folder.
4. Wait for your contribution to be reviewed, then get your reward.
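If you want to double-check a folder name before opening a Pull Request, a small script like the one below may help. It is a sketch under assumptions: the function name and the regular expression are our own illustration, and the language part is assumed to look like `es-ES` (a two- or three-letter language code, a hyphen, and a two-letter region), which the project may or may not enforce.

```python
import re

# Assumed shape: [Username][Language][Contribution Type], e.g.
# [Andalum][es-ES][Prompt Localization]. The language rule is an assumption.
FOLDER_PATTERN = re.compile(
    r"\[([^\[\]]+)\]"                 # username: anything except brackets
    r"\[([A-Za-z]{2,3}-[A-Za-z]{2})\]"  # language tag such as es-ES
    r"\[([^\[\]]+)\]"                 # contribution type
)


def parse_contribution_folder(name):
    """Split a folder name into its three parts, or return None if it does not match."""
    match = FOLDER_PATTERN.fullmatch(name.strip())
    if match is None:
        return None
    username, language, contribution_type = match.groups()
    return {
        "username": username,
        "language": language,
        "contribution_type": contribution_type,
    }
```

For example, `parse_contribution_folder("[Andalum][es-ES][Prompt Localization]")` returns the three parts as a dictionary, while a name that is missing the brackets returns `None`.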
Whether you're a localization expert, QA engineer, AI enthusiast, or native speaker, your contribution can help improve multilingual GenAI evaluation for everyone.
Get Involved
You can join the project by scanning the QR code included in the roadmap or contacting us directly.
Email: gloval-ai@dilatoinfotech.com
Project Repository:
https://huggingface.co/DilatoMX/G11n_GenAI_Assesment_Model
Looking Ahead
Our long-term vision is to build a community-driven, transparent, and multilingual evaluation framework for Generative AI.
By separating stable benchmark evaluations from experimental dataset growth, we can continue expanding the project while preserving consistency, reproducibility, and evaluation quality.
Thank you to everyone who has contributed and supported the project so far. More languages, more benchmarks, and more evaluations are coming soon!


