📣 G11N GenAI Evaluation Model – 2026 Release Strategy Update

#6
by AndresCas - opened
Dilato Infotech Limited org


As our project continues to grow, we have updated the release strategy for the G11N GenAI Evaluation Model to support a more scalable, sustainable, and quality-focused evaluation process.

This new roadmap is the result of our latest retrospective, where we identified that the biggest challenge is no longer creating new benchmark prompts, but scaling localization, evaluation, and analysis while maintaining consistent quality.

Release Plan


Why a New Release Strategy?

As we expand the project to additional languages and continuously improve the benchmark, we need a release cadence that balances:

  • 📈 Continuous dataset growth
  • 🌍 Multilingual expansion
  • 📊 Reliable benchmark comparisons
  • 🤝 Sustainable community contributions
  • ✅ High-quality evaluation results

Instead of releasing everything at once, we are now separating evaluation releases from dataset expansion.


Release Model

⭐ Major Releases

Major releases are dedicated to official benchmark evaluations and include:

  • Evaluation results
  • Stable Benchmark updates
  • Stable Dataset releases for new languages
  • Benchmark comparison across supported GenAI models

These releases become the official reference point for model quality.


🧪 Minor Releases

Minor releases focus on continuous benchmark evolution by adding:

  • New experimental prompts
  • New categories
  • Prompt improvements
  • Candidate prompts for future Stable Benchmark versions

Minor releases do not include evaluation results.


2026 Release Schedule

| Month | Release | Contents |
|---|---|---|
| June | ⭐ Major | EN-US & ES-MX Stable Dataset + Evaluation Results |
| July | 🧪 Minor | EN-US, ES-MX & JP Prompt Dataset (Beta) |
| August | 🧪 Minor | EN-US, ES-MX & JP Prompt Dataset (Beta) |
| September | ⭐ Major | EN-US, ES-MX & JP Stable Dataset Release + Evaluation Results |
| October | 🧪 Minor | EN-US, ES-MX & FR Prompt Dataset (Beta) |
| November | ⭐ Major | EN-US, ES-MX & FR Stable Dataset Release + Evaluation Results |
| December | 🧪 Minor | EN-US & ES-MX Dataset + Yearly Summary & Model Updates |
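
For contributors who want to track the cadence programmatically, the schedule can be encoded as plain data. The following is a minimal Python sketch and not part of the project's tooling: the `Release` class, its field names, and the helper functions are hypothetical, while the rows are copied from the schedule. It also checks the rule from the release model that only Major releases carry evaluation results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One row of the 2026 schedule (hypothetical structure, for illustration)."""
    month: str
    kind: str              # "major" or "minor"
    languages: tuple       # language labels exactly as written in the schedule
    has_evaluation: bool   # True when the row lists "Evaluation Results"


SCHEDULE_2026 = (
    Release("June", "major", ("EN-US", "ES-MX"), True),
    Release("July", "minor", ("EN-US", "ES-MX", "JP"), False),
    Release("August", "minor", ("EN-US", "ES-MX", "JP"), False),
    Release("September", "major", ("EN-US", "ES-MX", "JP"), True),
    Release("October", "minor", ("EN-US", "ES-MX", "FR"), False),
    Release("November", "major", ("EN-US", "ES-MX", "FR"), True),
    Release("December", "minor", ("EN-US", "ES-MX"), False),
)


def violations(schedule):
    """Months whose row breaks the rule: evaluation results ship only with major releases."""
    return [r.month for r in schedule if r.has_evaluation != (r.kind == "major")]


def evaluation_months(schedule):
    """Months in which official evaluation results are published."""
    return [r.month for r in schedule if r.has_evaluation]


print(violations(SCHEDULE_2026))         # an empty list means the schedule is consistent
print(evaluation_months(SCHEDULE_2026))
```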

Language Expansion

Our roadmap gradually introduces new languages while maintaining evaluation quality.

Current Stable Datasets:

  • 🇺🇸 EN-US
  • 🇲🇽 ES-MX

Upcoming releases:

  • 🇯🇵 JA-JP
  • 🇫🇷 FR-FR

Each language follows the same lifecycle:

Prompt Dataset (Beta) → Stable Dataset → Official Evaluation

This phased approach allows us to validate localization quality before incorporating new languages into official benchmark evaluations.
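
The three-stage lifecycle described above can be modeled as a simple ordered progression. This is only an illustrative Python sketch: the `Stage` enum and the `next_stage` helper are hypothetical names, and the only thing taken from the roadmap is the order of the stages.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order given in the roadmap."""
    BETA = "Prompt Dataset (Beta)"
    STABLE = "Stable Dataset"
    OFFICIAL_EVALUATION = "Official Evaluation"


def next_stage(stage):
    """Return the stage that follows `stage`, or None if it is the final one."""
    order = list(Stage)  # Enum members iterate in definition order
    position = order.index(stage)
    if position + 1 < len(order):
        return order[position + 1]
    return None


# Walk a new language through the whole lifecycle.
current = Stage.BETA
while current is not None:
    print(current.value)
    current = next_stage(current)
```

A language cannot skip a stage in this model, which mirrors the intent of validating localization quality before a language enters official evaluation.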


Stable vs Experimental Benchmark

Stable Benchmark

  • Fixed evaluation dataset
  • Used for benchmark scoring
  • Supports long-term model comparison
  • Updated during Major Releases

Experimental Benchmark

  • Continuously growing dataset
  • New prompts and categories
  • Future Stable Benchmark candidates
  • Updated during Minor Releases

This separation allows innovation without affecting benchmark consistency.


📢 Call for Volunteers

As the project expands, we're looking for community contributors!

How can I add my contribution for review?
1. Create a Pull Request 😉
2. Upload your contribution. It can be a folder named in the format [Username][Language][Contribution Type], e.g. [Andalum][es-ES][Prompt Localization].
3. Select the Collaboration community folder!
4. Wait for your contribution to be reviewed, and then get your reward 🏆 😉
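
Before opening a Pull Request, it may help to check that the folder name follows the [Username][Language][Contribution Type] format. The sketch below is a hypothetical helper, not a project tool: it assumes exactly three bracketed fields with no nested brackets, and it does not validate the language code or the contribution type against any official list.

```python
import re

# Assumed pattern: three bracketed, non-empty fields and nothing else.
_FOLDER_RE = re.compile(r"^\[([^\[\]]+)\]\[([^\[\]]+)\]\[([^\[\]]+)\]$")


def parse_contribution_folder(name):
    """Split '[Username][Language][Contribution Type]' into its three fields.

    Returns a dict, or None when the name does not match the assumed pattern.
    """
    match = _FOLDER_RE.match(name.strip())
    if match is None:
        return None
    username, language, contribution_type = (part.strip() for part in match.groups())
    if not (username and language and contribution_type):
        return None  # a field containing only whitespace is treated as invalid
    return {
        "username": username,
        "language": language,
        "contribution_type": contribution_type,
    }


print(parse_contribution_folder("[Andalum][es-ES][Prompt Localization]"))
# {'username': 'Andalum', 'language': 'es-ES', 'contribution_type': 'Prompt Localization'}
print(parse_contribution_folder("Andalum-es-ES"))
# None
```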

Whether you're a localization expert, QA engineer, AI enthusiast, or native speaker, your contribution can help improve multilingual GenAI evaluation for everyone.



🀝 Get Involved

You can join the project by scanning the QR code included in the roadmap or contacting us directly.

📧 gloval-ai@dilatoinfotech.com

🤗 Project Repository:

https://huggingface.co/DilatoMX/G11n_GenAI_Assesment_Model


Looking Ahead

Our long-term vision is to build a community-driven, transparent, and multilingual evaluation framework for Generative AI.

By separating stable benchmark evaluations from experimental dataset growth, we can continue expanding the project while preserving consistency, reproducibility, and evaluation quality.

Thank you to everyone who has contributed and supported the project so far. More languages, more benchmarks, and more evaluations are coming soon! 🚀
