📣 G11N GenAI Evaluation Model – 2026 Release Strategy Update

by AndresCas - opened 2 days ago

Dilato Infotech Limited org 2 days ago

As our project continues to grow, we have updated the release strategy for the G11N GenAI Evaluation Model to support a more scalable, sustainable, and quality-focused evaluation process.

This new roadmap is the result of our latest retrospective, where we identified that the biggest challenge is no longer creating new benchmark prompts, but scaling localization, evaluation, and analysis while maintaining consistent quality.

Why a New Release Strategy?

As we expand the project to additional languages and continuously improve the benchmark, we need a release cadence that balances:

📈 Continuous dataset growth
🌍 Multilingual expansion
📊 Reliable benchmark comparisons
🤝 Sustainable community contributions
✅ High-quality evaluation results

Instead of releasing everything at once, we are now separating evaluation releases from dataset expansion.

Release Model

⭐ Major Releases

Major releases are dedicated to official benchmark evaluations and include:

Evaluation results
Stable Benchmark updates
Stable Dataset releases for new languages
Benchmark comparison across supported GenAI models

These releases become the official reference point for model quality.

🧪 Minor Releases

Minor releases focus on continuous benchmark evolution by adding:

New experimental prompts
New categories
Prompt improvements
Candidate prompts for future Stable Benchmark versions

Minor releases do not include evaluation results.

2026 Release Schedule

Month	Release	Contents
June	⭐ Major	EN-US & ES-MX Stable Dataset + Evaluation Results
July	🧪 Minor	EN-US, ES-MX & JP Prompt Dataset (Beta)
August	🧪 Minor	EN-US, ES-MX & JP Prompt Dataset (Beta)
September	⭐ Major	EN-US, ES-MX & JP Stable Dataset Release + Evaluation Results
October	🧪 Minor	EN-US, ES-MX & FR Prompt Dataset (Beta)
November	⭐ Major	EN-US, ES-MX & FR Stable Dataset Release + Evaluation Results
December	🧪 Minor	EN-US & ES-MX Dataset + Yearly Summary & Model Updates
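
As a rough illustration of how the schedule above could be encoded, the sketch below models each release as a small record and derives which months publish evaluation results. The `Release` class, the `SCHEDULE_2026` tuple, and the `evaluation_months` helper are hypothetical names chosen for this example and are not part of the project repository. The language labels are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One row of the 2026 schedule (hypothetical structure, for illustration)."""
    month: str
    kind: str  # "major" or "minor"
    languages: tuple

    @property
    def has_evaluation_results(self) -> bool:
        # Per the release model: only Major releases include evaluation results.
        return self.kind == "major"


# Rows transcribed from the schedule table above.
SCHEDULE_2026 = (
    Release("June", "major", ("EN-US", "ES-MX")),
    Release("July", "minor", ("EN-US", "ES-MX", "JP")),
    Release("August", "minor", ("EN-US", "ES-MX", "JP")),
    Release("September", "major", ("EN-US", "ES-MX", "JP")),
    Release("October", "minor", ("EN-US", "ES-MX", "FR")),
    Release("November", "major", ("EN-US", "ES-MX", "FR")),
    Release("December", "minor", ("EN-US", "ES-MX")),
)


def evaluation_months(schedule):
    """Return the months in which official evaluation results are published."""
    return [r.month for r in schedule if r.has_evaluation_results]


print(evaluation_months(SCHEDULE_2026))
```

Keeping the "Major means evaluation results" rule in one property makes it easy to check a draft schedule against the release model.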

Language Expansion

Our roadmap gradually introduces new languages while maintaining evaluation quality.

Current Stable Datasets:

🇺🇸 EN-US
🇲🇽 ES-MX

Upcoming releases:

🇯🇵 JA-JP
🇫🇷 FR-FR

Each language follows the same lifecycle:

Prompt Dataset (Beta) → Stable Dataset → Official Evaluation

This phased approach allows us to validate localization quality before incorporating new languages into official benchmark evaluations.
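
The lifecycle above can be thought of as a simple ordered sequence of stages. The following is a minimal sketch under that assumption. The `Stage` enum and the `next_stage` helper are hypothetical names for illustration only and do not describe how the project actually tracks language status.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order given in the roadmap."""
    PROMPT_DATASET_BETA = 1
    STABLE_DATASET = 2
    OFFICIAL_EVALUATION = 3


def next_stage(stage):
    """Return the stage that follows `stage`, or None if it is the last one."""
    ordered = list(Stage)  # Enum members iterate in definition order
    position = ordered.index(stage)
    if position == len(ordered) - 1:
        return None
    return ordered[position + 1]


# Example: a language in beta moves to a Stable Dataset next.
print(next_stage(Stage.PROMPT_DATASET_BETA).name)
```

Because stages only move forward one step at a time, a new language cannot reach official evaluation without first passing through the Stable Dataset stage.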

Stable vs Experimental Benchmark

Stable Benchmark

Fixed evaluation dataset
Used for benchmark scoring
Supports long-term model comparison
Updated during Major Releases

Experimental Benchmark

Continuously growing dataset
New prompts and categories
Future Stable Benchmark candidates
Updated during Minor Releases

This separation allows innovation without affecting benchmark consistency.

📢 Call for Volunteers

As the project expands, we're looking for community contributors!

How can I add my contribution for review?
1.- Create a Pull Request 😉

2.- Upload your contribution. It can be a folder named in the following format: [Username][Language][Contribution Type], e.g. [Andalum][es-ES][Prompt Localization].

3.- Select the Collaboration community folder!

4.- Wait for your contribution to be reviewed, and then get your reward 🏆 😉
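
To make the folder naming format from step 2 concrete, here is a small sketch that builds and checks a folder name. The helper names `build_folder_name` and `parse_folder_name` are hypothetical, and the validation rule (three non-empty bracketed parts with no nested brackets) is an assumption based on the example above, not an official project requirement.

```python
import re

# Three bracketed parts, each non-empty and without nested brackets (assumed rule).
FOLDER_PATTERN = re.compile(r"^\[([^\[\]]+)\]\[([^\[\]]+)\]\[([^\[\]]+)\]$")


def build_folder_name(username, language, contribution_type):
    """Build a name like [Username][Language][Contribution Type]."""
    parts = (username, language, contribution_type)
    for part in parts:
        if not part.strip() or "[" in part or "]" in part:
            raise ValueError(f"invalid folder name part: {part!r}")
    return "".join(f"[{part.strip()}]" for part in parts)


def parse_folder_name(name):
    """Return (username, language, contribution_type), or None if malformed."""
    match = FOLDER_PATTERN.match(name)
    return match.groups() if match else None


print(build_folder_name("Andalum", "es-ES", "Prompt Localization"))
print(parse_folder_name("[Andalum][es-ES][Prompt Localization]"))
```

Checking the name locally before opening the Pull Request may save a review round trip.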

Whether you're a localization expert, QA engineer, AI enthusiast, or native speaker, your contribution can help improve multilingual GenAI evaluation for everyone.

🤝 Get Involved

You can join the project by scanning the QR code included in the roadmap or contacting us directly.

📧 gloval-ai@dilatoinfotech.com

🤗 Project Repository:

https://huggingface.co/DilatoMX/G11n_GenAI_Assesment_Model

Looking Ahead

Our long-term vision is to build a community-driven, transparent, and multilingual evaluation framework for Generative AI.

By separating stable benchmark evaluations from experimental dataset growth, we can continue expanding the project while preserving consistency, reproducibility, and evaluation quality.

Thank you to everyone who has contributed and supported the project so far. More languages, more benchmarks, and more evaluations are coming soon! 🚀
