Nvidia Game Ready Model Score (GRM) is an aggregated quality metric designed to assess LLM capabilities in gaming use cases.

General state-of-the-art language models are optimized for broad benchmarks such as math, code, and general knowledge. Strong results on those benchmarks do not reliably translate to in-game performance: they do not predict NPC quality, gameplay actions, or immersion.

With game-specific model evaluation, game developers can accelerate AI integration pipelines by spending less time on model evaluation and narrowing model choice earlier. The overall score is the unweighted average of the Roleplay, Actions, and General category scores. Within each category, benchmark scores are combined by weighted average, with core benchmarks weighted 1.0 and supplementary benchmarks weighted 0.5.

GRM Score = (Roleplay + Actions + General) / 3

Category Score = sum(score × weight) / sum(weight)
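
The two formulas above can be sketched in Python as follows. This is a minimal illustration, not Nvidia's implementation: the `BenchmarkResult` type, the function names, the benchmark names, and all scores are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass

# Weights stated in the text: core benchmarks 1.0, supplementary 0.5.
CORE_WEIGHT = 1.0
SUPPLEMENTARY_WEIGHT = 0.5


@dataclass
class BenchmarkResult:
    """One benchmark score inside a category (hypothetical structure)."""
    name: str
    score: float
    core: bool  # True for a core benchmark, False for supplementary


def category_score(results: list[BenchmarkResult]) -> float:
    """Category Score = sum(score x weight) / sum(weight)."""
    weights = [CORE_WEIGHT if r.core else SUPPLEMENTARY_WEIGHT for r in results]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("a category needs at least one benchmark")
    weighted_sum = sum(r.score * w for r, w in zip(results, weights))
    return weighted_sum / total_weight


def grm_score(roleplay: float, actions: float, general: float) -> float:
    """GRM Score = (Roleplay + Actions + General) / 3."""
    return (roleplay + actions + general) / 3


# Made-up benchmark names and scores, for illustration only.
roleplay = category_score([
    BenchmarkResult("persona_consistency", 80.0, core=True),
    BenchmarkResult("dialogue_style", 60.0, core=False),
])  # (80*1.0 + 60*0.5) / 1.5
actions = category_score([
    BenchmarkResult("tool_selection", 70.0, core=True),
    BenchmarkResult("action_planning", 90.0, core=True),
])  # (70 + 90) / 2
general = category_score([
    BenchmarkResult("instruction_following", 60.0, core=True),
    BenchmarkResult("general_knowledge", 90.0, core=False),
])  # (60*1.0 + 90*0.5) / 1.5

print(f"GRM Score: {grm_score(roleplay, actions, general):.2f}")
```

Because a supplementary benchmark carries half the weight of a core one, the supplementary score of 60 in the hypothetical Roleplay category pulls the result down from 80 by less than a plain average would.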