============================================================
WORLD MODEL EVALUATION STATISTICS
============================================================
Model: xlangai/OpenCUA-7B + ageppert/world-model-7b-lora
Validation examples: 4210
Total inference time: 28496.8s (474.9 min)
Average time per example: 6.77s

Prediction lengths (characters):
  min: 447
  median: 795
  mean: 878
  max: 2964

Ground truth lengths (characters):
  min: 340
  median: 796
  mean: 802
  max: 1367

Empty predictions: 0 (0.0%)

Generation config:
  max_new_tokens: 512
  temperature: 0.1
  top_p: 0.9
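
The summary figures in this report (character-length statistics, timing, empty-prediction rate) could be derived along the following lines. This is a minimal sketch, not the script that produced the report: the function names `length_stats`, `timing_summary`, and `empty_rate` are hypothetical, and the rounding rules and the whitespace-only definition of "empty" are assumptions.

```python
import statistics


def length_stats(texts):
    """Min/median/mean/max character lengths, rounded to whole characters (assumed rounding)."""
    lengths = [len(t) for t in texts]
    return {
        "min": min(lengths),
        "median": round(statistics.median(lengths)),
        "mean": round(statistics.mean(lengths)),
        "max": max(lengths),
    }


def timing_summary(total_seconds, n_examples):
    """Total time in minutes and average seconds per example."""
    return {
        "total_min": round(total_seconds / 60, 1),
        "avg_s": round(total_seconds / n_examples, 2),
    }


def empty_rate(texts):
    """Count and percentage of predictions that are empty or whitespace-only (assumed definition)."""
    empty = sum(1 for t in texts if not t.strip())
    return empty, round(100 * empty / len(texts), 1)


if __name__ == "__main__":
    # Reproduces the timing line of the report from its own totals.
    print(timing_summary(28496.8, 4210))
```

Applied to the totals above, `timing_summary(28496.8, 4210)` gives 474.9 min and 6.77 s per example, which agrees with the reported timing lines.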