openai/gsm8k
GRPO (Group Relative Policy Optimization) experiment from the TinkerRL-Bench experiment suite.
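GRPO scores a group of sampled completions for the same prompt and normalizes each reward against that group's mean and standard deviation, so no learned value model is needed. The sketch below shows that step for GSM8K-style answers. It assumes a binary exact-match reward on the `#### <number>` line that ends GSM8K reference solutions. The helper names (`extract_final_answer`, `exact_match_reward`, `group_relative_advantages`) are hypothetical and do not come from TinkerRL-Bench. Using the population standard deviation is also an assumption.

```python
import re
import statistics
from typing import List, Optional

# Hypothetical helper: GSM8K reference solutions end with a line "#### <number>".
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the number after the last '####' marker with commas removed, or None."""
    matches = _FINAL_ANSWER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def exact_match_reward(completion: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the final answers match exactly, else 0.0."""
    predicted = extract_final_answer(completion)
    target = extract_final_answer(reference)
    return 1.0 if predicted is not None and predicted == target else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage for one prompt's group: (r - mean) / (std + eps).

    Assumption: population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy usage with made-up completions for a single prompt.
reference = "toy reference solution\n#### 72"
completions = ["... so the answer is\n#### 72", "#### 70", "#### 72", "no final marker"]
rewards = [exact_match_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)
```

A consequence of this design: when every completion in a group receives the same reward (for example, all 0.0), every advantage is zero, and that group contributes no policy-gradient signal.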
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
@misc{tinker-rl-bench-2026,
title={TinkerRL-Bench: A Unified Benchmark for RL Post-Training},
author={Arvind C R and Sandhya Jeyaraj and Madhu Kumara L and Mohammad Rafi and Dhruva N Murthy and Arumugam K},
year={2026},
url={https://github.com/arvindcr4/tinker-rl-lab}
}
Base model
meta-llama/Llama-3.1-8B