arvindcr4/tinker-rl-bench-ppo_gsm8k_Llama-3.1-8B-Instruct_s42 • Text Generation • Updated 19 days ago • 16
arvindcr4/tinker-rl-bench-frontier_gsm8k_deepseek-v3.1 • Reinforcement Learning • Updated 20 days ago