CodeRM-Bilevel-GRPO-4B / generation_config.json

Commit History

Upload GRPO-trained 4B model (step 58, bilevel reward)
36a0915
verified

t2ance committed on