CodeRM-Bilevel-GRPO-4B / tokenizer.json

Commit History

Upload GRPO-trained 4B model (step 58, bilevel reward)
36a0915
verified

t2ance committed on