abharadwaj123
GRPO checkpoint: model.safetensors + tokenizer (no optimizer)
3774aea verified