qwen_grpo_ft / README.md
Trained with Unsloth
metadata
license: mit
tags:
  - unsloth
  - trl
  - grpo