lyn22333's picture
Create README.md
75ce43c verified
---
language:
- en
base_model:
- Qwen/Qwen3-8B
---
Downstream policy trained using GenRM-R-Align-14B via PPO.