---
language:
- en
base_model:
- Qwen/Qwen3-8B
---
A downstream policy model fine-tuned from Qwen/Qwen3-8B with PPO, using GenRM-R-Align-14B as the reward model.