---
language:
- en
base_model:
- Qwen/Qwen3-8B
---

Downstream policy trained using GenRM-R-Align-14B via PPO.