nvidia/Llama-3.3-Nemotron-70B-Reward-Principle • Text Generation • 71B • Updated Oct 30, 2025 • 119 • 6
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards • Paper • 2509.21319 • Published Sep 25, 2025 • 7