GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 225
RedHatAI/Qwen2.5-VL-72B-Instruct-FP8-dynamic Image-Text-to-Text • 73B • Updated Apr 25, 2025 • 48.8k • 15
RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 Image-Text-to-Text • 20B • Updated Sep 22, 2025 • 336k • 12
Jzuluaga/accent-id-commonaccent_xlsr-en-english Audio Classification • Updated Dec 2, 2025 • 4.11k • 17