SAVE - a Yofuria Collection

SAVE

updated about 22 hours ago

The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement

Yofuria/UltraFeedback-binarized-ms-swift-1024

Viewer • Updated 21 days ago • 38.9k • 59

Yofuria/UltraFeedback-ms-swift-1024

Viewer • Updated Apr 27 • 41k • 148

Yofuria/Skywork-Reward-Preference-80K-v0.2-ms-swift

Viewer • Updated Nov 18, 2025 • 77k • 5