Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models Paper • 2603.13985 • Published 4 days ago • 9
Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_ultra_filter_2e-5_thre-0.8_packing_42_cot Updated Mar 3, 2025 • 7
Wenboz/ultrafeedback_rationale_Qwen2.5-3B-Instruct_ultra_sft_2e-5_thre-0.7_packing_42_cot Viewer • Updated Mar 1, 2025 • 63.1k • 10