AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter2-4k Text Generation • 841k • Updated Aug 10, 2025 • 3 • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter2-4k Text Generation • 333k • Updated Aug 10, 2025 • 4 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1-4k Text Generation • 841k • Updated Aug 8, 2025 • 2 • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-iterDPO-iter1-4k Text Generation • 333k • Updated Aug 8, 2025 • 4
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-DRIFT-iter2-RPO Text Generation • 841k • Updated Aug 7, 2025 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter2-RPO Text Generation • 841k • Updated Aug 7, 2025 • 2
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter2-RPO Text Generation • 841k • Updated Aug 7, 2025 • 1 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-iterdpo-iter1-RPO Text Generation • 841k • Updated Aug 6, 2025 • 2
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-spin-iter1-RPO Text Generation • 841k • Updated Aug 6, 2025 • 1 • 1
AmberYifan/Qwen2.5-14B-Instruct-ultrafeedback-drift-iter1-RPO Text Generation • 841k • Updated Aug 5, 2025 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter2-4k Text Generation • 841k • Updated Aug 5, 2025 • 1 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter1-4k Text Generation • 841k • Updated Aug 4, 2025 • 2 • 1
AmberYifan/Qwen2.5-7B-Instruct-wildfeedback-DRIFT-iter2-RPO Text Generation • 333k • Updated Aug 2, 2025 • 3
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter2 Text Generation • 841k • Updated Jul 30, 2025 • 3
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter2 Text Generation • 841k • Updated Jul 30, 2025 • 2
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-SPIN-iter2 Text Generation • 841k • Updated Jul 30, 2025 • 2
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-iterDPO-iter1 Text Generation • 841k • Updated Jul 29, 2025 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-DRIFT-iter1 Text Generation • 841k • Updated Jul 29, 2025 • 1 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-RPO-SPIN-iter1 Text Generation • 841k • Updated Jul 29, 2025 • 1
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-RPO-DRIFT-iter1 Text Generation • 266k • Updated Jul 28, 2025 • 1
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-RPO-iterDPO-iter1 Text Generation • 266k • Updated Jul 28, 2025
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-RPO-SPIN-iter1 Text Generation • 266k • Updated Jul 28, 2025 • 1
AmberYifan/Qwen2.5-14B-Instruct-wildfeedback-seed-RPO-0.001 Text Generation • 841k • Updated Jul 28, 2025 • 2
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.001 Text Generation • 266k • Updated Jul 28, 2025 • 4
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-1.5 Text Generation • 266k • Updated Jul 27, 2025 • 3
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.01 Text Generation • 266k • Updated Jul 27, 2025 • 1 • 1
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO-0.5 Text Generation • 266k • Updated Jul 25, 2025 • 1
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-seed-RPO Text Generation • 266k • Updated Jul 25, 2025 • 2
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-DRIFT-iter2-re Text Generation • 8B • Updated Jul 24, 2025 • 3
AmberYifan/Llama-3.1-8B-Instruct-wildfeedback-iterDPO-iter2 Text Generation • 8B • Updated Jul 23, 2025 • 2