Evaluation of DPO Configurations: An Empirical Study of DPO Configuration Choices for LLM Alignment
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_hh-rlhf (Text Generation, updated Sep 26, 2025)
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_oasst1 (Text Generation, updated Sep 26, 2025)
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_PKU-SafeRLHF (Text Generation, updated Sep 26, 2025)
- jmajkutewicz/Llama-3.1-Tulu-3-8B-DPO_ultrafeedback (Text Generation, updated Sep 26, 2025)

EditPrefs: Generating human preferences from historical text edits
- jmajkutewicz/WikiPrefs (Dataset Viewer, updated Sep 26, 2025, 65.3k downloads, 17 likes)