LLM-RL RLHF to RLVR
Nagi-ovo/DeepSeek-V3.1-Math-RL-G16-LoRA • Updated Jan 31
Nagi-ovo/Qwen3-235B-A22B-Instruct-MATH-RL-LoRA • Updated Jan 31
Nagi-ovo/Qwen2.5-7B-Reasoning-Adapter • Text Generation • Updated Feb 8, 2025 • 6
Nagi-ovo/Llama-3-8B-PPO • Text Generation • 8B • Updated Jan 21, 2025 • 2

Llama-3-8B-RLHF-Pipeline
Nagi-ovo/Llama-3-8B-SFT-RuoZhiBa • Text Generation • 8B • Updated Jan 7, 2025 • 5
Nagi-ovo/Llama-3-8B-DPO • Text Generation • 8B • Updated Jan 6, 2025 • 6
Nagi-ovo/Llama-3-8B-RM • Text Classification • 8B • Updated Jan 6, 2025 • 5 • 2
Nagi-ovo/Llama-3-8B-PPO • Text Generation • 8B • Updated Jan 21, 2025 • 2