RLHF to RLVR
Jesse Zhang
Nagi-ovo
AI & ML interests
Humanoids & RL
Models 27
Nagi-ovo/HOMIERL-loco
Robotics • Updated • 1
Nagi-ovo/Qwen3-235B-A22B-Instruct-MATH-RL-LoRA
Nagi-ovo/DeepSeek-V3.1-Math-RL-G16-LoRA
Nagi-ovo/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 2
Nagi-ovo/Llama-2-7b-chat-finetune
Text Generation • 7B • Updated • 3
Nagi-ovo/Qwen2.5-7B-Reasoning-Adapter
Text Generation • Updated • 6
Nagi-ovo/Llama-3-8B-PPO
Text Generation • 8B • Updated • 2
Nagi-ovo/Llama-3-8B-SFT-RuoZhiBa
Text Generation • 8B • Updated • 5
Nagi-ovo/Llama-3-8B-RM
Text Classification • 8B • Updated • 5 • 2
Nagi-ovo/Llama-3-8B-DPO
Text Generation • 8B • Updated • 6