

Xiaoyang Cao

Sean13


https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

updated a model about 1 month ago

Sean13/responsibility-decomposition

published a model about 1 month ago

Sean13/responsibility-decomposition

updated a model about 2 months ago

Sean13/grpo_nocurriculum_Qwen3-1.7B-100step


Organizations

None yet

Sean13's models (73)

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha1.0

7B • Updated Sep 22, 2025 • 1

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.9

7B • Updated Sep 19, 2025 • 3

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.7

7B • Updated Sep 19, 2025 • 1

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.3

Updated Sep 19, 2025

Sean13/mistral-7b-instruct-v0.2-rcpo-full

Text Generation • 7B • Updated Sep 15, 2025 • 3

Sean13/mistral-7b-instruct-v0.2-cpo-full

Text Generation • 7B • Updated Sep 11, 2025 • 2

Sean13/mistral-7b-instruct-v0.2-simpo-full

Text Generation • 7B • Updated Sep 6, 2025 • 3

Sean13/mistral-7b-instruct-v0.2-rsimpo-full

Text Generation • 7B • Updated Sep 6, 2025 • 2

Sean13/mistral-7b-instruct-v0.2-ipo-full

Text Generation • 7B • Updated Aug 19, 2025 • 3

Sean13/mistral-7b-instruct-v0.2-slic_hf-full

Text Generation • 7B • Updated Aug 11, 2025 • 3

Sean13/mistral-7b-instruct-v0.2-rslic_hf-full

Updated Aug 8, 2025

Sean13/mistral-7b-instruct-v0.2-ripo-full

Text Generation • 7B • Updated Aug 3, 2025 • 1

Sean13/mistral-7b-instruct-v0.2-emdpo-full

7B • Updated Jul 24, 2025 • 2