wang

zioniiiio

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago

HuggingFaceTB/smol-training-playbook

upvoted an article 7 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

upvoted an article 7 months ago

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

Organizations

None yet

upvoted 2 articles 7 months ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7, 2025 • 274

Article

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

Jan 19, 2025 • 41