wang

zioniiiio

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago

HuggingFaceTB/smol-training-playbook

upvoted an article 6 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

upvoted an article 6 months ago

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

Organizations

None yet

liked a Space about 1 month ago

The Smol Training Playbook

The secrets to building world-class LLMs

upvoted 2 articles 6 months ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7, 2025 • 273

Article

Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)

Jan 19, 2025 • 41