SeungWon Kook
Aiant56
AI & ML interests
None yet
Recent Activity
upvoted a paper about 10 hours ago
KL for a KL: On-Policy Distillation with Control Variate Baseline
upvoted a paper about 10 hours ago
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
upvoted a paper 19 days ago
ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding
Organizations
None yet