
Hejian Sang

pb09204048


AI & ML interests

None yet

Recent Activity

liked a dataset 1 day ago

ByteDance-Seed/BeyondAIME

upvoted a paper about 2 months ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

authored a paper 2 months ago

TIP: Token Importance in On-Policy Distillation



upvoted a paper about 2 months ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Paper • 2605.12483 • Published May 12 • 10

upvoted a paper 2 months ago

TIP: Token Importance in On-Policy Distillation

Paper • 2604.14084 • Published Apr 15 • 15

upvoted 2 papers 4 months ago

On-Policy Self-Distillation for Reasoning Compression

Paper • 2603.05433 • Published Mar 5 • 9

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

Paper • 2602.21420 • Published Feb 24 • 6

upvoted 2 papers 9 months ago

Debunk the Myth of SFT Generalization

Paper • 2510.00237 • Published Sep 30, 2025 • 2

Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Paper • 2509.25779 • Published Sep 30, 2025 • 19

upvoted a collection over 1 year ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated Mar 2 • 100