wang

wzx111

AI & ML interests

None yet

Organizations

None yet

New activity in wzx111/Qwen3-1.7B-MATH-GDPO 7 months ago

Which post-training method was actually used for this model, GDPO or GRPO?

#1 opened 7 months ago by

New activity in Qwen/Qwen3-235B-A22B about 1 year ago

Does the reward function lack an n-gram repetition penalty?

#7 opened about 1 year ago by

New activity in Qwen/Qwen3-1.7B about 1 year ago

【Evaluation】Best practice for evaluating Qwen3 !!

#2 opened about 1 year ago by

New activity in wzx111/Qwen2.5-1.5B-Open-R1-GRPO about 1 year ago

Improve language tag

#1 opened about 1 year ago by