wang
wzx111
AI & ML interests
None yet
Organizations
None yet
Which post-training method was actually used for this model, GDPO or GRPO?
#1 opened 2 months ago by roseblooming
Does the reward function lack an n-gram repetition penalty?
#7 opened 10 months ago by wzx111
【Evaluation】Best practice for evaluating Qwen3 !!
#2 opened 10 months ago by wangxingjun778
Improve language tag
#1 opened 10 months ago by lbourdois