wang

wzx111

AI & ML interests

None yet

Recent Activity

updated a model about 1 month ago

wzx111/14B-Aggressive-OPO-Delta-LR2e-6-G32

published a model about 1 month ago

wzx111/14B-Aggressive-OPO-Delta-LR2e-6-G32

updated a model about 1 month ago

wzx111/14B-Aggressive-GSPO-LR2e-6-G32


Organizations

None yet

updated a model about 1 month ago

wzx111/14B-Aggressive-OPO-Delta-LR2e-6-G32

Updated Mar 8

published a model about 1 month ago

wzx111/14B-Aggressive-OPO-Delta-LR2e-6-G32

Updated Mar 8

updated a model about 1 month ago

wzx111/14B-Aggressive-GSPO-LR2e-6-G32

Updated Mar 8

published a model about 1 month ago

wzx111/14B-Aggressive-GSPO-LR2e-6-G32

Updated Mar 8

New activity in wzx111/Qwen3-1.7B-MATH-GDPO 4 months ago

Which post-training method was actually used for this model, GDPO or GRPO?

#1 opened 4 months ago by

roseblooming

updated a dataset 4 months ago

wzx111/MATH-lighteval-level3

Viewer • Updated Dec 9, 2025 • 2.72k • 4

published a dataset 4 months ago

wzx111/MATH-lighteval-level3

Viewer • Updated Dec 9, 2025 • 2.72k • 4

published a model 5 months ago

wzx111/Qwen3-1.7B-GRPO-math

Updated Nov 29, 2025

updated a model 5 months ago

wzx111/Qwen3-1.7B-GRPO-math

Updated Nov 29, 2025

updated a dataset 5 months ago

wzx111/MATH-lighteval-level-middlehigh

Viewer • Updated Nov 24, 2025 • 5.63k • 11

published a dataset 5 months ago

wzx111/MATH-lighteval-level-middlehigh

Viewer • Updated Nov 24, 2025 • 5.63k • 11

updated a dataset 5 months ago

wzx111/MATH-lighteval-level-middle

Viewer • Updated Nov 24, 2025 • 7.87k • 14

published a dataset 5 months ago

wzx111/MATH-lighteval-level-middle

Viewer • Updated Nov 24, 2025 • 7.87k • 14

updated a model 5 months ago

wzx111/Qwen3-1.7B-Open-R1-ADPO

Text Generation • 2B • Updated Nov 23, 2025 • 2

published a model 5 months ago

wzx111/Qwen3-1.7B-Open-R1-ADPO

Text Generation • 2B • Updated Nov 23, 2025 • 2

updated a model 5 months ago

wzx111/Qwen3-1.7B-Open-R1-GRPO-Baseline

Text Generation • 2B • Updated Nov 22, 2025 • 5

published a model 5 months ago

wzx111/Qwen3-1.7B-Open-R1-GRPO-Baseline

Text Generation • 2B • Updated Nov 22, 2025 • 5

New activity in Qwen/Qwen3-235B-A22B 11 months ago

Does the reward function lack an n-gram repetition penalty?

#7 opened 12 months ago by

wzx111

updated a model 11 months ago

wzx111/Qwen3-1.7B-Open-R1-GRPO

2B • Updated May 14, 2025 • 1

published a model 11 months ago

wzx111/Qwen3-1.7B-Open-R1-GRPO

2B • Updated May 14, 2025 • 1
