Post Training Versions - Qwen 0.6B - an AIPlans Collection

Post Training Versions - Qwen 0.6B

updated Mar 27

Different versions of Qwen3-0.6B in which the only difference is the post-training method used. The post-training dataset is HelpSteer2.

AIPlans/Qwen3-0.6B-ORPO

Text Generation • Updated Nov 28, 2025 • 10

AIPlans/Qwen3-0.6B-DPO_NOTLORA

Text Generation • 0.6B • Updated Nov 25, 2025 • 5

AIPlans/Qwen3-0.6B-GRPO_Epoch2

Text Generation • 0.6B • Updated Dec 18, 2025 • 6

AIPlans/Qwen3-0.6B-ReMax

Reinforcement Learning • 0.6B • Updated Dec 22, 2025 • 10 • 2

AIPlans/Qwen3-0.6B-IPO

Reinforcement Learning • 0.6B • Updated Dec 12, 2025 • 21 • 1

AIPlans/Qwen3-0.6B-KTO

Text Generation • Updated Nov 22, 2025 • 8 • 1

AIPlans/Qwen3-0.6B-PPO

Text Generation • 0.6B • Updated Mar 27 • 90 • 1
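
Since the collection exists to compare checkpoints that differ only in post-training method, a small lookup table of the repository IDs can be handy when iterating over them. The following is a minimal sketch, not the collection authors' tooling: the repository IDs are copied from the listing above, while the `POST_TRAINED_REPOS` mapping, its short method keys, and the `load_variant` helper are hypothetical names introduced here. It assumes the `transformers` library is installed and that each repository holds full causal-LM weights; the listing does not say whether any of them are adapter-only, in which case loading would need a different path.

```python
# Repository IDs copied from the collection listing; the short keys are
# illustrative labels chosen for this sketch.
POST_TRAINED_REPOS = {
    "ORPO": "AIPlans/Qwen3-0.6B-ORPO",
    "DPO": "AIPlans/Qwen3-0.6B-DPO_NOTLORA",
    "GRPO": "AIPlans/Qwen3-0.6B-GRPO_Epoch2",
    "ReMax": "AIPlans/Qwen3-0.6B-ReMax",
    "IPO": "AIPlans/Qwen3-0.6B-IPO",
    "KTO": "AIPlans/Qwen3-0.6B-KTO",
    "PPO": "AIPlans/Qwen3-0.6B-PPO",
}


def load_variant(method: str):
    """Load the tokenizer and model for one post-training method.

    Hypothetical helper: assumes the repository contains full model weights
    loadable as a causal language model.
    """
    # Imported lazily so the mapping above is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = POST_TRAINED_REPOS[method]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_variant("KTO")` would download that checkpoint from the Hub on first use, so it needs network access and enough disk space for the weights.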
