davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-Span • 4B • Updated 15 days ago • 38
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-EqWeightSpan • 4B • Updated 15 days ago • 35
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-InvertedSpan • 4B • Updated 15 days ago • 34
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-Factored-SimPO-Sample-NoSpan • 4B • Updated 15 days ago • 42
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-SFT-Baseline • Text Generation • 4B • Updated 15 days ago • 48
davidanugraha/Qwen3-4B-Instruct-2507-UserSim-SFT-Factored • Text Generation • 4B • Updated 15 days ago • 62
davidanugraha/DeepSeek-R1-Distill-Qwen-7B-Overthinking-SFT • Text Generation • 8B • Updated Dec 28, 2025 • 2
davidanugraha/DeepSeek-R1-Distill-Qwen-1.5B-Overthinking-SFT • Text Generation • 2B • Updated Dec 28, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-16k-20test-passrate • 3B • Updated Dec 13, 2025 • 3
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-16k-20test-binary • 3B • Updated Dec 13, 2025 • 3
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-8k-20test-binary • 3B • Updated Dec 13, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-4k-20test-passrate • 3B • Updated Dec 13, 2025 • 2
davidanugraha/Qwen2.5-Coder-3B-Instruct-ReinfPP-Reflection-4k-20test-binary • 3B • Updated Dec 13, 2025 • 2