Add GRPO Qwen2.5-3B full model (DINOv2 reward, 1701 steps) 79138e4 verified saeedbenadeeb committed 15 days ago
Upload best text-only SFT LoRA checkpoint (Qwen2.5-3B-Instruct) ed539a4 verified saeedbenadeeb committed on Feb 18