feat: default GRPO/eval to Qwen/Qwen3-1.7B + LoRA; disable Qwen3 thinking in chat template
eff6196 raj921 committed on Apr 25
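
The diff itself is not shown here, so the following is only a minimal sketch of what the defaults named in the commit title could look like. `RunDefaults`, `chat_template_kwargs`, and `render_prompt` are hypothetical names, not taken from the repository. The one library detail relied on is that the Qwen3 chat template accepts an `enable_thinking` flag forwarded through the tokenizer's `apply_chat_template` call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RunDefaults:
    """Hypothetical defaults mirroring the commit title (field names are assumptions)."""

    model_name: str = "Qwen/Qwen3-1.7B"  # default model for GRPO training and eval
    use_lora: bool = True                # train LoRA adapters rather than full weights
    enable_thinking: bool = False        # Qwen3 thinking mode turned off in the template


def chat_template_kwargs(defaults: RunDefaults) -> dict[str, Any]:
    """Build the keyword arguments to forward to tokenizer.apply_chat_template."""
    return {
        "tokenize": False,
        "add_generation_prompt": True,
        # Extra keyword arguments are passed through to the chat template itself.
        "enable_thinking": defaults.enable_thinking,
    }


def render_prompt(tokenizer: Any, messages: list[dict[str, str]],
                  defaults: RunDefaults = RunDefaults()) -> Any:
    """Render messages with a tokenizer assumed to be loaded for defaults.model_name."""
    return tokenizer.apply_chat_template(messages, **chat_template_kwargs(defaults))
```

With a real setup, `tokenizer` would come from `AutoTokenizer.from_pretrained(RunDefaults().model_name)` in the `transformers` library. Keeping the flag in one defaults object means training and eval render prompts the same way.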