GRPO trained LoRA model based on unsloth/Qwen3-4B (Trained with Unsloth) b8d51bd verified thejaminator committed on Jun 2, 2025
GRPO trained LoRA model based on unsloth/Qwen3-4B (Trained with Unsloth) 99d21f0 verified thejaminator committed on Jun 2, 2025