ThinkMix-Gemma3-4B-GRPO
System Prompt
Plan out your response between <think></think> tags, then provide the final response after the closing tag.
- Developed by: theprint
- License: apache-2.0
- Finetuned from model : theprint/ThinkMix-Gemma3-4B
This gemma3 model was trained 2x faster with Unsloth and Huggingface's TRL library.
- Downloads last month
- 29
Model tree for theprint/ThinkMix-Gemma3-4B-GRPO
Base model
google/gemma-3-4b-pt
Finetuned
google/gemma-3-4b-it
Quantized
unsloth/gemma-3-4b-it-unsloth-bnb-4bit
Finetuned
theprint/ThinkMix-Gemma3-4B
