Miracle12345/gemma-3-GRPO
Tags: Reinforcement Learning, Safetensors, English, unsloth, lora, grpo, gemma-3, instruction-tuning, math-reasoning
arxiv: 2402.03300
License: apache-2.0
Commit History (main)
c228698  finalize. Miracle12345 committed on Sep 6, 2025
3d5e034  add metadata and inference code. Miracle12345 committed on Sep 6, 2025
3775f62  add readme.md. Miracle12345 committed on Sep 6, 2025
3c02089  Upload model trained with Unsloth (verified). Miracle12345 committed on Sep 5, 2025
6f59c16  Upload model trained with Unsloth (verified). Miracle12345 committed on Sep 5, 2025
388ca1b  initial commit (verified). Miracle12345 committed on Sep 5, 2025