Disclaimer: The model was trained on AWS on a g4dn.4xlarge instance (Tesla T4, 1 GPU, 14.563 GB max memory; Linux; Torch 2.6.0+cu124; CUDA compute capability 7.5; CUDA Toolkit 12.4; Triton 3.2.0).

Model Details

This is a math reasoning model obtained by converting the standard Qwen2.5 3B Instruct model using GRPO (Group Relative Policy Optimization), a reinforcement learning algorithm that optimizes responses with reward functions. We defined reward functions that score each generated response so the model learns to reason toward them, and fine-tuned Qwen2.5 3B Instruct on OpenAI's GSM8K dataset of grade school math problems.
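As a rough illustration of how GRPO reward functions work, here is a minimal sketch in the style TRL uses (each function takes a batch of completions and returns one float per completion). The exact rewards used to train this model are not reproduced here; the function names, the `<reasoning>`/`<answer>` tag format, and the reward magnitudes are illustrative assumptions.

```python
import re

# Matches a numeric final answer inside <answer>...</answer> tags
# (the tag format is an assumption for this sketch, not this model's
# verified prompt format).
ANSWER_RE = re.compile(r"<answer>\s*(-?[\d.,]+)\s*</answer>")


def format_reward(completions, **kwargs):
    """Reward 1.0 if the completion follows the expected tag structure."""
    pattern = re.compile(
        r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL
    )
    return [1.0 if pattern.search(c) else 0.0 for c in completions]


def correctness_reward(completions, answers, **kwargs):
    """Reward 2.0 if the extracted numeric answer matches the reference."""
    rewards = []
    for completion, gold in zip(completions, answers):
        m = ANSWER_RE.search(completion)
        correct = m is not None and m.group(1).replace(",", "") == gold
        rewards.append(2.0 if correct else 0.0)
    return rewards


completions = [
    "<reasoning>2 + 2 = 4</reasoning><answer>4</answer>",
    "The answer is 4",  # correct number, but wrong format
]
print(format_reward(completions))                    # -> [1.0, 0.0]
print(correctness_reward(completions, ["4", "4"]))   # -> [2.0, 0.0]
```

During GRPO training, several such functions are summed per completion, and the policy is updated toward completions whose total reward beats the group average.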

The training took 10 hours on a single Tesla T4. You can find the code used to train the model here.

Uploaded model

  • Developed by: ugriffo
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

GGUF

  • Model size: 3B params
  • Architecture: qwen2

Available quantizations: 4-bit, 5-bit, 8-bit, 16-bit.
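To estimate which quantization fits your hardware, a rough rule of thumb is weight storage = parameters × bits / 8. The sketch below applies it to this model, assuming ~3.09B parameters for Qwen2.5-3B; real GGUF files add metadata, and inference needs extra room for the KV cache and activations, so treat these as lower bounds.

```python
# Approximate weight-only memory footprint at each quantization level.
# PARAMS is an assumption (~3.09B parameters reported for Qwen2.5-3B).
PARAMS = 3.09e9

for bits in (4, 5, 8, 16):
    gib = PARAMS * bits / 8 / 2**30  # bytes -> GiB
    print(f"{bits:2d}-bit: ~{gib:.1f} GiB of weights")
```

For example, the 16-bit weights alone (~5.8 GiB) already approach half of the T4's 14.563 GB, while the 4-bit quant (~1.4 GiB) leaves ample headroom.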


Dataset used to train ugriffo/Qwen2.5-3B-Instruct-Math-Reasoning-GGUF

  • openai/gsm8k
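Each GSM8K record pairs a question with a worked solution whose final numeric answer follows a `#### ` marker. A small helper like the one below (a sketch; the function name is ours) can extract that reference answer when building reward targets:

```python
def extract_gsm8k_answer(solution: str) -> str:
    """Return the final answer that GSM8K places after the '####' marker."""
    return solution.split("####")[-1].strip().replace(",", "")


# Abridged record in GSM8K's answer format (calculator annotations in <<...>>):
sample = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether.\n"
    "#### 72"
)
print(extract_gsm8k_answer(sample))  # -> 72
```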