Disclaimer: The model was trained on AWS on a g4dn.4xlarge instance (Tesla T4, 1 GPU, max memory 14.563 GB; Platform: Linux; Torch: 2.6.0+cu124; CUDA compute capability: 7.5; CUDA Toolkit: 12.4; Triton: 3.2.0).

Model Details

This is a math reasoning model built by converting the standard Qwen2.5 3B Instruct model using GRPO (Group Relative Policy Optimization), a reinforcement learning algorithm that optimizes responses against reward functions. We defined reward functions that teach the model to reason, then fine-tuned Qwen2.5 3B Instruct on OpenAI's GSM8K dataset, which contains grade school math problems.
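To illustrate the idea, GRPO samples a group of completions per prompt and scores each one with reward functions. The sketch below shows a plausible correctness reward for GSM8K-style answers (which end in `#### <number>`); the function name and the exact format check are illustrative assumptions, not the reward functions actually used in training (those are in the linked code).

```python
import re

def correctness_reward(completions, answers):
    """Score each completion 1.0 if its final '#### <number>' answer
    matches the ground-truth answer, else 0.0.
    (Illustrative sketch; the training code may define rewards differently.)"""
    rewards = []
    for completion, answer in zip(completions, answers):
        # GSM8K answers end with a line like '#### 42'
        match = re.search(r"####\s*(-?[\d,\.]+)", completion)
        extracted = match.group(1).replace(",", "") if match else None
        rewards.append(1.0 if extracted == answer else 0.0)
    return rewards

# One correct, one wrong, and one unparseable completion:
print(correctness_reward(
    ["Step 1: 3 + 4 = 7.\n#### 7", "I think it is 8.\n#### 8", "no answer"],
    ["7", "7", "7"],
))  # → [1.0, 0.0, 0.0]
```

In TRL's GRPO setup, functions like this are passed as a list of reward callables, and GRPO uses the relative rewards within each sampled group to compute the policy update.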

The training took 10 hours on a Tesla T4. You can find the code used to train the model here.

Uploaded model

  • Developed by: ugriffo
  • License: apache-2.0
  • Finetuned from model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

GGUF

  • Model size: 3B params
  • Architecture: qwen2

  • Available quantizations: 4-bit, 5-bit, 8-bit, 16-bit


Dataset used to train ugriffo/Qwen2.5-3B-Instruct-Math-Reasoning-GGUF: OpenAI's GSM8K