Qwen 2.5 7B GRPO Training Checkpoint

This is a training checkpoint from GRPO (Group Relative Policy Optimization) training.
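The core idea of GRPO is to score each sampled completion relative to the other completions in its group, rather than against a learned value baseline. A minimal sketch of that group-relative advantage (an illustrative reimplementation, not code from this repo):

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Normalize a group's rewards by the group mean and std.

    Each advantage says how much better (or worse) a completion
    scored than its siblings sampled for the same prompt.
    """
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mu) / (std + eps) for r in rewards]


# Example: two correct answers (reward 1) and two wrong ones (reward 0)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions with above-average reward get positive advantages and are reinforced; below-average ones are pushed down.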

Training Status

  • Checkpoint saved at: 2025-09-25T10:12:40.062136
  • Training step: 764/2250
  • Progress: 34%

Configuration

  • Base model: Qwen2.5-7B-Instruct
  • Training method: GRPO with LoRA
  • LoRA rank: 64
  • Max sequence length: 4096
  • Training data: 9000 math problems from math_ood
  • Categories: Number Theory, Logic, Combinatorics, Arithmetic, Algebra
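The LoRA settings above could be expressed with peft roughly as follows. This is a sketch matching the stated rank of 64; `lora_alpha`, `lora_dropout`, and `target_modules` are assumptions not read from this checkpoint:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                       # LoRA rank (from the card)
    lora_alpha=64,              # assumption: alpha equal to rank
    lora_dropout=0.0,           # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```

The max sequence length of 4096 is enforced by the tokenizer/trainer rather than the LoRA config itself.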

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-7B-Instruct")

# Load LoRA weights
model = PeftModel.from_pretrained(base_model, "ksamiein/qwen-d1-cluster")
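Qwen2.5-Instruct models use the ChatML prompt format. In practice you would call `tokenizer.apply_chat_template` on a messages list, but the raw prompt it produces looks like the output of this illustrative helper (`qwen_chat_prompt` is not part of any library):

```python
def qwen_chat_prompt(user_msg, system_msg="You are a helpful assistant."):
    """Build a ChatML-format prompt as used by Qwen2.5-Instruct.

    The trailing '<|im_start|>assistant\n' cues the model to generate
    the assistant turn.
    """
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


prompt = qwen_chat_prompt("What is 7 * 8?")
```

Pass the resulting string to `tokenizer(...)` and `model.generate(...)` to run inference with the loaded checkpoint.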

Note

This is an intermediate checkpoint. Training may not be complete.


Model tree for ksamiein/qwen-d1-cluster

  • Base model: Qwen/Qwen2.5-7B