# Qwen 2.5 7B GRPO Training Checkpoint

This is a training checkpoint from GRPO (Group Relative Policy Optimization) training.

## Training Status
- Checkpoint saved at: 2025-09-25T10:12:40.062136
- Training step: 764/2250
- Progress: 34%
## Configuration
- Base model: Qwen2.5-7B-Instruct
- Training method: GRPO with LoRA
- LoRA rank: 64
- Max sequence length: 4096
- Training data: 9,000 math problems from the math_ood dataset
- Categories: Number Theory, Logic, Combinatorics, Arithmetic, Algebra
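
The adapter setup above can be sketched with peft's `LoraConfig`. Only the rank (64) and task type come from this card; `lora_alpha`, `lora_dropout`, and `target_modules` below are illustrative assumptions, not the values used in training.

```python
from peft import LoraConfig

# Sketch of the adapter configuration; only r=64 is stated in this card.
# lora_alpha, lora_dropout, and target_modules are assumptions for illustration.
lora_config = LoraConfig(
    r=64,                                # LoRA rank, as listed above
    lora_alpha=64,                       # assumed; commonly set to 1x or 2x the rank
    lora_dropout=0.05,                   # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```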
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-7B-Instruct")

# Load LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "ksamiein/qwen-d1-cluster")
```
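
Continuing from the snippet above, generation follows the standard transformers chat flow. The prompt and decoding settings here are illustrative choices, not values prescribed by this checkpoint.

```python
# `model` and `tokenizer` are the objects loaded in the snippet above.
messages = [{"role": "user", "content": "What is the greatest common divisor of 252 and 105?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding shown for reproducibility; sampling settings are a choice, not from the card
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```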
## Note

This is an intermediate checkpoint (step 764 of 2250); training is not complete, and later checkpoints may perform better.