# Qwen 2.5 7B GRPO Training Checkpoint

This is a training checkpoint from GRPO (Group Relative Policy Optimization) training.

## Training Status
- Checkpoint saved at: 2025-09-25T10:12:40.062136
- Training step: 764/2250
- Progress: 34%
## Configuration
- Base model: Qwen2.5-7B-Instruct
- Training method: GRPO with LoRA
- LoRA rank: 64
- Max sequence length: 4096
- Training data: 9,000 math problems from the math_ood dataset
- Categories: Number Theory, Logic, Combinatorics, Arithmetic, Algebra
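
The adapter setup above can be sketched with peft's `LoraConfig`. Only the rank (64) and task type come from this card; `lora_alpha`, `lora_dropout`, and `target_modules` below are illustrative assumptions, not the values used in training.

```python
from peft import LoraConfig

# Sketch of the adapter configuration; only r=64 is stated in this card.
# lora_alpha, lora_dropout, and target_modules are assumptions for illustration.
lora_config = LoraConfig(
    r=64,                                # LoRA rank, as listed above
    lora_alpha=64,                       # assumed; commonly set to 1x or 2x the rank
    lora_dropout=0.05,                   # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
```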
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-7B-Instruct")

# Load LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "ksamiein/qwen-d1-cluster")
```
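
Continuing from the snippet above, generation follows the standard transformers chat flow. The prompt and decoding settings here are illustrative choices, not values prescribed by this checkpoint.

```python
# `model` and `tokenizer` are the objects loaded in the snippet above.
messages = [{"role": "user", "content": "What is the greatest common divisor of 252 and 105?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding shown for reproducibility; sampling settings are a choice, not from the card
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```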
## Note

This is an intermediate checkpoint (step 764 of 2250); training is not complete, and later checkpoints may perform better.