Instructions to use FutureMa/Qwen2.5-7B-Instruct-GRPO-Math with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FutureMa/Qwen2.5-7B-Instruct-GRPO-Math with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FutureMa/Qwen2.5-7B-Instruct-GRPO-Math")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("FutureMa/Qwen2.5-7B-Instruct-GRPO-Math", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use FutureMa/Qwen2.5-7B-Instruct-GRPO-Math with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/FutureMa/Qwen2.5-7B-Instruct-GRPO-Math

SGLang

How to use FutureMa/Qwen2.5-7B-Instruct-GRPO-Math with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use FutureMa/Qwen2.5-7B-Instruct-GRPO-Math with Docker Model Runner:
```
docker model run hf.co/FutureMa/Qwen2.5-7B-Instruct-GRPO-Math
```

Qwen2.5-7B-Instruct-GRPO-Math

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) on mathematical reasoning tasks.

Model Description

Base Model: Qwen2.5-7B-Instruct
Training Method: GRPO (Reinforcement Learning)
Training Framework: ms-swift
Training Data: AI-MO/NuminaMath-TIR (500 samples)
Hardware: 1x NVIDIA H100 PCIe (80GB)
Training Time: ~2.5 hours

Training Details

Training Configuration

CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --reward_funcs accuracy format \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#500' \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --learning_rate 5e-5 \
    --num_generations 2

Training Metrics

Final Loss: 0.00011567
Math Accuracy: 70%
Reward: 0.7
Training Steps: 500

Usage

Using with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "FutureMa/Qwen2.5-7B-Instruct-GRPO-Math"
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Generate
messages = [
    {"role": "user", "content": "Solve for x: 2x^2 - 3x + 1 = 0"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Using with ms-swift

# Inference
swift infer \
    --ckpt_dir FutureMa/Qwen2.5-7B-Instruct-GRPO-Math \
    --eval_human false

Intended Use

This model is optimized for:

✅ Mathematical reasoning and problem-solving
✅ Step-by-step solution generation
✅ Algebraic equation solving
✅ Arithmetic calculations

Limitations

Trained on a relatively small dataset (500 samples)
May not generalize well to very complex mathematical problems
LoRA fine-tuning may have limited capacity compared to full fine-tuning

Citation

@misc{qwen2.5-grpo-math,
  author = {FutureMa},
  title = {Qwen2.5-7B-Instruct Fine-tuned with GRPO on Math Tasks},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen2.5-7B-Instruct-GRPO-Math}}
}

Acknowledgments

Base model: Qwen Team
Training framework: ms-swift
Dataset: AI-MO/NuminaMath-TIR

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for FutureMa/Qwen2.5-7B-Instruct-GRPO-Math

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct