---
base_model: Qwen/Qwen2.5-Math-7B-Instruct
library_name: transformers
model_name: Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM
tags:
- generated_from_trainer
- prm
- trl
- math
- process-reward-model
- qwen2.5
- sharp
---
# Model Card for Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM
## Introduction
**Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM** is a Process Reward Model (PRM) fine-tuned from [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct). It scores the correctness of intermediate reasoning steps in mathematical solutions, so an error can be located at the step where it occurs rather than inferred from the final answer alone.
The model was trained on the **SHARP-Math** dataset with step-level supervision, so it gives feedback on each step of a mathematical reasoning chain.
It is part of the SHARP-PRM series.
## Model Information
### Base Model
- **Base Model**: [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)
- **Architecture**: Qwen2ForTokenClassification
- **Parameters**: 7B
### Training Details
- **Training Dataset**: SHARP-Math (Process Reward Model dataset)
- **Training Method**: Process Reward Model (PRM) as introduced in [Uesato et al., 2022](https://huggingface.co/papers/2211.14275)
- **Training Framework**: [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) v0.24.0
- **Task Type**: Token Classification (binary classification: error/correct for each reasoning step)
## PRM Evaluation
This model is designed to evaluate mathematical reasoning processes by:
1. **Step-level Evaluation**: Classifying each step in a reasoning chain as either "correct" or "error"
2. **Process Feedback**: Providing feedback on the reasoning process, not just the final answer
3. **Error Detection**: Identifying where mistakes occur in multi-step mathematical solutions
### Evaluation Metrics
The model is evaluated on the [ProcessBench](https://huggingface.co/datasets/Qwen/ProcessBench) benchmark.
Key metrics include:
- **Error Accuracy**: Ability to correctly identify incorrect steps
- **Correct Accuracy**: Ability to correctly identify correct steps
- **F1 Score**: Harmonic mean of Error Accuracy and Correct Accuracy, as defined by ProcessBench
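As a rough sketch of how these three numbers relate, the snippet below scores predictions in the style ProcessBench uses: each sample is labeled with the index of its earliest erroneous step, or -1 if every step is correct, and F1 is the harmonic mean of the two accuracies. The function name `processbench_scores` and the toy data are illustrative assumptions, not part of this model's evaluation code.

```python
def processbench_scores(predicted, gold):
    """Return (error_acc, correct_acc, f1) for ProcessBench-style labels.

    Each entry is the index of the earliest erroneous step, or -1 when
    the whole solution is correct.
    """
    error_hits = [p == g for p, g in zip(predicted, gold) if g != -1]
    correct_hits = [p == g for p, g in zip(predicted, gold) if g == -1]
    error_acc = sum(error_hits) / len(error_hits) if error_hits else 0.0
    correct_acc = sum(correct_hits) / len(correct_hits) if correct_hits else 0.0
    total = error_acc + correct_acc
    f1 = 2 * error_acc * correct_acc / total if total else 0.0
    return error_acc, correct_acc, f1


# Toy data (hypothetical): two samples contain an error, two are fully correct.
gold = [1, -1, 2, -1]
predicted = [1, -1, 0, 0]
error_acc, correct_acc, f1 = processbench_scores(predicted, gold)
print(error_acc, correct_acc, f1)
```

Because F1 is a harmonic mean, a model that flags every step as an error (or none) scores close to zero even if one of the two accuracies is high.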
## Quick Start
### Installation
```bash
pip install transformers torch
```
### Basic Usage
#### Using the Model for Step Classification
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example: evaluate a mathematical reasoning chain
# Problem with steps (one correct, one incorrect)
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",  # Correct step
    "Divide by 2: x = 5",  # Incorrect step (should be x = 4)
]

# Format input with step separator
separator = "\n\n"
input_text = problem + separator + separator.join(steps)
inputs = tokenizer(
    input_text,
    return_tensors="pt",
    truncation=True,
    max_length=8192,
    return_offsets_mapping=True,  # character span of each token (fast tokenizers only)
)
offsets = inputs.pop("offset_mapping")[0].tolist()

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits  # Shape: [batch_size, sequence_length, num_labels]
probabilities = F.softmax(logits, dim=-1)  # Convert to probabilities
predictions = torch.argmax(logits, dim=-1)  # Get predicted class indices

# Aggregate predictions per step (simplified: majority vote over each step's tokens)
labels = ["error", "correct"]
char_start = len(problem) + len(separator)
for i, step in enumerate(steps):
    char_end = char_start + len(step)
    # Tokens whose character span overlaps this step
    idx = [t for t, (s, e) in enumerate(offsets) if s < char_end and e > char_start]
    char_start = char_end + len(separator)  # advance to the next step
    print(f"\nStep {i+1}: {step}")
    if not idx:  # e.g. the step was truncated away
        print("  Prediction: unknown")
        continue
    step_label = predictions[0, idx].mode().values.item()
    # Mean probability of the predicted class over the step's tokens
    confidence = probabilities[0, idx, step_label].mean().item()
    print(f"  Prediction: {labels[step_label]}")
    print(f"  Confidence: {confidence:.2%}")

# Output format (values are illustrative and will vary):
# Step 1: Subtract 5 from both sides: 2x = 8
#   Prediction: correct
#   Confidence: 95.00%
#
# Step 2: Divide by 2: x = 5
#   Prediction: error
#   Confidence: 87.00%
```
**Output Interpretation:**
- **Logits**: Raw per-class scores from the model (before softmax). The larger a class's logit is relative to the other class, the more strongly the model favors that class.
- **Probabilities**: Softmax-normalized scores between 0 and 1. Sum to 1 for each token.
- **Predictions**: Class indices (0 = "error", 1 = "correct") for each token.
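To make the relationship between the three quantities concrete, here is a small worked example for a single token in plain Python. The logit values are made up for illustration; only the label order (0 = "error", 1 = "correct") comes from this card.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["error", "correct"]
token_logits = [-1.0, 2.0]  # hypothetical logits for one token
probs = softmax(token_logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
print(labels[pred], round(probs[pred], 3))
```

With only two classes, the probability of "correct" depends solely on the difference between the two logits, which is why the absolute size of a single logit is not meaningful on its own.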
#### Using with Pipeline
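```python
import torch
from transformers import pipeline

classifier = pipeline(
    "token-classification",
    model="path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM",
    tokenizer="path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM",
    device=0 if torch.cuda.is_available() else -1,
)

# Classify reasoning steps (same problem and steps as in the example above)
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",
    "Divide by 2: x = 5",
]
result = classifier(problem + "\n\n" + "\n\n".join(steps))
# `result` is a list of per-token dicts with keys such as `entity` and `score`
```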
### Integration with Mathematical Reasoning
This PRM can be used to:
1. **Filter incorrect reasoning paths** in tree-of-thought or chain-of-thought generation
2. **Provide feedback** during step-by-step problem solving
3. **Evaluate solution quality** before final answer generation
4. **Improve training** by identifying problematic reasoning patterns
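One way to apply the first and third uses is best-of-N selection: sample several candidate solutions, score every step with the PRM, collapse the step scores into one number per solution, and keep the highest-scoring candidate. The sketch below assumes you already have per-step P(correct) values; the helper names and the choice between `min` and product aggregation are illustrative assumptions, not something this card prescribes.

```python
def solution_score(step_probs, how="min"):
    """Collapse per-step P(correct) values into one solution-level score."""
    if not step_probs:
        raise ValueError("need at least one step score")
    if how == "min":
        return min(step_probs)  # a solution is only as good as its weakest step
    if how == "prod":
        score = 1.0
        for p in step_probs:
            score *= p
        return score
    raise ValueError(f"unknown aggregation: {how}")


def pick_best(candidates, how="min"):
    """Return the index of the candidate with the highest aggregated score."""
    scores = [solution_score(c, how) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-step P(correct) values for three sampled solutions.
candidates = [
    [0.95, 0.40, 0.90],
    [0.85, 0.80, 0.75],
    [0.99, 0.98, 0.10],
]
best = pick_best(candidates)
print(best)
```

Here the second candidate wins under both aggregations because it has no weak step, even though the other two contain individual steps with higher scores.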
## Training Procedure
### Training Configuration
- **Learning Rate**: 2e-5
- **Batch Size**: Per-device batch size (with gradient accumulation)
- **Epochs**: Multiple epochs with early stopping
- **Optimizer**: AdamW with cosine learning rate schedule
- **Warmup Ratio**: 3%
- **Gradient Clipping**: 5.0
- **Precision**: bfloat16
- **Gradient Checkpointing**: Enabled for memory efficiency
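For intuition about the schedule, the following sketch computes the learning rate at a given optimizer step for linear warmup followed by cosine decay, using the peak rate (2e-5) and warmup ratio (3%) listed above. The total step count is a made-up value, since this card does not state it, and the function `lr_at` is a hypothetical helper rather than the training code.

```python
import math


def lr_at(step, total_steps, base_lr=2e-5, warmup_ratio=0.03):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not stated in this card
for step in (0, 15, 30, 515, 1000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With these numbers the rate climbs for the first 30 steps, peaks at 2e-5, passes through half the peak midway through the decay phase, and reaches zero at the final step.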
### Training Framework Versions
- **TRL**: 0.24.0
- **Transformers**: 4.56.2
- **PyTorch**: 2.9.1
- **Datasets**: 4.4.1
- **Tokenizers**: 0.22.1
### Training Data
The model was trained on the **SHARP-Math** dataset, which contains:
- Mathematical problems with step-by-step solutions
- Labeled reasoning steps (correct/error)
- Diverse mathematical domains and difficulty levels
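The exact SHARP-Math schema is not documented in this card. As an assumption, the record below follows the stepwise-supervision layout that TRL's PRM trainer consumes (`prompt`, `completions`, `labels`), reusing the example problem from the Quick Start. The `validate_record` helper is hypothetical and only checks that each step has exactly one boolean label.

```python
def validate_record(record):
    """Check that a stepwise-supervision record has one boolean label per step."""
    missing = {"prompt", "completions", "labels"} - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if len(record["completions"]) != len(record["labels"]):
        raise ValueError("each step needs exactly one label")
    if not all(isinstance(label, bool) for label in record["labels"]):
        raise ValueError("labels must be booleans (True = correct step)")
    return True


# Assumed layout: one problem, its solution steps, and one label per step.
record = {
    "prompt": "Solve: 2x + 5 = 13",
    "completions": [
        "Subtract 5 from both sides: 2x = 8",
        "Divide by 2: x = 5",
    ],
    "labels": [True, False],  # the second step is wrong (x should be 4)
}
print(validate_record(record))
```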
## Use Cases
### 1. Mathematical Reasoning Evaluation
- Evaluate intermediate steps in mathematical problem-solving
- Identify errors in multi-step calculations
- Provide feedback on reasoning quality
### 2. Educational Applications
- Automated grading of mathematical solutions
- Step-by-step feedback for students
- Identification of common error patterns
### 3. Research Applications
- Training better mathematical reasoning models
- Analyzing reasoning patterns
- Improving chain-of-thought generation
## Limitations and Considerations
1. **Domain Specificity**: This model is specifically trained for mathematical reasoning and may not generalize well to other domains
2. **Step Length**: The model is optimized for step-level evaluation with a 256-token context per step
3. **Language**: The model is primarily trained on English mathematical content
4. **False Positives/Negatives**: Like all classification models, it may misclassify some steps
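Given the step-length limit in item 2, it can help to flag overly long steps before scoring them. The sketch below is a hypothetical helper that takes any token-counting function; in practice you would pass something like `lambda s: len(tokenizer(s)["input_ids"])` with the model's tokenizer, while whitespace splitting stands in here so the example runs without downloads.

```python
def long_steps(steps, count_tokens, limit=256):
    """Return (index, token_count) for every step longer than `limit` tokens."""
    flagged = []
    for i, step in enumerate(steps):
        n = count_tokens(step)
        if n > limit:
            flagged.append((i, n))
    return flagged


def whitespace_count(text):
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


steps = [
    "Subtract 5 from both sides: 2x = 8",
    "x " * 300,  # artificially long step
]
flagged = long_steps(steps, whitespace_count)
print(flagged)
```

Flagged steps can then be split into smaller steps or scored with extra caution.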
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{qwen2.5-math-7b-instruct-sharp-math-prm,
title={Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM: A Process Reward Model for Mathematical Reasoning},
author={Your Name/Organization},
year={2025},
howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM}}
}
```
**Model Card Version**: 1.0
**Last Updated**: 2025-12-30