---
base_model: Qwen/Qwen2.5-Math-1.5B-Instruct
library_name: transformers
model_name: Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM
tags:
- generated_from_trainer
- prm
- trl
- math
- process-reward-model
- qwen2.5
- sharp
---
# Model Card for Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM
## Introduction
**Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM** is a Process Reward Model (PRM) fine-tuned from [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct). This model is specifically designed to evaluate the correctness of intermediate reasoning steps in mathematical problem-solving processes, enabling more reliable and interpretable mathematical reasoning.
The model was trained on the **SHARP-Math** dataset using the Process Reward Model methodology, which provides step-by-step feedback on mathematical reasoning chains. It is part of the SHARP-PRM series of models.
## Model Information
### Base Model
- **Base Model**: [Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)
- **Architecture**: Qwen2ForTokenClassification
- **Parameters**: 1.5B
### Training Details
- **Training Dataset**: SHARP-Math (Process Reward Model dataset)
- **Training Method**: Process Reward Model (PRM) as introduced in [Uesato et al., 2022](https://huggingface.co/papers/2211.14275)
- **Training Framework**: [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) v0.24.0
- **Task Type**: Token Classification (binary classification: error/correct for each reasoning step)
## PRM Evaluation
This model is designed to evaluate mathematical reasoning processes by:
1. **Step-level Evaluation**: Classifying each step in a reasoning chain as either "correct" or "error"
2. **Process Feedback**: Providing feedback on the reasoning process, not just the final answer
3. **Error Detection**: Identifying where mistakes occur in multi-step mathematical solutions
### Evaluation Metrics
The model is evaluated on the [ProcessBench](https://huggingface.co/datasets/Qwen/ProcessBench) benchmark.
Key metrics include:
- **Error Accuracy**: Ability to correctly identify incorrect steps
- **Correct Accuracy**: Ability to correctly identify correct steps
- **F1 Score**: Balanced measure of error and correct step classification
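ProcessBench reports F1 as the harmonic mean of accuracy on erroneous samples and accuracy on correct samples. A minimal sketch of that computation (the accuracy values below are illustrative only, not measured results for this model):

```python
def processbench_f1(error_acc: float, correct_acc: float) -> float:
    """Harmonic mean of error-detection accuracy and correct-step accuracy,
    the F1 metric reported by ProcessBench."""
    if error_acc + correct_acc == 0:
        return 0.0
    return 2 * error_acc * correct_acc / (error_acc + correct_acc)

# Illustrative numbers only
print(round(processbench_f1(0.70, 0.80), 4))  # 0.7467
```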
## Quick Start
### Installation
```bash
pip install transformers torch
```
### Basic Usage
#### Using the Model for Step Classification
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example: evaluate a mathematical reasoning chain
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",  # correct step
    "Divide by 2: x = 5",                  # incorrect step (should be x = 4)
]

# Format the input with a double-newline step separator
input_text = problem + "\n\n" + "\n\n".join(steps)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=8192)

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits                    # [batch_size, seq_len, num_labels]
    probabilities = F.softmax(logits, dim=-1)  # per-token class probabilities
    predictions = torch.argmax(logits, dim=-1) # per-token class indices

# Map token positions back to steps by tokenizing successive prefixes.
# This is a simplified heuristic; exact boundaries depend on the tokenizer.
labels = ["error", "correct"]
prefix = problem + "\n\n"
for i, step in enumerate(steps):
    step_start = len(tokenizer(prefix)["input_ids"])
    prefix += step + "\n\n"
    step_end = len(tokenizer(prefix.rstrip("\n"))["input_ids"])
    step_preds = predictions[0, step_start:step_end]
    if len(step_preds) == 0:
        continue
    step_label = labels[step_preds.mode().values.item()]  # majority vote over step tokens
    p_correct = probabilities[0, step_start:step_end, 1].mean().item()  # mean P(correct)
    print(f"\nStep {i + 1}: {step}")
    print(f"  Prediction: {step_label}")
    print(f"  P(correct): {p_correct:.2%}")
```
**Output Interpretation:**
- **Logits**: Raw scores from the model (before softmax). Higher values indicate stronger confidence.
- **Probabilities**: Softmax-normalized scores between 0 and 1. Sum to 1 for each token.
- **Predictions**: Class indices (0 = "error", 1 = "correct") for each token.
#### Using with Pipeline
```python
import torch
from transformers import pipeline

classifier = pipeline(
    "token-classification",
    model="path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    device=0 if torch.cuda.is_available() else -1,
)

# Classify reasoning steps (problem and steps as defined in the previous example)
result = classifier(problem + "\n\n" + "\n\n".join(steps))
```
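The pipeline returns one prediction per token. To turn that into per-step labels, the token predictions can be grouped by the step that contains each token's character span. A minimal sketch (the `"entity"` label strings depend on the model's `id2label` config, which may use names like `LABEL_0`/`LABEL_1`; the token dicts below are synthetic):

```python
from collections import Counter

def per_step_labels(text, token_results, separator="\n\n"):
    """Assign each token prediction to the separator-delimited segment
    containing its character span, then majority-vote a label per segment."""
    # Character ranges of each segment in the original text
    bounds, pos = [], 0
    for seg in text.split(separator):
        bounds.append((pos, pos + len(seg)))
        pos += len(seg) + len(separator)
    votes = [Counter() for _ in bounds]
    for tok in token_results:
        for i, (lo, hi) in enumerate(bounds):
            if lo <= tok["start"] < hi:
                votes[i][tok["entity"]] += 1
                break
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]
```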
### Integration with Mathematical Reasoning
This PRM model can be used to:
1. **Filter incorrect reasoning paths** in tree-of-thought or chain-of-thought generation
2. **Provide feedback** during step-by-step problem solving
3. **Evaluate solution quality** before final answer generation
4. **Improve training** by identifying problematic reasoning patterns
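For example, a common way to filter candidate solutions with a PRM is best-of-N selection under min aggregation: score every step of each candidate, then keep the candidate whose weakest step is strongest. A minimal sketch (the scores below are illustrative, not model outputs):

```python
def select_best_solution(candidates):
    """Pick the candidate whose weakest step has the highest P(correct),
    a common min-aggregation strategy for PRM-based best-of-N filtering."""
    return max(candidates, key=lambda c: min(c["step_scores"]))

candidates = [
    {"answer": "x = 5", "step_scores": [0.95, 0.10]},  # contains one bad step
    {"answer": "x = 4", "step_scores": [0.95, 0.90]},
]
best = select_best_solution(candidates)
print(best["answer"])  # x = 4
```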
## Training Procedure
### Training Configuration
- **Learning Rate**: 2e-5
- **Batch Size**: Per-device batch size (with gradient accumulation)
- **Epochs**: Multiple epochs with early stopping
- **Optimizer**: AdamW with cosine learning rate schedule
- **Warmup Ratio**: 3%
- **Gradient Clipping**: 5.0
- **Precision**: bfloat16
- **Gradient Checkpointing**: Enabled for memory efficiency
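The configuration above can be sketched with TRL's `PRMTrainer` and `PRMConfig`. This is a hedged outline, not the exact training script: the dataset path is a placeholder (SHARP-Math is not assumed to be public under this name), and batch size, epoch count, and early stopping are omitted since the card does not state exact values.

```python
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

# Placeholder dataset path -- substitute the actual SHARP-Math location
dataset = load_dataset("path/to/SHARP-Math", split="train")

model = AutoModelForTokenClassification.from_pretrained(
    "Qwen/Qwen2.5-Math-1.5B-Instruct", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-1.5B-Instruct")

# Hyperparameters taken from the Training Configuration list above
training_args = PRMConfig(
    output_dir="Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=5.0,
    bf16=True,
    gradient_checkpointing=True,
)

trainer = PRMTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```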
### Training Framework Versions
- **TRL**: 0.24.0
- **Transformers**: 4.56.2
- **PyTorch**: 2.9.1
- **Datasets**: 4.4.1
- **Tokenizers**: 0.22.1
### Training Data
The model was trained on the **SHARP-Math** dataset, which contains:
- Mathematical problems with step-by-step solutions
- Labeled reasoning steps (correct/error)
- Diverse mathematical domains and difficulty levels
## Use Cases
### 1. Mathematical Reasoning Evaluation
- Evaluate intermediate steps in mathematical problem-solving
- Identify errors in multi-step calculations
- Provide feedback on reasoning quality
### 2. Educational Applications
- Automated grading of mathematical solutions
- Step-by-step feedback for students
- Identification of common error patterns
### 3. Research Applications
- Training better mathematical reasoning models
- Analyzing reasoning patterns
- Improving chain-of-thought generation
## Limitations and Considerations
1. **Domain Specificity**: This model is specifically trained for mathematical reasoning and may not generalize well to other domains
2. **Step Length**: The model is optimized for step-level evaluation with a 256-token context per step
3. **Language**: The model is primarily trained on English mathematical content
4. **False Positives/Negatives**: Like all classification models, it may misclassify some steps
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{qwen2.5-math-1.5b-instruct-sharp-math-prm,
title={Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM: A Process Reward Model for Mathematical Reasoning},
author={Your Name/Organization},
year={2025},
howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM}}
}
```
**Model Card Version**: 1.0
**Last Updated**: 2025-12-30