|
|
--- |
|
|
base_model: Qwen/Qwen2.5-Math-1.5B-Instruct |
|
|
library_name: transformers |
|
|
model_name: Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM |
|
|
tags: |
|
|
- generated_from_trainer |
|
|
- prm |
|
|
- trl |
|
|
- math |
|
|
- process-reward-model |
|
|
- qwen2.5 |
|
|
- sharp |
|
|
--- |
|
|
|
|
|
# Model Card for Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM |
|
|
|
|
|
## Introduction |
|
|
|
|
|
**Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM** is a Process Reward Model (PRM) fine-tuned from [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct). This model is specifically designed to evaluate the correctness of intermediate reasoning steps in mathematical problem-solving processes, enabling more reliable and interpretable mathematical reasoning. |
|
|
|
|
|
The model has been trained on the **SHARP-Math** dataset using the Process Reward Model methodology, which provides step-by-step feedback on mathematical reasoning chains. |
|
|
|
|
|
This model is part of the SHARP-PRM series of Process Reward Models.
|
|
|
|
|
## Model Information |
|
|
|
|
|
### Base Model |
|
|
- **Base Model**: [Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct) |
|
|
- **Architecture**: Qwen2ForTokenClassification |
|
|
- **Parameters**: 1.5B |
|
|
|
|
|
### Training Details |
|
|
- **Training Dataset**: SHARP-Math (Process Reward Model dataset) |
|
|
- **Training Method**: Process Reward Model (PRM) as introduced in [Uesato et al., 2022](https://huggingface.co/papers/2211.14275) |
|
|
- **Training Framework**: [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) v0.24.0 |
|
|
- **Task Type**: Token Classification (binary classification: error/correct for each reasoning step) |
|
|
|
|
|
## PRM Evaluation |
|
|
|
|
|
This model is designed to evaluate mathematical reasoning processes by: |
|
|
1. **Step-level Evaluation**: Classifying each step in a reasoning chain as either "correct" or "error" |
|
|
2. **Process Feedback**: Providing feedback on the reasoning process, not just the final answer |
|
|
3. **Error Detection**: Identifying where mistakes occur in multi-step mathematical solutions |
|
|
|
|
|
### Evaluation Metrics |
|
|
The model is evaluated on the [ProcessBench](https://huggingface.co/datasets/Qwen/ProcessBench) benchmark. |
|
|
|
|
|
Key metrics include: |
|
|
- **Error Accuracy**: Ability to correctly identify incorrect steps |
|
|
- **Correct Accuracy**: Ability to correctly identify correct steps |
|
|
- **F1 Score**: Balanced measure of error and correct step classification |
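Following ProcessBench's convention, the F1 score here is the harmonic mean of the two subset accuracies. A minimal sketch (the function name is our own, not part of any benchmark API):

```python
def processbench_f1(error_acc: float, correct_acc: float) -> float:
    """Harmonic mean of accuracy on erroneous and on fully-correct samples.

    Mirrors how ProcessBench combines the two subset accuracies into a
    single F1-style score; the function name is illustrative.
    """
    if error_acc + correct_acc == 0:
        return 0.0
    return 2 * error_acc * correct_acc / (error_acc + correct_acc)

print(f"{processbench_f1(0.80, 0.60):.4f}")  # -> 0.6857
```

Note that the harmonic mean penalizes imbalance: a model that flags everything as an error scores near zero, even though its error accuracy is perfect.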
|
|
|
|
|
## Quick Start |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
#### Using the Model for Step Classification |
|
|
|
|
|
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example: evaluate a mathematical reasoning chain with one correct
# and one incorrect step
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",  # correct step
    "Divide by 2: x = 5",                  # incorrect step (should be x = 4)
]

# Format the input with the "\n\n" step separator
input_text = problem + "\n\n" + "\n\n".join(steps)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=8192)

# Get per-token predictions
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits                     # [batch_size, sequence_length, num_labels]
probabilities = F.softmax(logits, dim=-1)   # per-token class probabilities
predictions = torch.argmax(logits, dim=-1)  # per-token class indices

# Aggregate per-token predictions into one label per step by tracking a
# cumulative token offset. This is a simplified heuristic; exact step
# boundaries depend on how the tokenizer splits the separator.
labels = ["error", "correct"]
sep_len = len(tokenizer("\n\n", add_special_tokens=False)["input_ids"])
offset = len(tokenizer(problem, add_special_tokens=False)["input_ids"]) + sep_len
for i, step in enumerate(steps):
    n_tokens = len(tokenizer(step, add_special_tokens=False)["input_ids"])
    step_preds = predictions[0, offset:offset + n_tokens]
    # Majority vote over the step's token labels
    step_label = labels[step_preds.mode().values.item()] if n_tokens > 0 else "unknown"
    p_correct = probabilities[0, offset:offset + n_tokens, 1].mean().item()
    print(f"\nStep {i + 1}: {step}")
    print(f"  Prediction: {step_label}")
    print(f"  P(correct): {p_correct:.2%}")
    offset += n_tokens + sep_len  # advance past this step and the separator

# Illustrative output (actual values depend on the trained checkpoint):
#
# Step 1: Subtract 5 from both sides: 2x = 8
#   Prediction: correct
#   P(correct): 95.00%
#
# Step 2: Divide by 2: x = 5
#   Prediction: error
#   P(correct): 13.00%
```
|
|
|
|
|
**Output Interpretation:** |
|
|
|
|
|
- **Logits**: Raw scores from the model (before softmax). Higher values indicate stronger confidence. |
|
|
- **Probabilities**: Softmax-normalized scores between 0 and 1. Sum to 1 for each token. |
|
|
- **Predictions**: Class indices (0 = "error", 1 = "correct") for each token. |
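The logit-to-probability-to-prediction chain above is just a softmax followed by an argmax. A self-contained example with hypothetical logit values for one token:

```python
import math

# Hypothetical per-token logits for the two classes ["error", "correct"]
logits = [0.5, 2.5]
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]  # softmax: exponentiate, then normalize
pred = probs.index(max(probs))         # argmax -> predicted class index

print(pred)                # 1, i.e. "correct"
print(round(probs[1], 4))  # 0.8808
```

Because softmax is monotonic, the argmax over probabilities always matches the argmax over raw logits; the probabilities are only needed when you want a calibrated-looking confidence score.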
|
|
|
|
|
#### Using with Pipeline |
|
|
|
|
|
```python
import torch
from transformers import pipeline

classifier = pipeline(
    "token-classification",
    model="path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    tokenizer="path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    device=0 if torch.cuda.is_available() else -1,
)

# Classify reasoning steps; the pipeline returns one entry per labeled token
problem = "Solve: 2x + 5 = 13"
steps = ["Subtract 5 from both sides: 2x = 8", "Divide by 2: x = 5"]
result = classifier(problem + "\n\n" + "\n\n".join(steps))
```
|
|
|
|
|
### Integration with Mathematical Reasoning |
|
|
|
|
|
This PRM model can be used to: |
|
|
1. **Filter incorrect reasoning paths** in tree-of-thought or chain-of-thought generation |
|
|
2. **Provide feedback** during step-by-step problem solving |
|
|
3. **Evaluate solution quality** before final answer generation |
|
|
4. **Improve training** by identifying problematic reasoning patterns |
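For use case 1, a common aggregation is to score each candidate solution by its weakest step and keep the highest-scoring candidates. A sketch, where `score_steps` stands in for a hypothetical helper that returns P(correct) per step (e.g. built on the token-classification code above):

```python
# Sketch: best-of-N reranking with a PRM, using min-over-steps aggregation.
def rank_by_weakest_step(candidates, score_steps):
    # A solution is only as strong as its weakest step
    scored = [(min(score_steps(cand)), cand) for cand in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]

# Usage with a stub scorer in place of real PRM calls:
fake_scores = {"solution A": [0.9, 0.2], "solution B": [0.7, 0.8]}
ranked = rank_by_weakest_step(fake_scores, lambda c: fake_scores[c])
print(ranked)  # ['solution B', 'solution A']
```

Min aggregation is one choice among several; products or means over step probabilities are also used, and which works best is an empirical question.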
|
|
|
|
|
## Training Procedure |
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
- **Learning Rate**: 2e-5 |
|
|
- **Batch Size**: small per-device batches combined with gradient accumulation
|
|
- **Epochs**: Multiple epochs with early stopping |
|
|
- **Optimizer**: AdamW with cosine learning rate schedule |
|
|
- **Warmup Ratio**: 3% |
|
|
- **Gradient Clipping**: 5.0 |
|
|
- **Precision**: bfloat16 |
|
|
- **Gradient Checkpointing**: Enabled for memory efficiency |
|
|
|
|
|
### Training Framework Versions |
|
|
|
|
|
- **TRL**: 0.24.0 |
|
|
- **Transformers**: 4.56.2 |
|
|
- **PyTorch**: 2.9.1 |
|
|
- **Datasets**: 4.4.1 |
|
|
- **Tokenizers**: 0.22.1 |
|
|
|
|
|
### Training Data |
|
|
|
|
|
The model was trained on the **SHARP-Math** dataset, which contains: |
|
|
- Mathematical problems with step-by-step solutions |
|
|
- Labeled reasoning steps (correct/error) |
|
|
- Diverse mathematical domains and difficulty levels |
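TRL's `PRMTrainer` consumes stepwise-supervision records with one boolean label per completion step; a SHARP-Math example presumably resembles the following (field contents are illustrative, and the actual dataset schema may differ):

```python
# Illustrative record in TRL's stepwise-supervision format:
# a prompt, a list of solution steps, and one correctness label per step.
record = {
    "prompt": "Solve: 2x + 5 = 13",
    "completions": [
        "Subtract 5 from both sides: 2x = 8",
        "Divide by 2: x = 5",
    ],
    "labels": [True, False],  # step 2 is wrong: 8 / 2 = 4, not 5
}

assert len(record["completions"]) == len(record["labels"])
print(sum(record["labels"]))  # 1 correct step
```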
|
|
|
|
|
## Use Cases |
|
|
|
|
|
### 1. Mathematical Reasoning Evaluation |
|
|
- Evaluate intermediate steps in mathematical problem-solving |
|
|
- Identify errors in multi-step calculations |
|
|
- Provide feedback on reasoning quality |
|
|
|
|
|
### 2. Educational Applications |
|
|
- Automated grading of mathematical solutions |
|
|
- Step-by-step feedback for students |
|
|
- Identification of common error patterns |
|
|
|
|
|
### 3. Research Applications |
|
|
- Training better mathematical reasoning models |
|
|
- Analyzing reasoning patterns |
|
|
- Improving chain-of-thought generation |
|
|
|
|
|
## Limitations and Considerations |
|
|
|
|
|
1. **Domain Specificity**: This model is specifically trained for mathematical reasoning and may not generalize well to other domains |
|
|
2. **Step Length**: The model is optimized for step-level evaluation with a 256-token context per step |
|
|
3. **Language**: The model is primarily trained on English mathematical content |
|
|
4. **False Positives/Negatives**: Like all classification models, it may misclassify some steps |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{qwen2.5-math-1.5b-instruct-sharp-math-prm, |
|
|
title={Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM: A Process Reward Model for Mathematical Reasoning}, |
|
|
author={Your Name/Organization}, |
|
|
year={2025}, |
|
|
howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM}} |
|
|
} |
|
|
``` |
|
|
|
|
|
**Model Card Version**: 1.0 |
|
|
**Last Updated**: 2025-12-30 |
|
|
|
|
|
|