---
base_model: Qwen/Qwen2.5-Math-1.5B-Instruct
library_name: transformers
model_name: Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM
tags:
- generated_from_trainer
- prm
- trl
- math
- process-reward-model
- qwen2.5
- sharp
---

# Model Card for Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM

## Introduction

**Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM** is a Process Reward Model (PRM) fine-tuned from [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct). It is designed to evaluate the correctness of intermediate reasoning steps in mathematical problem solving, enabling more reliable and interpretable mathematical reasoning.

The model was trained on the **SHARP-Math** dataset using the Process Reward Model methodology, which provides step-by-step feedback on mathematical reasoning chains. It is part of the SHARP-PRM series of models.

## Model Information

### Base Model
- **Base Model**: [Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)
- **Architecture**: Qwen2ForTokenClassification
- **Parameters**: 1.5B

### Training Details
- **Training Dataset**: SHARP-Math (Process Reward Model dataset)
- **Training Method**: Process Reward Model (PRM) training, as introduced in [Uesato et al., 2022](https://huggingface.co/papers/2211.14275)
- **Training Framework**: [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) v0.24.0
- **Task Type**: Token classification (binary: "error" vs. "correct" for each reasoning step)

## PRM Evaluation

This model is designed to evaluate mathematical reasoning processes by:
1. **Step-level Evaluation**: Classifying each step in a reasoning chain as either "correct" or "error"
2. **Process Feedback**: Providing feedback on the reasoning process, not just the final answer
3. **Error Detection**: Identifying where mistakes occur in multi-step mathematical solutions

### Evaluation Metrics
The model is evaluated on the [ProcessBench](https://huggingface.co/datasets/Qwen/ProcessBench) benchmark.

Key metrics include:
- **Error Accuracy**: Ability to correctly identify incorrect steps
- **Correct Accuracy**: Ability to correctly identify correct steps
- **F1 Score**: Balanced measure combining error and correct step classification

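The sketch below shows one way to combine the two accuracies, assuming the common ProcessBench convention of reporting F1 as their harmonic mean; consult the benchmark's own evaluation script for the authoritative computation.

```python
# Hedged sketch: ProcessBench-style F1, assuming the common convention of
# taking the harmonic mean of error accuracy and correct accuracy.
def prm_f1(error_accuracy: float, correct_accuracy: float) -> float:
    """Harmonic mean of the two per-subset accuracies."""
    if error_accuracy + correct_accuracy == 0:
        return 0.0
    return 2 * error_accuracy * correct_accuracy / (error_accuracy + correct_accuracy)

# Example: 80% accuracy on erroneous solutions, 90% on fully correct ones
print(f"{prm_f1(0.80, 0.90):.4f}")  # 0.8471
```
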
## Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage

#### Using the Model for Step Classification

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example: evaluate a mathematical reasoning chain
# containing one correct and one incorrect step
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",  # correct step
    "Divide by 2: x = 5",                  # incorrect step (should be x = 4)
]

# Format the input with a blank-line step separator
input_text = problem + "\n\n" + "\n\n".join(steps)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=8192)

# Get per-token predictions
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits                     # [batch_size, sequence_length, num_labels]
probabilities = F.softmax(logits, dim=-1)   # per-token class probabilities
predictions = torch.argmax(logits, dim=-1)  # per-token class indices

# Aggregate token-level predictions into one label per step. The mapping
# below re-tokenizes growing prefixes and is therefore approximate; in real
# usage, locate step boundaries from the separator token ids directly.
labels = ["error", "correct"]
prefix = problem + "\n\n"
for i, step in enumerate(steps):
    start = len(tokenizer(prefix)["input_ids"])
    end = len(tokenizer(prefix + step)["input_ids"])
    prefix += step + "\n\n"
    step_preds = predictions[0, start:end]
    if len(step_preds) == 0:
        step_label, confidence = "unknown", float("nan")
    else:
        step_label = labels[step_preds.mode().values.item()]       # majority vote
        confidence = probabilities[0, start:end, 1].mean().item()  # mean P("correct")
    print(f"\nStep {i+1}: {step}")
    print(f"  Prediction: {step_label}")
    print(f"  Mean P(correct): {confidence:.2%}")

# Illustrative output (actual scores will vary):
# Step 1: Subtract 5 from both sides: 2x = 8
#   Prediction: correct
#   Mean P(correct): 95.00%
#
# Step 2: Divide by 2: x = 5
#   Prediction: error
#   Mean P(correct): 13.00%
```

**Output Interpretation:**

- **Logits**: Raw per-token scores from the model (before softmax); a higher logit for a class indicates stronger confidence in that class.
- **Probabilities**: Softmax-normalized scores between 0 and 1 that sum to 1 over the classes of each token.
- **Predictions**: Per-token class indices (0 = "error", 1 = "correct").

#### Using with Pipeline

```python
import torch
from transformers import pipeline

classifier = pipeline(
    "token-classification",
    model="path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    tokenizer="path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    device=0 if torch.cuda.is_available() else -1,
)

# Classify reasoning steps, reusing `problem` and `steps` from the example
# above. The pipeline returns one classification per token; aggregate them
# per step (as shown above) to obtain step-level labels.
result = classifier(problem + "\n\n" + "\n\n".join(steps))
```

### Integration with Mathematical Reasoning

This PRM model can be used to:
1. **Filter incorrect reasoning paths** in tree-of-thought or chain-of-thought generation (see the sketch after this list)
2. **Provide feedback** during step-by-step problem solving
3. **Evaluate solution quality** before final answer generation
4. **Improve training** by identifying problematic reasoning patterns

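As one concrete pattern, the sketch below ranks several candidate solutions by their weakest step score and keeps the best one. It is a minimal illustration, not the official SHARP-PRM recipe; `score_steps` is a hypothetical helper that returns one P("correct") per step (for example, the per-step aggregation from the Quick Start example).

```python
# Minimal best-of-N filtering sketch. `score_steps` is a hypothetical helper:
# given a problem and a list of steps, it returns one P("correct") per step.
from typing import Callable, List

def pick_best_solution(
    problem: str,
    candidates: List[List[str]],
    score_steps: Callable[[str, List[str]], List[float]],
) -> List[str]:
    """Rank candidate solutions by their weakest step and return the best one."""
    def solution_score(steps: List[str]) -> float:
        # A solution is only as strong as its weakest step, so score by the
        # minimum; the product of step probabilities is another common choice.
        return min(score_steps(problem, steps), default=0.0)

    return max(candidates, key=solution_score)
```
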
## Training Procedure

### Training Configuration

- **Learning Rate**: 2e-5
- **Batch Size**: Per-device batches with gradient accumulation (exact sizes not recorded here)
- **Epochs**: Multiple epochs with early stopping
- **Optimizer**: AdamW with a cosine learning-rate schedule
- **Warmup Ratio**: 3%
- **Gradient Clipping**: Max norm 5.0
- **Precision**: bfloat16
- **Gradient Checkpointing**: Enabled for memory efficiency

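For reference, the configuration above maps roughly onto TRL's `PRMTrainer` API as sketched below. This is a minimal, unofficial sketch, not the exact training script: the dataset path is a placeholder, batch-size and early-stopping settings are omitted, and `PRMTrainer` is an experimental TRL API whose import path may vary across versions.

```python
# Hedged sketch of PRM training with TRL; not the exact SHARP-PRM script.
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer  # experimental API; path may vary by version

base = "Qwen/Qwen2.5-Math-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForTokenClassification.from_pretrained(base, num_labels=2)

train_dataset = load_dataset("path/to/SHARP-Math", split="train")  # placeholder path

training_args = PRMConfig(
    output_dir="Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM",
    learning_rate=2e-5,          # values taken from the configuration above
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=5.0,
    bf16=True,
    gradient_checkpointing=True,
)

trainer = PRMTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```
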
### Training Framework Versions

- **TRL**: 0.24.0
- **Transformers**: 4.56.2
- **PyTorch**: 2.9.1
- **Datasets**: 4.4.1
- **Tokenizers**: 0.22.1

### Training Data

The model was trained on the **SHARP-Math** dataset, which contains:
- Mathematical problems with step-by-step solutions
- Labeled reasoning steps (correct/error)
- Diverse mathematical domains and difficulty levels

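For illustration, here is what a single training example might look like, assuming TRL's stepwise-supervision format (`prompt`, `completions`, `labels`); the actual SHARP-Math schema may differ.

```python
# Hypothetical SHARP-Math row in TRL's stepwise-supervision format;
# the actual dataset schema may differ.
example = {
    "prompt": "Solve: 2x + 5 = 13",
    "completions": [
        "Subtract 5 from both sides: 2x = 8",
        "Divide by 2: x = 5",
    ],
    "labels": [True, False],  # one correctness label per step
}
```
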
## Use Cases

### 1. Mathematical Reasoning Evaluation
- Evaluate intermediate steps in mathematical problem-solving
- Identify errors in multi-step calculations
- Provide feedback on reasoning quality

### 2. Educational Applications
- Automated grading of mathematical solutions
- Step-by-step feedback for students
- Identification of common error patterns

### 3. Research Applications
- Training better mathematical reasoning models
- Analyzing reasoning patterns
- Improving chain-of-thought generation

## Limitations and Considerations

1. **Domain Specificity**: The model is trained specifically for mathematical reasoning and may not generalize well to other domains
2. **Step Length**: The model is optimized for step-level evaluation with a 256-token context per step
3. **Language**: The model is trained primarily on English mathematical content
4. **False Positives/Negatives**: Like any classifier, it may misclassify some steps

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{qwen2.5-math-1.5b-instruct-sharp-math-prm,
  title={Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM: A Process Reward Model for Mathematical Reasoning},
  author={Your Name/Organization},
  year={2025},
  howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-1.5B-Instruct-SHARP-Math-PRM}}
}
```

**Model Card Version**: 1.0  
**Last Updated**: 2025-12-30