---
base_model: Qwen/Qwen2.5-Math-7B-Instruct
library_name: transformers
model_name: Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM
tags:
- generated_from_trainer
- prm
- trl
- math
- process-reward-model
- qwen2.5
- sharp
---

# Model Card for Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM
## Introduction

**Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM** is a Process Reward Model (PRM) fine-tuned from [Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct). It is designed to evaluate the correctness of intermediate reasoning steps in mathematical problem solving, enabling more reliable and interpretable mathematical reasoning.

The model was trained on the **PRM800K** dataset using the Process Reward Model methodology, which provides step-by-step feedback on mathematical reasoning chains. It is part of the SHARP-PRM series of process reward models.
## Model Information

### Base Model
- **Base Model**: [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)
- **Architecture**: Qwen2ForTokenClassification
- **Parameters**: 7B

### Training Details
- **Training Dataset**: PRM800K (a process-supervision dataset with 800K step-level labels)
- **Training Method**: Process Reward Model (PRM) training as introduced in [Uesato et al., 2022](https://huggingface.co/papers/2211.14275)
- **Training Framework**: [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) v0.24.0
- **Task Type**: Token classification (binary: "error" vs. "correct" for each reasoning step)
## PRM Evaluation

This model evaluates mathematical reasoning processes by:
1. **Step-level Evaluation**: Classifying each step in a reasoning chain as either "correct" or "error"
2. **Process Feedback**: Providing feedback on the reasoning process, not just the final answer
3. **Error Detection**: Identifying where mistakes occur in multi-step mathematical solutions

### Evaluation Metrics
The model is evaluated on the [ProcessBench](https://huggingface.co/datasets/Qwen/ProcessBench) benchmark.

Key metrics include:
- **Error Accuracy**: Ability to correctly identify incorrect steps
- **Correct Accuracy**: Ability to correctly identify correct steps
- **F1 Score**: Balanced measure of error and correct step classification
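ProcessBench combines the two accuracies into an F1 score via their harmonic mean, so a model cannot score well by over-flagging or under-flagging errors. A minimal sketch of that computation, assuming both accuracies are given as fractions in [0, 1]:

```python
def processbench_f1(error_accuracy: float, correct_accuracy: float) -> float:
    """Harmonic mean of accuracy on erroneous and on fully correct solutions."""
    if error_accuracy + correct_accuracy == 0:
        return 0.0
    return 2 * error_accuracy * correct_accuracy / (error_accuracy + correct_accuracy)

# A model that finds 80% of errors but accepts only 60% of correct solutions:
print(round(processbench_f1(0.8, 0.6), 4))  # 0.6857
```

The harmonic mean punishes imbalance: a degenerate model that labels every step as an error gets perfect error accuracy but zero correct accuracy, and therefore an F1 of zero.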
## Quick Start

### Installation

```bash
pip install transformers torch
```
### Basic Usage

#### Using the Model for Step Classification

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "path/to/Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example: evaluate a mathematical reasoning chain
# (one correct step, one incorrect step)
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",  # correct step
    "Divide by 2: x = 5",                  # incorrect step (should be x = 4)
]

# Format the input with a step separator
input_text = problem + "\n\n" + "\n\n".join(steps)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=8192)

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits                     # [batch_size, seq_len, num_labels]
    probabilities = F.softmax(logits, dim=-1)   # per-token class probabilities
    predictions = torch.argmax(logits, dim=-1)  # per-token class indices

# Aggregate predictions per step. This approximates token-to-step mapping by
# re-tokenizing each segment; for exact boundaries, prefer a fast tokenizer
# with `return_offsets_mapping=True`.
labels = ["error", "correct"]
offset = len(tokenizer(problem + "\n\n")["input_ids"])
for i, step in enumerate(steps):
    n_step_tokens = len(tokenizer(step)["input_ids"])
    step_preds = predictions[0, offset : offset + n_step_tokens]
    step_probs = probabilities[0, offset : offset + n_step_tokens, 1]
    offset += n_step_tokens + len(tokenizer("\n\n")["input_ids"])  # skip separator
    step_label = labels[step_preds.mode().values.item()] if len(step_preds) > 0 else "unknown"
    print(f"\nStep {i+1}: {step}")
    print(f"  Prediction: {step_label}")
    print(f"  Mean P(correct): {step_probs.mean().item():.2%}")

# Illustrative output (actual probabilities depend on the trained model):
# Step 1: Subtract 5 from both sides: 2x = 8
#   Prediction: correct
#
# Step 2: Divide by 2: x = 5
#   Prediction: error
```
**Output Interpretation:**

- **Logits**: Raw scores from the model (before softmax); higher values indicate stronger confidence.
- **Probabilities**: Softmax-normalized scores between 0 and 1 that sum to 1 across classes for each token.
- **Predictions**: Per-token class indices (0 = "error", 1 = "correct").
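Rather than averaging over all of a step's tokens as in the simplified example above, PRMs in the Qwen family are commonly read out at the token positions that mark step boundaries. A minimal sketch of locating those positions in a token sequence (`find_step_positions` and the integer separator id are illustrative assumptions, not part of this model's API):

```python
from typing import List

def find_step_positions(input_ids: List[int], sep_id: int) -> List[int]:
    """Return the index of each separator token; the model's prediction at
    these positions can be read as the score for the step preceding them."""
    return [i for i, tok in enumerate(input_ids) if tok == sep_id]

# Toy example with integer token ids and separator id 99:
print(find_step_positions([5, 7, 99, 3, 4, 99], sep_id=99))  # [2, 5]
```

In practice you would obtain `sep_id` via `tokenizer.convert_tokens_to_ids(...)` for whatever separator your formatting uses.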
#### Using with Pipeline

```python
from transformers import pipeline
import torch

classifier = pipeline(
    "token-classification",
    model="path/to/Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM",
    tokenizer="path/to/Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM",
    device=0 if torch.cuda.is_available() else -1,
)

# Classify reasoning steps (`problem` and `steps` as defined above)
result = classifier(problem + "\n\n" + "\n\n".join(steps))
```
### Integration with Mathematical Reasoning

This PRM can be used to:
1. **Filter incorrect reasoning paths** in tree-of-thought or chain-of-thought generation
2. **Provide feedback** during step-by-step problem solving
3. **Evaluate solution quality** before final answer generation
4. **Improve training** by identifying problematic reasoning patterns
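As one illustration of path filtering, sampled candidate solutions can be reranked by the score of their weakest step. `rerank_by_min_step_score` is a hypothetical helper, assuming per-step P(correct) values have already been obtained from the PRM:

```python
from typing import Dict, List

def rerank_by_min_step_score(candidates: Dict[str, List[float]]) -> List[str]:
    """Order candidate solutions by their weakest step score, best first:
    a single likely-wrong step sinks the whole chain."""
    return sorted(candidates, key=lambda c: min(candidates[c]), reverse=True)

# Per-step P(correct) for three hypothetical solution chains
scores = {
    "solution_a": [0.9, 0.2, 0.95],  # one likely-wrong step
    "solution_b": [0.7, 0.8, 0.75],  # uniformly plausible
    "solution_c": [0.6, 0.5, 0.9],
}
print(rerank_by_min_step_score(scores))  # ['solution_b', 'solution_c', 'solution_a']
```

Taking the minimum over steps is one common aggregation choice; products or means of step scores are reasonable alternatives depending on how tolerant the application is of a single weak step.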
## Training Procedure

### Training Configuration

- **Learning Rate**: 2e-5
- **Batch Size**: Per-device batching with gradient accumulation
- **Epochs**: Multiple epochs with early stopping
- **Optimizer**: AdamW with cosine learning rate schedule
- **Warmup Ratio**: 3%
- **Gradient Clipping**: 5.0
- **Precision**: bfloat16
- **Gradient Checkpointing**: Enabled for memory efficiency
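The settings above map naturally onto TRL's PRM training configuration. A hedged sketch follows; the field names come from TRL's `PRMConfig` (a `TrainingArguments` subclass), but the batch sizes are placeholders, not the values used for this model:

```python
from trl import PRMConfig

# Illustrative configuration mirroring the reported hyperparameters.
config = PRMConfig(
    output_dir="qwen2.5-math-prm800k-sharp-prm",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=5.0,
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=1,   # placeholder, not reported
    gradient_accumulation_steps=8,   # placeholder, not reported
)
```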
### Training Framework Versions

- **TRL**: 0.24.0
- **Transformers**: 4.56.2
- **PyTorch**: 2.9.1
- **Datasets**: 4.4.1
- **Tokenizers**: 0.22.1
### Training Data

The model was trained on the **PRM800K** dataset, which contains:
- Mathematical problems with step-by-step solutions
- Step-level correctness labels (correct/error)
- Diverse mathematical domains and difficulty levels
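For PRM training, TRL expects stepwise-supervision examples: a prompt, the solution split into completion steps, and one boolean label per step. A toy example in that shape (the problem text is illustrative, not drawn from PRM800K):

```python
# One stepwise-supervision example in the shape TRL's PRM training consumes:
example = {
    "prompt": "Solve: 2x + 5 = 13",
    "completions": [
        "Subtract 5 from both sides: 2x = 8",
        "Divide by 2: x = 5",
    ],
    "labels": [True, False],  # second step is wrong (x should be 4)
}

# Every step must carry exactly one label.
assert len(example["completions"]) == len(example["labels"])
print(len(example["completions"]))  # 2
```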
## Use Cases

### 1. Mathematical Reasoning Evaluation
- Evaluate intermediate steps in mathematical problem-solving
- Identify errors in multi-step calculations
- Provide feedback on reasoning quality

### 2. Educational Applications
- Automated grading of mathematical solutions
- Step-by-step feedback for students
- Identification of common error patterns

### 3. Research Applications
- Training better mathematical reasoning models
- Analyzing reasoning patterns
- Improving chain-of-thought generation
## Limitations and Considerations

1. **Domain Specificity**: The model is trained specifically for mathematical reasoning and may not generalize well to other domains
2. **Step Length**: The model is optimized for step-level evaluation with up to 256 tokens of context per step
3. **Language**: The model is trained primarily on English mathematical content
4. **False Positives/Negatives**: Like all classifiers, it may misclassify some steps
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{qwen2.5-math-7b-instruct-prm800k-sharp-prm,
  title={Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM: A Process Reward Model for Mathematical Reasoning},
  author={Your Name/Organization},
  year={2025},
  howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-7B-Instruct-PRM800K-SHARP-PRM}}
}
```

**Model Card Version**: 1.0
**Last Updated**: 2025-12-30