metadata
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-Math-7B-PRM800K/blob/main/LICENSE
language:
- en
- zh
pipeline_tag: text-classification
library_name: transformers
tags:
- reward model
base_model:
- Qwen/Qwen2.5-Math-7B-PRM800K
PRM-Math-7B-Reasoner - Process Reward Model
PRMs: identifying and mitigating intermediate errors in reasoning processes
PRM-Math-7B-Reasoner is a fully reproducible process reward model (PRM), fine-tuned from Qwen2.5-Math-7B-PRM800K, that identifies erroneous steps in mathematical reasoning. For reward computation, a special token (`<extra_0>` in the base Qwen2.5-Math-7B-PRM800K model) is inserted after each reasoning step. The probability that this token is classified as positive is then extracted, giving a per-step reward between 0 and 1. The model is primarily used for solution reformatting in mathematical tasks and as a Long Context Full Reasoner.
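The reward computation described above can be sketched in plain Python. This is a minimal illustration, not the model's actual inference code: the separator string `<extra_0>`, the two-class `[negative, positive]` logit layout, and the helper names `format_steps` and `step_rewards` are assumptions, and the logits are made-up numbers standing in for real model output.

```python
import math

# Assumption: the step separator is the token used by the base model.
STEP_SEP = "<extra_0>"


def format_steps(steps):
    """Append the separator token after every reasoning step."""
    return "".join(step + STEP_SEP for step in steps)


def step_rewards(logits, sep_mask, positive_index=1):
    """Return the softmax probability of the positive class at each separator.

    logits:   one [negative, positive] score pair per token (assumed layout).
    sep_mask: one boolean per token, True where the token is the separator.
    """
    rewards = []
    for scores, is_sep in zip(logits, sep_mask):
        if not is_sep:
            continue
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        rewards.append(exps[positive_index] / sum(exps))
    return rewards


# Hypothetical logits for four tokens; tokens 1 and 3 are separators.
logits = [[0.1, 0.2], [-2.0, 2.0], [0.3, -0.1], [1.5, -1.5]]
sep_mask = [False, True, False, True]

text = format_steps(["Step one.", "Step two."])
rewards = step_rewards(logits, sep_mask)
print(text)
print(rewards)
```

In this toy input the first step receives a reward close to 1 and the second a reward close to 0, which is how a PRM would flag the second step as the likely error.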
ProcessBench Paper
ProcessBench: Identifying Process Errors in Mathematical Reasoning (arXiv): https://arxiv.org/pdf/2412.06559
Reformatting Intermediate Reasoning
| Section | Content |
|---|---|
| Title | Example of Solution Reformatting |
| Description | This example demonstrates the reformatting of a solution for finding the foci of an ellipse. The original solution is generated by Qwen2-7B-Instruct, and the reformatted version is presented for clarity. |
| Problem Statement | The ellipse $\frac{(x-6)^{2}}{25}+\frac{(y-3)^{2}}{9}=1$ has two foci. Find the one with the larger x-coordinate. Enter your answer as an ordered pair, like $(2,1)$. |
| Original Solution | The original solution is provided in a less readable format, with some syntax errors and unclear notation. |
| Reformatted Solution | The reformatted solution clearly explains the steps: 1. Identify the center of the ellipse, $(6,3)$. 2. Calculate the distance from the center to each focus: $c = \sqrt{a^2 - b^2} = \sqrt{25 - 9} = 4$. 3. Determine the foci locations: $(2,3)$ and $(10,3)$. 4. Identify the focus with the larger x-coordinate: $(10,3)$. |
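The four steps of the reformatted solution can be checked numerically. The sketch below is only an illustration of the arithmetic; the helper name `ellipse_foci` and its parameters are hypothetical and are not part of the model or of the original solution.

```python
import math


def ellipse_foci(h, k, a_sq, b_sq):
    """Foci of the ellipse (x-h)^2/a_sq + (y-k)^2/b_sq = 1.

    The foci lie on the major axis at distance c = sqrt(|a_sq - b_sq|)
    from the center (h, k).
    """
    c = math.sqrt(abs(a_sq - b_sq))
    if a_sq >= b_sq:  # major axis is horizontal
        return [(h - c, k), (h + c, k)]
    return [(h, k - c), (h, k + c)]  # major axis is vertical


# Ellipse from the problem statement: center (6, 3), a^2 = 25, b^2 = 9.
foci = ellipse_foci(6, 3, 25, 9)
answer = max(foci, key=lambda point: point[0])  # larger x-coordinate
print(foci)
print(answer)  # (10.0, 3)
```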