prithivMLmods/PRM-Math-7B-Reasoner

Text Classification

text-generation

text-embeddings-inference


prithivMLmods committed on Jan 19, 2025

Commit 4534e68 · verified · 1 Parent(s): a059d3f

Update README.md

Files changed (1)

README.md +3 -1


@@ -11,4 +11,6 @@ tags:
 - reward model
 base_model:
 - Qwen/Qwen2.5-Math-7B-PRM800K
----
+---
+PRM-Math-7B-Reasoner is a fully reproducible reward model, fine-tuned from Qwen2.5-Math-7B-PRM800K and designed to identify erroneous steps in mathematical reasoning. For reward computation, the special token "<extra_0>" is inserted after each reasoning step. The probability that this token is classified as positive is then extracted, giving a reward value between 0 and 1 for that step. The model is primarily used for solution reformatting in mathematically driven tasks and as a Long Context Full Reasoner.
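
The reward computation described in the README can be sketched in plain Python. This is only an illustration under stated assumptions, not the model's actual inference code: it assumes the model emits two class logits per token with index 1 meaning "positive", and the helper names (`format_steps`, `positive_probability`, `step_rewards`), the token id `99`, and the logit values are all hypothetical stand-ins.

```python
import math

STEP_TOKEN = "<extra_0>"  # separator token named in the README


def format_steps(steps):
    """Append the separator token after every reasoning step."""
    return "".join(step + STEP_TOKEN for step in steps)


def positive_probability(logits, positive_index=1):
    """Softmax over one token's class logits; return the positive-class share.

    Assumption: index 1 is the "positive" class.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[positive_index] / sum(exps)


def step_rewards(token_ids, token_logits, step_token_id):
    """Return one reward in [0, 1] for each separator position."""
    return [
        positive_probability(logits)
        for tok, logits in zip(token_ids, token_logits)
        if tok == step_token_id
    ]


# Toy data: token id 99 stands in for <extra_0>; the logits are made up.
token_ids = [5, 7, 99, 8, 99]
token_logits = [[0.0, 0.0], [0.0, 0.0], [-2.0, 2.0], [0.0, 0.0], [1.0, -1.0]]
rewards = step_rewards(token_ids, token_logits, step_token_id=99)
```

With a real checkpoint, `token_ids` and `token_logits` would come from the tokenizer and the model's forward pass. Only the positions holding the separator token contribute a reward, so a solution with N steps yields N scores, and a low score flags the step that likely contains the error.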