Windy0822/PQM

Windy0822 committed on Oct 11, 2024

Commit ebe7812

1 Parent(s): 0049462

Update README.md

Files changed (1)

README.md

@@ -1,16 +1,16 @@
----
-license: mit
-datasets:
-- peiyi9979/Math-Shepherd
-language:
-- en
-base_model:
-- deepseek-ai/deepseek-math-7b-base
-pipeline_tag: reinforcement-learning
----
 ## Introduction
 <div align="center">
-<img src="figures/PQM.png" width="822px">
 </div>
 We present a new framework for PRM by framing it as a $Q$-value ranking problem, providing a theoretical basis for reward modeling that captures inter-dependencies among reasoning states.

+---
+license: mit
+datasets:
+- peiyi9979/Math-Shepherd
+language:
+- en
+base_model:
+- deepseek-ai/deepseek-math-7b-base
+pipeline_tag: reinforcement-learning
+---
 ## Introduction
 <div align="center">
+<img src="PQM.png" width="822px">
 </div>
 We present a new framework for PRM by framing it as a $Q$-value ranking problem, providing a theoretical basis for reward modeling that captures inter-dependencies among reasoning states.