---
license: apache-2.0
language:
- en
---

## Model Description

`ReasonEval-7B` is a 7.1B parameter decoder-only language model tuned from [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1). `ReasonEval-7B` assesses the problem-solving process in a step-by-step format from the following perspectives:

- **Validity**: The step contains no mistakes in calculation or logic.
- **Redundancy**: The step is valid but lacks utility in solving the problem.

With ReasonEval, you can

- 📏 quantify the quality of reasoning steps without relying on human annotators or closed-source models.
- 🤖 identify potentially invalid or redundant steps in a solution, even when its final answer is correct.
- 🛠️ select high-quality training data for downstream tasks (e.g., fine-tuning).

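As a sketch of how such step-level judgments could be read off a classifier's outputs (a hypothetical illustration, not the model's actual implementation), suppose each step receives logits over three classes — positive, neutral, negative — and that validity and redundancy are defined from the softmax probabilities as `P(positive) + P(neutral)` and `P(neutral)` respectively; both the class layout and these score definitions are assumptions here:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def step_scores(logits):
    """Map 3-class logits [positive, neutral, negative] to
    (validity, redundancy). The class order and the score
    definitions are assumptions for illustration only:
    validity = P(positive) + P(neutral); redundancy = P(neutral)."""
    p_pos, p_neutral, p_neg = softmax(logits)
    return p_pos + p_neutral, p_neutral

validity, redundancy = step_scores([2.0, 0.5, -1.0])
print(round(validity, 3), round(redundancy, 3))
```

Under these definitions a step can be highly valid yet also flagged as redundant, which is exactly the distinction the two perspectives above are meant to capture.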
## Model Details

* **Model type**: `ReasonEval-7B` is an auto-regressive language model based on the transformer decoder architecture. Its architecture is identical to that of the base model, except that the classification head for next-token prediction is replaced with a classification head that outputs the probabilities of each class of reasoning step.
* **Language(s)**: English
* **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://drive.google.com/file/d/1Lw1uGFzTUWxo3mB91sfdusSrxnCCO9mR/view?usp=sharing)
* **GitHub**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
* **Finetuned from model**: [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)

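The head swap described above can be sketched in plain Python (a toy illustration with made-up dimensions, not the actual model code): instead of projecting the final hidden state to a vocabulary-sized logit vector, the model projects it to one logit per step-quality class.

```python
# Toy illustration: a "head" is just a weight matrix applied to the
# final hidden state. All dimensions are made up for readability.
HIDDEN = 4        # real model: thousands of hidden dimensions
VOCAB = 10        # real model: tens of thousands of tokens
NUM_CLASSES = 3   # e.g. positive / neutral / negative steps

def linear(hidden_state, weights):
    """Project a hidden-state vector through a weight matrix (one row per output)."""
    return [sum(h * w for h, w in zip(hidden_state, row)) for row in weights]

hidden_state = [0.1, -0.2, 0.3, 0.05]
lm_head = [[0.01 * (i + j) for j in range(HIDDEN)] for i in range(VOCAB)]
cls_head = [[0.1 * (i - j) for j in range(HIDDEN)] for i in range(NUM_CLASSES)]

# Next-token prediction head: one logit per vocabulary item.
print(len(linear(hidden_state, lm_head)))   # 10
# ReasonEval-style head: one logit per step-quality class.
print(len(linear(hidden_state, cls_head)))  # 3
```

Because only the output head changes, the rest of the base model's weights and architecture carry over unchanged.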
## Quick Start

### Setup

* Clone the repository
```bash
git clone https://github.com/GAIR-NLP/ReasonEval
cd ReasonEval
```
* Create a conda environment and activate the environment
```bash
conda create -n ReasonEval python=3.10
conda activate ReasonEval
```
* Install the required libraries
```bash
pip install -r requirements.txt
```
### Usage

Provide the question and the solution in a step-by-step format.

```python
# examples
question = "Let $x,$ $y,$ and $z$ be positive real numbers such that $xyz(x + y + z) = 1.$ Find the minimum value of\n\\[(x + y)(y + z).\\]"
reasoning_steps = ["1. The problem asks us to find the minimum value of $(x + y)(y + z)$ given that $x,$ $y,$ and $z$ are positive real numbers and $xyz(x + y + z) = 1$.",
                   "2. By the AM-GM inequality, we have $x + y + z \\geq 3\\sqrt[3]{xyz}$.",
                   "3. By the given condition $xyz(x + y + z) = 1$, we can substitute $x + y + z$ with $\\sqrt[3]{xyz}$ in the inequality from step 2 to get $3\\sqrt[3]{xyz} \\geq 3$.",
                   "4. Simplifying the inequality from step 3 gives $\\sqrt[3]{xyz} \\geq 1$.",
                   "5. By raising both sides of the inequality from step 4 to the power of 3, we have $xyz \\geq 1$.",
                   "6. By the AM-GM inequality, we have $(x + y)(y + z) \\geq 2\\sqrt{(x + y)(y + z)}$.",
                   "7. By the given condition $xyz(x + y + z) = 1$, we can substitute $(x + y)(y + z)$ with $\\frac{1}{xyz}$ in the inequality from step 6 to get $2\\sqrt{(x + y)(y + z)} \\geq 2\\sqrt{\\frac{1}{xyz}}$.",
                   "8. Simplifying the inequality from step 7 gives $(x + y)(y + z) \\geq \\frac{2}{\\sqrt{xyz}}$.",
                   "9. By the condition $xyz \\geq 1$ from step 5, we have $\\frac{2}{\\sqrt{xyz}} \\geq \\frac{2}{\\sqrt{1}} = 2$.",
                   "10. Therefore, the minimum value of $(x + y)(y + z)$ is $\\boxed{2}$."]
```

Run `./codes/examples.py` to get the validity and redundancy scores for each step.

```bash
# Replace the 'question' and 'reasoning_steps' in ./codes/examples.py with your own content.
# --model_name_or_path: the model name or path; --model_size: the model size of ReasonEval (7B or 34B).
python ./codes/examples.py \
    --model_name_or_path GAIR/ReasonEval-7B \
    --model_size 7B
```
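Once per-step scores are available, they can be aggregated into a solution-level judgment. One natural choice (an assumption here, not necessarily what `examples.py` reports) is to take the minimum validity and the maximum redundancy across steps, so that a single bad step dominates the verdict:

```python
def solution_scores(step_validity, step_redundancy):
    """Aggregate per-step scores: a solution is only as valid as its
    weakest step, and as redundant as its most redundant step.
    This min/max aggregation is an illustrative assumption."""
    return min(step_validity), max(step_redundancy)

# Hypothetical per-step scores for a 4-step solution.
validity = [0.98, 0.95, 0.40, 0.97]     # step 3 looks invalid
redundancy = [0.05, 0.60, 0.10, 0.08]   # step 2 looks redundant

v, r = solution_scores(validity, redundancy)
print(v, r)  # 0.4 0.6
```

This is one way such scores could support the use cases listed above, e.g. filtering out training solutions whose aggregated validity falls below a threshold even when their final answers are correct.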
## How to Cite

```bibtex
```