---
license: apache-2.0
language:
- en
---

## Model Description

`ReasonEval-7B` is a 7.1B parameter decoder-only language model tuned from [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1). `ReasonEval-7B` assesses the problem-solving process in a step-by-step format from the following perspectives:

- **Validity**: The step contains no mistakes in calculation or logic.
- **Redundancy**: The step is valid but lacks utility in solving the problem.

With ReasonEval, you can

- 📏 quantify the quality of reasoning steps without relying on human annotators or closed-source models.
- 🤖 identify potentially invalid or redundant steps in a solution, even when its final answer is correct.
- 🛠️ select high-quality training data for downstream tasks (e.g., fine-tuning).

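As a sketch of how such step-level judgments could be read off a classifier's outputs (a hypothetical illustration, not the model's actual implementation), suppose each step receives logits over three classes — positive, neutral, negative — and that validity and redundancy are defined from the softmax probabilities as `P(positive) + P(neutral)` and `P(neutral)` respectively; both the class layout and these score definitions are assumptions here:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def step_scores(logits):
    """Map 3-class logits [positive, neutral, negative] to
    (validity, redundancy). The class order and the score
    definitions are assumptions for illustration only:
    validity = P(positive) + P(neutral); redundancy = P(neutral)."""
    p_pos, p_neutral, p_neg = softmax(logits)
    return p_pos + p_neutral, p_neutral

validity, redundancy = step_scores([2.0, 0.5, -1.0])
print(round(validity, 3), round(redundancy, 3))
```

Under these definitions a step can be highly valid yet also flagged as redundant, which is exactly the distinction the two perspectives above are meant to capture.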
## Model Details

* **Model type**: `ReasonEval-7B` is an auto-regressive language model based on the transformer decoder architecture. Its architecture is identical to that of the base model, except that the classification head for next-token prediction is replaced with a classification head that outputs the probabilities of each class of reasoning step.
* **Language(s)**: English
* **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://drive.google.com/file/d/1Lw1uGFzTUWxo3mB91sfdusSrxnCCO9mR/view?usp=sharing)
* **GitHub**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
* **Finetuned from model**: [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)

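The head swap described above can be sketched in plain Python (a toy illustration with made-up dimensions, not the actual model code): instead of projecting the final hidden state to a vocabulary-sized logit vector, the model projects it to one logit per step-quality class.

```python
# Toy illustration: a "head" is just a weight matrix applied to the
# final hidden state. All dimensions are made up for readability.
HIDDEN = 4        # real model: thousands of hidden dimensions
VOCAB = 10        # real model: tens of thousands of tokens
NUM_CLASSES = 3   # e.g. positive / neutral / negative steps

def linear(hidden_state, weights):
    """Project a hidden-state vector through a weight matrix (one row per output)."""
    return [sum(h * w for h, w in zip(hidden_state, row)) for row in weights]

hidden_state = [0.1, -0.2, 0.3, 0.05]
lm_head = [[0.01 * (i + j) for j in range(HIDDEN)] for i in range(VOCAB)]
cls_head = [[0.1 * (i - j) for j in range(HIDDEN)] for i in range(NUM_CLASSES)]

# Next-token prediction head: one logit per vocabulary item.
print(len(linear(hidden_state, lm_head)))   # 10
# ReasonEval-style head: one logit per step-quality class.
print(len(linear(hidden_state, cls_head)))  # 3
```

Because only the output head changes, the rest of the base model's weights and architecture carry over unchanged.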
## Quick Start

### Setup

* Clone the repository
```bash
git clone https://github.com/GAIR-NLP/ReasonEval
cd ReasonEval
```
* Create a conda environment and activate the environment
```bash
conda create -n ReasonEval python=3.10
conda activate ReasonEval
```
* Install the required libraries
```bash
pip install -r requirements.txt
```
### Usage

Provide the question and the solution in a step-by-step format.

```python
# examples
question = "Let $x,$ $y,$ and $z$ be positive real numbers such that $xyz(x + y + z) = 1.$ Find the minimum value of\n\\[(x + y)(y + z).\\]"
reasoning_steps = ["1. The problem asks us to find the minimum value of $(x + y)(y + z)$ given that $x,$ $y,$ and $z$ are positive real numbers and $xyz(x + y + z) = 1$.",
                   "2. By the AM-GM inequality, we have $x + y + z \\geq 3\\sqrt[3]{xyz}$.",
                   "3. By the given condition $xyz(x + y + z) = 1$, we can substitute $x + y + z$ with $\\sqrt[3]{xyz}$ in the inequality from step 2 to get $3\\sqrt[3]{xyz} \\geq 3$.",
                   "4. Simplifying the inequality from step 3 gives $\\sqrt[3]{xyz} \\geq 1$.",
                   "5. By raising both sides of the inequality from step 4 to the power of 3, we have $xyz \\geq 1$.",
                   "6. By the AM-GM inequality, we have $(x + y)(y + z) \\geq 2\\sqrt{(x + y)(y + z)}$.",
                   "7. By the given condition $xyz(x + y + z) = 1$, we can substitute $(x + y)(y + z)$ with $\\frac{1}{xyz}$ in the inequality from step 6 to get $2\\sqrt{(x + y)(y + z)} \\geq 2\\sqrt{\\frac{1}{xyz}}$.",
                   "8. Simplifying the inequality from step 7 gives $(x + y)(y + z) \\geq \\frac{2}{\\sqrt{xyz}}$.",
                   "9. By the condition $xyz \\geq 1$ from step 5, we have $\\frac{2}{\\sqrt{xyz}} \\geq \\frac{2}{\\sqrt{1}} = 2$.",
                   "10. Therefore, the minimum value of $(x + y)(y + z)$ is $\\boxed{2}$."]
```

Run `./codes/examples.py` to get the validity and redundancy scores for each step.

```bash
# Replace the 'question' and 'reasoning_steps' in ./codes/examples.py with your own content.
# --model_name_or_path: the model name or path; --model_size: the model size of ReasonEval (7B or 34B).
python ./codes/examples.py \
    --model_name_or_path GAIR/ReasonEval-7B \
    --model_size 7B
```
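Once per-step scores are available, they can be aggregated into a solution-level judgment. One natural choice (an assumption here, not necessarily what `examples.py` reports) is to take the minimum validity and the maximum redundancy across steps, so that a single bad step dominates the verdict:

```python
def solution_scores(step_validity, step_redundancy):
    """Aggregate per-step scores: a solution is only as valid as its
    weakest step, and as redundant as its most redundant step.
    This min/max aggregation is an illustrative assumption."""
    return min(step_validity), max(step_redundancy)

# Hypothetical per-step scores for a 4-step solution.
validity = [0.98, 0.95, 0.40, 0.97]     # step 3 looks invalid
redundancy = [0.05, 0.60, 0.10, 0.08]   # step 2 looks redundant

v, r = solution_scores(validity, redundancy)
print(v, r)  # 0.4 0.6
```

This is one way such scores could support the use cases listed above, e.g. filtering out training solutions whose aggregated validity falls below a threshold even when their final answers are correct.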
## How to Cite

```bibtex
```