Update README.md
README.md
CHANGED
@@ -10,7 +10,13 @@ language:

 ## Model Description

-`ReasonEval-7B` is a 7.1B parameter decoder-only language model tuned from [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1).
 - **Validity**: The step contains no mistakes in calculation and logic.
 - **Redundancy**: The step lacks utility in solving the problem but is still valid.

@@ -31,51 +37,9 @@ possibilities of each class of reasong steps.
 * **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://drive.google.com/file/d/1Lw1uGFzTUWxo3mB91sfdusSrxnCCO9mR/view?usp=sharing)
 * **Github**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
 * **Finetuned from model**: [`https://huggingface.co/WizardLM/WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)

-## Quick Start
-### Setup
-
-* Clone the repository
-```bash
-git clone https://github.com/GAIR-NLP/ReasonEval
-cd ReasonEval
-```
-* Create a conda environment and activate the environment
-```bash
-conda create -n ReasonEval python=3.10
-conda activate ReasonEval
-```
-* Install the required libraries
-```bash
-pip install -r requirements.txt
-```
-### Usage
-
-Provide the question and the solution in a step-by-step format.
-
-```python
-# examples
-question = "Let $x,$ $y,$ and $z$ be positive real numbers such that $xyz(x + y + z) = 1.$ Find the minimum value of\n\\[(x + y)(y + z).\\]"
-reasoning_steps = ["1. The problem asks us to find the minimum value of $(x + y)(y + z)$ given that $x,$ $y,$ and $z$ are positive real numbers and $xyz(x + y + z) = 1$.",
-"2. By the AM-GM inequality, we have $x + y + z \\geq 3\\sqrt[3]{xyz}$.",
-"3. By the given condition $xyz(x + y + z) = 1$, we can substitute $x + y + z$ with $\\sqrt[3]{xyz}$ in the inequality from step 2 to get $3\\sqrt[3]{xyz} \\geq 3$.",
-"4. Simplifying the inequality from step 3 gives $\\sqrt[3]{xyz} \\geq 1$.",
-"5. By raising both sides of the inequality from step 4 to the power of 3, we have $xyz \\geq 1$.",
-"6. By the AM-GM inequality, we have $(x + y)(y + z) \\geq 2\\sqrt{(x + y)(y + z)}$.",
-"7. By the given condition $xyz(x + y + z) = 1$, we can substitute $(x + y)(y + z)$ with $\\frac{1}{xyz}$ in the inequality from step 6 to get $2\\sqrt{(x + y)(y + z)} \\geq 2\\sqrt{\\frac{1}{xyz}}$.",
-"8. Simplifying the inequality from step 7 gives $(x + y)(y + z) \\geq \\frac{2}{\\sqrt{xyz}}$.",
-"9. By the condition $xyz \\geq 1$ from step 5, we have $\\frac{2}{\\sqrt{xyz}} \\geq \\frac{2}{\\sqrt{1}} = 2$.",
-"10. Therefore, the minimum value of $(x + y)(y + z)$ is $\\boxed{2}$."]
-```

-Run `./codes/examples.py` to get the validity and redundancy scores for each step.
-```bash
-# examples
-## Replace the 'question' and 'reasoning_steps' in ./codes/examples.py with your own content.
-python ./codes/examples.py
---model_name_or_path GAIR/ReasonEval-7B # Specify the model name or path here
---model_size 7B # Indicate the model size of ReasonEval (7B or 34B)
-```
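
The removed Usage section expects the solution as a list with one string per step. As a rough illustration of that input format, the sketch below splits a solution written as one numbered string into such a list. The helper `split_numbered_steps` is hypothetical and is not part of the ReasonEval repository; it assumes every step starts on its own line with a prefix like `1. `.

```python
import re
from typing import List


def split_numbered_steps(solution: str) -> List[str]:
    """Split a numbered solution into one string per step.

    Hypothetical helper (not from the ReasonEval repo). Assumes each step
    begins on its own line with a prefix such as "1. ".
    """
    steps: List[str] = []
    for raw_line in solution.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between steps
        if re.match(r"\d+\.\s", line) or not steps:
            steps.append(line)  # a new numbered step starts here
        else:
            steps[-1] += " " + line  # continuation of the previous step
    return steps


# Toy input, made up for illustration.
solution = "1. Let x = 2.\n2. Then x + 3 = 5.\n3. Therefore the answer is 5."
reasoning_steps = split_numbered_steps(solution)
print(reasoning_steps)
```

The resulting `reasoning_steps` list has the same shape as the one shown in the example above, so it could be pasted into `./codes/examples.py` in its place.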
 ## How to Cite
 ```bibtex
 ```


 ## Model Description

+`ReasonEval-7B` is a 7.1B parameter decoder-only language model tuned from [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1).
+
+<p align="center">
+<img src="introduction.jpg" alt="error" style="width:95%;">
+</p>
+
+`ReasonEval-7B` assesses the problem-solving process in a step-by-step format from the following perspectives:
 - **Validity**: The step contains no mistakes in calculation and logic.
 - **Redundancy**: The step lacks utility in solving the problem but is still valid.
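
To make the two perspectives concrete, here is a minimal sketch of how per-step scores might be derived from class probabilities. It assumes, and this is not stated in the lines of this diff, that the model emits three probabilities per step (negative, neutral, positive), that validity is the probability the step is not negative, and that redundancy is the probability of the neutral class. The function name `step_scores` and the numbers are hypothetical.

```python
from typing import List, Tuple


def step_scores(probs: Tuple[float, float, float]) -> Tuple[float, float]:
    """Return (validity, redundancy) for one reasoning step.

    Assumption: probs is (p_negative, p_neutral, p_positive) and sums to 1.
    """
    p_negative, p_neutral, p_positive = probs
    if abs(p_negative + p_neutral + p_positive - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    validity = p_neutral + p_positive  # assumed: the step is valid if it is not negative
    redundancy = p_neutral  # assumed: neutral means valid but not useful
    return validity, redundancy


# Hypothetical probabilities for two steps, not real model output.
example_probs: List[Tuple[float, float, float]] = [
    (0.05, 0.15, 0.80),
    (0.70, 0.10, 0.20),
]
scores = [step_scores(p) for p in example_probs]
for i, (validity, redundancy) in enumerate(scores, start=1):
    print(f"step {i}: validity={validity:.2f} redundancy={redundancy:.2f}")
```

Under these assumptions a step can score high on both axes at once, which matches the definition of a redundant step as one that is still valid.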

 * **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://drive.google.com/file/d/1Lw1uGFzTUWxo3mB91sfdusSrxnCCO9mR/view?usp=sharing)
 * **Github**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
 * **Finetuned from model**: [`https://huggingface.co/WizardLM/WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)
+* **Fine-tuning Data**: [`https://huggingface.co/WizardLM/WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)


 ## How to Cite
 ```bibtex
 ```