seven-cat committed on
Commit a1e3ec1 · verified · 1 parent: a239a38

Update README.md

Files changed (1): README.md (+58 -3)
 
---
license: apache-2.0
language:
- en
---

## Model Description

`ReasonEval-7B` is a 7.1B-parameter decoder-only language model fine-tuned from [`WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1). It assesses a problem-solving process, given in a step-by-step format, from two perspectives:

- **Validity**: The step contains no mistakes in calculation or logic.
- **Redundancy**: The step is valid but lacks utility in solving the problem.
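
To illustrate how the two perspectives can be read off a step classifier, the sketch below assumes a hypothetical three-way labeling of each step: `positive` (valid and useful), `neutral` (valid but redundant), and `negative` (invalid). Under that assumption, a step's validity score is the probability that it is not `negative`, and its redundancy score is the probability of `neutral`. The label names, the mapping, and the `step_scores` helper are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical label set (assumption, not taken from this README).
LABELS = ("positive", "neutral", "negative")


def step_scores(probs: Dict[str, float]) -> Tuple[float, float]:
    """Map one step's class probabilities to (validity, redundancy).

    Assumes 'positive' = valid and useful, 'neutral' = valid but redundant,
    and 'negative' = invalid.
    """
    missing = [label for label in LABELS if label not in probs]
    if missing:
        raise ValueError(f"missing probabilities for: {missing}")
    if abs(sum(probs[label] for label in LABELS) - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    validity = probs["positive"] + probs["neutral"]  # P(step is not invalid)
    redundancy = probs["neutral"]                    # P(step is valid but unnecessary)
    return validity, redundancy


# Made-up probabilities for a single step.
validity, redundancy = step_scores({"positive": 0.7, "neutral": 0.2, "negative": 0.1})
print(f"validity={validity:.2f}, redundancy={redundancy:.2f}")
```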
16
+
17
+ With ReasonEval, you can
18
+
19
+ - 📏 quantify the quality of reasoning steps free of human or close-source models.
20
+
21
+ - 🤖 find the potential invalid or redundant steps in the solutions even with the correct results.
22
+
23
+ - 🛠️ select high-quality training data for downstream tasks (e.g., fine-tuning).
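
Per-step scores could feed data selection by aggregating them per solution and keeping only the solutions that clear both thresholds. The sketch below assumes that solution-level validity is the minimum step validity, since one invalid step spoils the solution. It also assumes that solution-level redundancy is the maximum step redundancy. The `ScoredSolution` class, the `select_solutions` helper, and the 0.5 thresholds are hypothetical choices, not part of ReasonEval's documented interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoredSolution:
    """A candidate solution with per-step scores in [0, 1]."""
    text: str
    step_validity: List[float]
    step_redundancy: List[float]


def solution_scores(sol: ScoredSolution) -> Tuple[float, float]:
    """Aggregate step scores into (solution validity, solution redundancy).

    Assumed rule: the weakest step bounds validity and the most redundant
    step bounds redundancy. Requires at least one step.
    """
    return min(sol.step_validity), max(sol.step_redundancy)


def select_solutions(
    solutions: List[ScoredSolution],
    min_validity: float = 0.5,    # hypothetical threshold
    max_redundancy: float = 0.5,  # hypothetical threshold
) -> List[ScoredSolution]:
    """Keep solutions whose aggregated scores clear both thresholds."""
    kept = []
    for sol in solutions:
        validity, redundancy = solution_scores(sol)
        if validity >= min_validity and redundancy <= max_redundancy:
            kept.append(sol)
    return kept


# Made-up scores for two candidate solutions.
pool = [
    ScoredSolution("clean solution", [0.95, 0.90, 0.97], [0.05, 0.10, 0.02]),
    ScoredSolution("one faulty step", [0.95, 0.20, 0.90], [0.05, 0.10, 0.08]),
]
selected = select_solutions(pool)
print([s.text for s in selected])
```

Taking the minimum and maximum is a deliberately strict choice. Averaging would let several strong steps mask a single invalid one.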

## Model Details

* **Model type**: `ReasonEval-7B` is an auto-regressive language model based on the transformer decoder architecture. Its architecture is identical to that of the base model, except that the language-modeling head used for next-token prediction is replaced with a classification head that outputs the probability of each class of reasoning step.
* **Language(s)**: English
* **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://drive.google.com/file/d/1Lw1uGFzTUWxo3mB91sfdusSrxnCCO9mR/view?usp=sharing)
* **GitHub**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
* **Finetuned from model**: [`https://huggingface.co/WizardLM/WizardMath-7B-V1.1`](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)
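
The head swap described under **Model type** can be pictured as follows. A language-modeling head would project the decoder's final hidden state onto the whole vocabulary. A classification head instead projects it onto a few step classes, and a softmax turns the resulting logits into probabilities. This pure-Python sketch uses a toy 4-dimensional hidden state, three classes, and made-up weights. The real model's hidden size, number of classes, and the token position at which each step is scored are not stated here and are assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, bias: Vector, hidden: Vector) -> Vector:
    """Compute weights @ hidden + bias, giving one logit per class."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_step(hidden: Vector, weights: Matrix, bias: Vector) -> Vector:
    # An LM head would map `hidden` to vocab_size logits;
    # this classification head maps it to num_classes logits instead.
    return softmax(linear(weights, bias, hidden))


# Toy values: hidden size 4, 3 classes (all numbers made up).
hidden_state = [0.5, -1.0, 0.25, 2.0]
W = [
    [0.2, -0.1, 0.0, 0.4],
    [0.0, 0.3, 0.1, -0.2],
    [-0.3, 0.2, 0.5, 0.1],
]
b = [0.0, 0.1, -0.1]
probs = classify_step(hidden_state, W, b)
print([round(p, 3) for p in probs])
```

In a real model, `W` would have shape `(num_classes, hidden_size)` and be trained. Only this projection differs from the base model.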

## Quick Start
### Setup

* Clone the repository:
```bash
git clone https://github.com/GAIR-NLP/ReasonEval
cd ReasonEval
```
* Create and activate a conda environment:
```bash
conda create -n ReasonEval python=3.10
conda activate ReasonEval
```
* Install the required libraries:
```bash
pip install -r requirements.txt
```

### Usage

Provide the question and the solution in a step-by-step format, with one string per step:

```python
# Example input
question = "Let $x,$ $y,$ and $z$ be positive real numbers such that $xyz(x + y + z) = 1.$ Find the minimum value of\n\\[(x + y)(y + z).\\]"
reasoning_steps = [
    "1. The problem asks us to find the minimum value of $(x + y)(y + z)$ given that $x,$ $y,$ and $z$ are positive real numbers and $xyz(x + y + z) = 1$.",
    "2. By the AM-GM inequality, we have $x + y + z \\geq 3\\sqrt[3]{xyz}$.",
    "3. By the given condition $xyz(x + y + z) = 1$, we can substitute $x + y + z$ with $\\sqrt[3]{xyz}$ in the inequality from step 2 to get $3\\sqrt[3]{xyz} \\geq 3$.",
    "4. Simplifying the inequality from step 3 gives $\\sqrt[3]{xyz} \\geq 1$.",
    "5. By raising both sides of the inequality from step 4 to the power of 3, we have $xyz \\geq 1$.",
    "6. By the AM-GM inequality, we have $(x + y)(y + z) \\geq 2\\sqrt{(x + y)(y + z)}$.",
    "7. By the given condition $xyz(x + y + z) = 1$, we can substitute $(x + y)(y + z)$ with $\\frac{1}{xyz}$ in the inequality from step 6 to get $2\\sqrt{(x + y)(y + z)} \\geq 2\\sqrt{\\frac{1}{xyz}}$.",
    "8. Simplifying the inequality from step 7 gives $(x + y)(y + z) \\geq \\frac{2}{\\sqrt{xyz}}$.",
    "9. By the condition $xyz \\geq 1$ from step 5, we have $\\frac{2}{\\sqrt{xyz}} \\geq \\frac{2}{\\sqrt{1}} = 2$.",
    "10. Therefore, the minimum value of $(x + y)(y + z)$ is $\\boxed{2}$.",
]
```
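
A solution may be stored as plain text with one step per line. In that case, a small helper can produce the numbered list format used in the example above. The `to_reasoning_steps` helper below is hypothetical and is not part of the ReasonEval repository. It assumes that each non-empty line is one step and that lines are not already numbered.

```python
from typing import List


def to_reasoning_steps(solution: str) -> List[str]:
    """Split a one-step-per-line solution into "1. ...", "2. ..." strings.

    Hypothetical helper: blank lines are skipped, and surrounding
    whitespace is stripped from each step.
    """
    lines = [line.strip() for line in solution.splitlines()]
    steps = [line for line in lines if line]
    return [f"{i}. {step}" for i, step in enumerate(steps, start=1)]


solution = """The problem asks for the minimum value of $(x + y)(y + z)$.
By the AM-GM inequality, $x + y + z \\geq 3\\sqrt[3]{xyz}$.

Therefore, the minimum value is $\\boxed{2}$."""
reasoning_steps = to_reasoning_steps(solution)
for step in reasoning_steps:
    print(step)
```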

Run `./codes/examples.py` to get the validity and redundancy scores for each step:
```bash
# Replace 'question' and 'reasoning_steps' in ./codes/examples.py with your own content.
# --model_name_or_path: the model name or local path
# --model_size: the size of the ReasonEval model (7B or 34B)
python ./codes/examples.py \
    --model_name_or_path GAIR/ReasonEval-7B \
    --model_size 7B
```
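
Suppose the per-step scores reported by the script are collected into two Python lists aligned with `reasoning_steps`. A sketch like the following could then flag candidate invalid or redundant steps. The `flag_steps` helper, the step texts, the example scores, and the 0.5 cut-offs are all hypothetical.

```python
from typing import List, Tuple


def flag_steps(
    steps: List[str],
    validity: List[float],
    redundancy: List[float],
    validity_threshold: float = 0.5,    # hypothetical cut-off
    redundancy_threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[int, str]]:
    """Return (1-based step number, reason) for each suspicious step."""
    if not len(steps) == len(validity) == len(redundancy):
        raise ValueError("steps and score lists must have the same length")
    flags = []
    for number, (v, r) in enumerate(zip(validity, redundancy), start=1):
        if v < validity_threshold:
            flags.append((number, "possibly invalid"))
        elif r > redundancy_threshold:
            # Redundancy is only checked for steps that look valid.
            flags.append((number, "possibly redundant"))
    return flags


# Hypothetical steps and made-up scores.
steps = [
    "1. Restate the problem.",
    "2. Apply an inequality incorrectly.",
    "3. Repeat an earlier observation.",
]
flags = flag_steps(steps, validity=[0.96, 0.31, 0.88], redundancy=[0.04, 0.10, 0.72])
print(flags)
```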

## How to Cite
```bibtex
```