nishadsinghi committed · c305e53 (verified) · parent: a26881b

Update README.md

 
pinned: false
---

Data and models accompanying the paper [When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning](https://arxiv.org/abs/2504.01005), containing:

- Finetuned generative verifiers (i.e., GenRM-FT) for math reasoning.
- Synthetic verification data generated by GPT-4o for math reasoning, to train your own generative verifiers.
- Solutions and verifications generated by various models for math and science reasoning.

# MATH Dataset

We use Llama-3.1-8B-Instruct and Qwen-2.5-7B-Instruct to generate solutions for problems in the training split of the [MATH dataset](https://huggingface.co/datasets/hendrycks/competition_math).
Then, we use GPT-4o to verify these solutions. We filter out verifications whose verdict doesn't match the ground-truth correctness of the solution, and balance the dataset so it contains an equal number of 'yes' and 'no' verifications.
This results in the following datasets:

## Training data for GenRM-FT
- Llama-3.1-8B-Instruct: https://huggingface.co/datasets/sc-genrm-scaling/genrm_gpt4o_verifs_llama_3p1_8b_solns_math_train
- Qwen-2.5-7B-Instruct: https://huggingface.co/datasets/sc-genrm-scaling/genrm_gpt4o_verifs_qwen_2p5_7b_solns_math_train
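The filter-and-balance step described above can be sketched as follows. This is a minimal illustration with hypothetical field names (`verdict`, `is_correct`); the actual datasets may use a different schema.

```python
import random


def filter_and_balance(records, seed=0):
    """Keep only verifications whose verdict matches the ground-truth
    correctness of the solution, then downsample so 'yes' (True) and
    'no' (False) verdicts are equally represented."""
    # 1. Filter: the verifier's verdict must agree with ground truth.
    kept = [r for r in records if r["verdict"] == r["is_correct"]]

    yes = [r for r in kept if r["verdict"]]
    no = [r for r in kept if not r["verdict"]]

    # 2. Balance: truncate the larger class to the size of the smaller one.
    n = min(len(yes), len(no))
    rng = random.Random(seed)
    rng.shuffle(yes)
    rng.shuffle(no)
    balanced = yes[:n] + no[:n]
    rng.shuffle(balanced)
    return balanced
```

Balancing matters because a verifier trained on a skewed verdict distribution can learn to favor the majority label rather than actually checking the solution.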

We fine-tune the two models on their respective datasets using LoRA, resulting in these fine-tuned GenRMs:

## Finetuned Verifiers
- Llama-3.1-8B-Instruct: https://huggingface.co/sc-genrm-scaling/llama_3.1_8b_genrm_ft
- Qwen-2.5-7B-Instruct: https://huggingface.co/sc-genrm-scaling/qwen_2.5_7b_genrm_ft

See [this example](https://github.com/nishadsinghi/sc-genrm-scaling/blob/master/llmonk/verify/demo.ipynb) for how to run inference with these models.
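The linked notebook covers inference in detail; the rough shape is sketched below. The prompt template and the verdict format are hypothetical (the real ones are defined in the notebook), and loading via `AutoModelForCausalLM` assumes the Hub repositories hold merged weights — if they host bare LoRA adapters, `peft.AutoPeftModelForCausalLM` would be needed instead.

```python
import re


def extract_verdict(verification: str):
    """Parse a generated verification into a yes/no verdict.
    Assumes a hypothetical format ending in 'Is the answer correct (Yes/No)? Yes'.
    Returns True, False, or None if no verdict is found."""
    m = re.search(r"\(Yes/No\)\?\s*(Yes|No)", verification, re.IGNORECASE)
    return m.group(1).lower() == "yes" if m else None


def verify(problem, solution, model_id="sc-genrm-scaling/llama_3.1_8b_genrm_ft"):
    """Generate one verification for a (problem, solution) pair.
    Not executed here: it downloads the model weights from the Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Hypothetical prompt template; see the demo notebook for the real one.
    prompt = (
        f"Problem: {problem}\nSolution: {solution}\n"
        "Verify the solution step by step. Is the answer correct (Yes/No)?"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens.
    text = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return text, extract_verdict(text)
```

In practice the paper samples multiple verifications per solution and aggregates the verdicts, so a helper like `extract_verdict` would be applied to each sample.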

We use these generative verifiers (without fine-tuning, in the case of Llama-3.3-70B-Instruct) on solutions from the MATH test set to obtain the following data, which we analyse in the paper:

## Solutions and Verifications for Test-set
- Llama-3.1-8B-Instruct:
  - Solutions: https://huggingface.co/datasets/sc-genrm-scaling/MATH128_Solutions_Llama-3.1-8B-Instruct