Discriminator-Guided Multi-step Reasoning with Language Models
This model is part of the work presented in the paper GRACE: Discriminator-Guided Chain-of-Thought Reasoning.
GRACE (Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator) is a stepwise decoding approach that steers the decoding process towards producing correct reasoning steps. It employs a step-level verifier or discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates based on their correctness.
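The step-scoring idea described above can be illustrated with a minimal sketch. It is not the official implementation: the names `select_next_step` and `toy_discriminator` are hypothetical, the toy discriminator is a hand-written arithmetic check standing in for the trained model, and the weighted sum of the language-model log-probability and the discriminator score (weight `beta`) is an assumed combination rule, chosen to mirror the `--beta` flag in the command below.

```python
from typing import Callable, List, Sequence


def toy_discriminator(prefix: Sequence[str], step: str) -> float:
    """Hypothetical stand-in for a trained step-level discriminator.

    Returns +1.0 if a step of the form "a + b = c" or "a * b = c" is
    arithmetically correct, and -1.0 otherwise. The prefix is ignored here.
    """
    left, right = step.split("=")
    a, op, b = left.split()
    value = int(a) + int(b) if op == "+" else int(a) * int(b)
    return 1.0 if value == int(right) else -1.0


def select_next_step(
    prefix: Sequence[str],
    candidates: List[str],
    lm_logprobs: List[float],
    disc_score: Callable[[Sequence[str], str], float],
    beta: float = 0.1,
) -> str:
    """Pick the candidate step with the highest combined score.

    Assumed scoring rule (not confirmed by this model card):
        score = (1 - beta) * lm_logprob + beta * discriminator_score
    """
    if not candidates or len(candidates) != len(lm_logprobs):
        raise ValueError("candidates and lm_logprobs must be non-empty and aligned")
    scored = [
        ((1.0 - beta) * lp + beta * disc_score(prefix, step), step)
        for step, lp in zip(candidates, lm_logprobs)
    ]
    # max() compares the score first, so the best-scoring step is returned.
    return max(scored, key=lambda pair: pair[0])[1]


# Two sampled candidates for the next step; the LM slightly prefers the wrong one.
candidates = ["2 + 3 = 6", "2 + 3 = 5"]
lm_logprobs = [-0.5, -0.9]

unguided = select_next_step([], candidates, lm_logprobs, toy_discriminator, beta=0.0)
guided = select_next_step([], candidates, lm_logprobs, toy_discriminator, beta=0.5)
```

With `beta=0.0` the choice follows the language model alone, while a larger `beta` lets the discriminator override a likely but incorrect step. In actual guided decoding this selection would be repeated step by step, appending each chosen step to the prefix until the answer is produced or the step limit is reached.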
The official implementation of guided decoding with this model is available in the GitHub repository. The following command is an example of running GRACE decoding:
WANDB_MODE=disabled python run_grace.py \
--model_name_or_path mkhalifa/flan-t5-large-gsm8k \
--in_file data/gsm8k/dev.jsonl \
--task gsm8k \
--disc_path ckpts/discrim/flan-t5-gsm8k/ \
--beta 0.1 --n_candidate_steps 20 --generation_type step-score \
--step_sampling_method top_p --device2 cuda:0 --top_p .95 --sample_calc true \
--max_steps 6 --max_step_length 60 --step_delimiter '|' --temperature .8 --n_self_consistency 1 --seed 42
If you use this work, please cite the following paper:
@article{khalifa2023grace,
title={{GRACE}: Discriminator-Guided Chain-of-Thought Reasoning},
author={Khalifa, Muhammad and Logeswaran, Lajanugen and Lee, Moontae and Lee, Honglak and Wang, Lu},
journal={arXiv preprint arXiv:2305.14934},
year={2023}
}