Mix-GRM
Collection
We provide a collection for ``Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models'', including data, models, and the paper.
Prompt Template
PROMPT_PREFIX = (
    "Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user question displayed below. "
    "You should choose the assistant that follows the user's instructions and answers the user's question better. Your evaluation can consider more dimensions by generating additional principles for a comprehensive assessment, or reason more deeply by first generating a solution to the instruction and then evaluating with reference to the key principles. "
    "Begin your evaluation by "
    "comparing the two responses and provide a thorough reasoning. You should give a judgment on each principle and a corresponding sub-verdict. Avoid any position biases and ensure that the order in which the responses were "
    "presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names "
    "of the assistants. Be as objective as possible. After providing your explanation, output your final verdict by strictly following this format: [[A/B]]\n, "
    '"[[A]]" if assistant A is better, "[[B]]" if assistant B is better.'  # tie option removed; the template originally also allowed '"[[C]]" for a tie'
)
PROMPT_PAIRWISE = (
    "[User Question]\n{question}\n\n"
    "[The Start of Assistant A's Answer]\n{answer_a}\n[The End of Assistant A's Answer]\n\n"
    "[The Start of Assistant B's Answer]\n{answer_b}\n[The End of Assistant B's Answer]"
)
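As a quick sanity check on the template above, the snippet below (a standalone sketch; the template string is copied inline rather than imported from prompts.py, and the question and answers are made-up placeholders) fills the placeholders and confirms they are named `question`, `answer_a`, and `answer_b`:

```python
# Inline copy of the PROMPT_PAIRWISE template defined above.
PROMPT_PAIRWISE = (
    "[User Question]\n{question}\n\n"
    "[The Start of Assistant A's Answer]\n{answer_a}\n[The End of Assistant A's Answer]\n\n"
    "[The Start of Assistant B's Answer]\n{answer_b}\n[The End of Assistant B's Answer]"
)

# Fill the template with a toy question and two candidate answers.
prompt = PROMPT_PAIRWISE.format(
    question="What is 2 + 2?",
    answer_a="4",
    answer_b="5",
)
print(prompt.splitlines()[0])  # [User Question]
```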
Inference Code
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
from prompts import PROMPT_PREFIX, PROMPT_PAIRWISE  # prompt templates saved in a prompts.py file
llm = LLM(model="Salesforce/FARE-8B", tensor_parallel_size=8, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Salesforce/FARE-8B", trust_remote_code=True)
data = [
    {"instruction": "", "response_a": "xxx", "response_b": "yyy"}
]
formatted = [
    PROMPT_PREFIX + PROMPT_PAIRWISE.format(
        question=d["instruction"],  # the template's placeholder is {question}
        answer_a=d["response_a"],
        answer_b=d["response_b"],
    )
    for d in data
]
messages_lst = [
    [{"role": "user", "content": user_formatted}]
    for user_formatted in formatted
]
prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_lst]
# inference!
sampling_params = SamplingParams(
    max_tokens=32768,
    top_p=1.0,
    top_k=-1,
    temperature=0.0,  # greedy decoding
)
outputs_all = llm.generate(prompts, sampling_params)
evaluator_text = [output.outputs[0].text.split(tokenizer.eos_token)[0] for output in outputs_all]
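The card stops after collecting the raw judge text. A small, hypothetical helper (not part of the released code) can then pull out the final `[[A]]`/`[[B]]` verdict that the prompt instructs the model to emit; taking the last match is a common heuristic, since sub-verdicts earlier in the reasoning may also use the bracketed form:

```python
import re
from typing import Optional

def parse_verdict(text: str) -> Optional[str]:
    """Extract the final [[A]] / [[B]] verdict from judge output.

    Returns 'A', 'B', or None if no verdict was emitted. The last
    bracketed match is used, so per-principle sub-verdicts earlier
    in the reasoning do not override the final judgment.
    """
    matches = re.findall(r"\[\[([AB])\]\]", text)
    return matches[-1] if matches else None

print(parse_verdict("...thorough reasoning...\nFinal verdict: [[A]]"))  # A
```

Applied to the decoded outputs above, this would look like `verdicts = [parse_verdict(t) for t in evaluator_text]`.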