Mix-GRM
Collection
We provide a collection for ``Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models'', including data, models, and the paper.
Prompt Template
PROMPT_PREFIX = (
    "Please act as an impartial judge and evaluate the quality of the responses provided by two AI assistants to the user question displayed below. "
    "You should choose the assistant that follows the user's instructions and answers the user's question better. Your evaluation can consider more dimensions by generating additional principles for a comprehensive assessment, or reason more deeply by first generating a solution to the instruction and then evaluating with reference to the key principles. "
    "Begin your evaluation by "
    "comparing the two responses and provide a thorough reasoning. You should give a judgment on each principle and a corresponding sub-verdict. Avoid any position biases and ensure that the order in which the responses were "
    "presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names "
    "of the assistants. Be as objective as possible. After providing your explanation, output your final verdict by strictly following this format: [[A/B]]\n, "
    '"[[A]]" if assistant A is better, "[[B]]" if assistant B is better.'  # tie option removed; the template originally also allowed '"[[C]]" for a tie'
)
PROMPT_PAIRWISE = (
    "[User Question]\n{question}\n\n"
    "[The Start of Assistant A's Answer]\n{answer_a}\n[The End of Assistant A's Answer]\n\n"
    "[The Start of Assistant B's Answer]\n{answer_b}\n[The End of Assistant B's Answer]"
)
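As a quick sanity check on the template above, the snippet below (a standalone sketch; the template string is copied inline rather than imported from prompts.py, and the question and answers are made-up placeholders) fills the placeholders and confirms they are named `question`, `answer_a`, and `answer_b`:

```python
# Inline copy of the PROMPT_PAIRWISE template defined above.
PROMPT_PAIRWISE = (
    "[User Question]\n{question}\n\n"
    "[The Start of Assistant A's Answer]\n{answer_a}\n[The End of Assistant A's Answer]\n\n"
    "[The Start of Assistant B's Answer]\n{answer_b}\n[The End of Assistant B's Answer]"
)

# Fill the template with a toy question and two candidate answers.
prompt = PROMPT_PAIRWISE.format(
    question="What is 2 + 2?",
    answer_a="4",
    answer_b="5",
)
print(prompt.splitlines()[0])  # [User Question]
```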
Inference Code
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
from prompts import PROMPT_PREFIX, PROMPT_PAIRWISE  # prompt templates saved in a prompts.py file
llm = LLM(model="Salesforce/FARE-8B", tensor_parallel_size=8, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Salesforce/FARE-8B", trust_remote_code=True)
data = [
    {"instruction": "", "response_a": "xxx", "response_b": "yyy"}
]
formatted = [
    PROMPT_PREFIX + PROMPT_PAIRWISE.format(
        question=d["instruction"],  # the template's placeholder is {question}
        answer_a=d["response_a"],
        answer_b=d["response_b"],
    )
    for d in data
]
messages_lst = [
    [{"role": "user", "content": user_formatted}]
    for user_formatted in formatted
]
prompts = [tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False) for messages in messages_lst]
# inference!
sampling_params = SamplingParams(
    max_tokens=32768,
    top_p=1.0,
    top_k=-1,
    temperature=0.0,  # greedy decoding
)
outputs_all = llm.generate(prompts, sampling_params)
evaluator_text = [output.outputs[0].text.split(tokenizer.eos_token)[0] for output in outputs_all]
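The card stops after collecting the raw judge text. A small, hypothetical helper (not part of the released code) can then pull out the final `[[A]]`/`[[B]]` verdict that the prompt instructs the model to emit; taking the last match is a common heuristic, since sub-verdicts earlier in the reasoning may also use the bracketed form:

```python
import re
from typing import Optional

def parse_verdict(text: str) -> Optional[str]:
    """Extract the final [[A]] / [[B]] verdict from judge output.

    Returns 'A', 'B', or None if no verdict was emitted. The last
    bracketed match is used, so per-principle sub-verdicts earlier
    in the reasoning do not override the final judgment.
    """
    matches = re.findall(r"\[\[([AB])\]\]", text)
    return matches[-1] if matches else None

print(parse_verdict("...thorough reasoning...\nFinal verdict: [[A]]"))  # A
```

Applied to the decoded outputs above, this would look like `verdicts = [parse_verdict(t) for t in evaluator_text]`.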