Rebuttal-RM πŸ…

1. Introduction

Rebuttal-RM is a scoring model trained to automatically assess author responses in light of the target reviewer comment and its supporting context, with the explicit goal of matching human reviewer preferences. The reward model, denoted GRM, receives as input the retrieved evidence chunks CE, the current review R_i, the target comment c_target, and a candidate reply r_response; it returns a vector of rubric-aligned scores s together with an explanatory rationale e. Formally,

$$(s, e) = \mathrm{GRM}\!\left(\sum_{p_j \in \mathrm{CE}} p_j,\; R_i,\; c_{\text{target}},\; r_{\text{response}}\right)$$

To obtain a robust evaluator, we curate 102K training instances drawn from three sources: (i) 12,000 original author rebuttals that serve as a human baseline, (ii) GPT-4.1-refined answers representing an upper quality bound, and (iii) diverse model-generated replies (e.g., from Qwen-2.5-3B and Claude-3.5) to broaden stylistic coverage. Using Qwen-3-8B as the backbone, we fine-tune on this corpus to produce the final Rebuttal-RM. As reported in the table in Β§2, Rebuttal-RM achieves the strongest agreement with expert annotators, posting an average correlation of 0.812 and outperforming GPT-4.1 and DeepSeek-R1 by 9.0% and 15.2% (relative), respectively.
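
For intuition, the sketch below mirrors the GRM interface as plain Python: the evidence chunks CE are concatenated (standing in for the aggregation over p_j in the formula above), and the output pairs the four rubric scores with a rationale. The names `RebuttalScore` and `build_user_prompt` are illustrative assumptions, not the released API; Β§3 shows the actual deployment path via vLLM.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RebuttalScore:
    """Output (s, e): rubric-aligned scores plus an explanatory rationale (illustrative container)."""
    scores: Dict[str, int]   # Attitude / Clarity / Persuasiveness / Constructiveness, each 0-10
    explanation: str         # rationale e

def build_user_prompt(evidence_chunks: List[str], review: str,
                      target_comment: str, response: str) -> str:
    """Assemble the GRM input; joining the chunks stands in for the
    sum over p_j in CE from the formula above (hypothetical helper)."""
    evidence = "\n".join(evidence_chunks)
    return (
        f"The whole review content is: {review}\n"
        f"The target comment content is: {target_comment}.\n"
        f"The best relevant paper fragment is: {evidence}.\n"
        f"The response is: {response}\n"
    )
```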

2. Performance (agreement with human ratings; higher = better)

| Scoring Model | Attitude r | Attitude Ξ² | Attitude f | Clarity r | Clarity Ξ² | Clarity f | Persuasiveness r | Persuasiveness Ξ² | Persuasiveness f | Constructiveness r | Constructiveness Ξ² | Constructiveness f | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen-3-8B | 0.718 | 0.672 | 0.620 | 0.609 | 0.568 | 0.710 | 0.622 | 0.577 | 0.690 | 0.718 | 0.745 | 0.720 | 0.664 |
| Llama-3-8B | 0.297 | 0.347 | 0.540 | 0.158 | 0.047 | 0.380 | 0.272 | 0.245 | 0.560 | 0.424 | 0.457 | 0.460 | 0.349 |
| GLM-4-9B | 0.420 | 0.475 | 0.460 | 0.467 | 0.436 | 0.730 | 0.369 | 0.361 | 0.700 | 0.561 | 0.519 | 0.570 | 0.506 |
| GPT-4.1 | 0.743 | 0.712 | 0.800 | 0.739 | 0.671 | 0.750 | 0.779 | 0.763 | 0.740 | 0.804 | 0.756 | 0.680 | 0.745 |
| DeepSeek-R1 | 0.646 | 0.633 | 0.790 | 0.708 | 0.615 | 0.760 | 0.710 | 0.664 | 0.720 | 0.742 | 0.701 | 0.620 | 0.705 |
| DeepSeek-V3 | 0.699 | 0.733 | 0.710 | 0.687 | 0.578 | 0.740 | 0.697 | 0.652 | 0.770 | 0.771 | 0.719 | 0.750 | 0.692 |
| Gemini-2.5 | 0.620 | 0.509 | 0.750 | 0.605 | 0.593 | 0.540 | 0.627 | 0.607 | 0.520 | 0.711 | 0.705 | 0.610 | 0.616 |
| Claude-3.5 | 0.569 | 0.635 | 0.720 | 0.704 | 0.670 | 0.680 | 0.706 | 0.686 | 0.670 | 0.753 | 0.738 | 0.630 | 0.680 |
| **Rebuttal-RM** | **0.839** | **0.828** | **0.910** | **0.753** | **0.677** | **0.790** | **0.821** | **0.801** | **0.820** | **0.839** | **0.835** | **0.810** | **0.812** |

3. Deployment / Usage

3.1 Run with vLLM (OpenAI-compatible server)

```bash
pip install "vllm>=0.4.2"
python -m vllm.entrypoints.openai.api_server \
  --model Zhitao-He/Rebuttal-RM \
  --dtype auto \
  --port 8000
```
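
Once the server is up, a quick way to confirm the model is being served (a minimal sketch, assuming the default port above) is to list the models exposed by the OpenAI-compatible endpoint:

```python
from openai import OpenAI

# vLLM ignores the API key, but the SDK requires a non-empty value
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Should print "Zhitao-He/Rebuttal-RM" once the weights have loaded
for model in client.models.list():
    print(model.id)
```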

3.2 Query via the official openai SDK

```python
from openai import OpenAI

# Point the client at the local vLLM server started in Β§3.1
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Fill these placeholders with your own data
user_prompt = f"""
The whole review content is: {Full_Review_Content}
The target comment content is: {Target_Comment}.
The best relevant paper fragment is: {Relevant_Paper_Fragment}.
The response is: {response}
"""

resp = client.chat.completions.create(
    model="Zhitao-He/Rebuttal-RM",
    messages=[
        {"role": "system", "content": system_prompt},  # see Β§3.3
        {"role": "user",   "content": user_prompt},
    ],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```
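
Because the system prompt in Β§3.3 instructs the model to emit strict JSON, the reply can be parsed directly. A minimal sketch, with a fallback in case the model deviates from the format:

```python
import json

try:
    result = json.loads(resp.choices[0].message.content)
    for dimension, value in result["score"].items():
        print(f"{dimension}: {value}/10")
    print(result["score_explanation"])
except (json.JSONDecodeError, KeyError) as err:
    # The model may occasionally deviate from strict JSON; keep the raw text
    print(f"Could not parse structured scores ({err}); raw output follows:")
    print(resp.choices[0].message.content)
```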

3.3 Default System Prompt (system_prompt)

```text
You are a seasoned academic reviewer and response-optimization expert. Your task is to evaluate the quality of the response based on the review comments, paper fragments, and the authors' responses. Please strictly follow the requirements below, and output only the score and score explanation.

Input variables:
- review_content: Complete content of the review comments.
- similar_paper_fragments: Best paper fragment most relevant to the comment.
- comment: Specific segment of the review comments.
- original_response: The authors' original response text to the comment.

Your task: Based on the input information, output only a JSON object containing the following two items.

Scoring standard (score range: 0-10):
- 0: Wholly Ineffective
- 1-2: Perfunctory
- 3-4: Unconvincing
- 5-6: Addresses Some Concerns
- 7-8: Exceptional
- 9-10: Outstanding

score: Four-dimensional score breakdown, each ranging from 0-10, structured as follows:
- Attitude: The tone and professionalism of the response.
- Clarity: The logic, structure, and focus of the response.
- Persuasiveness: The effectiveness of the argumentation and evidence support.
- Constructiveness: The commitment to revisions and specific actions taken.

score_explanation: A brief explanation of each score, specifically citing key points from the response text that reflect the scores and any shortcomings.

Output requirements:
- Output only the JSON object; do not include any other characters or explanations.
- The scoring must be reasonable, and the score explanation must clearly reference the original text that reflects the score.
- All output must be in formal, polite academic English.
- Your output must be strictly valid JSON that can be directly loaded by json.loads().

Output format example:
{ "score": { "Attitude": <int>,
             "Clarity": <int>,
             "Persuasiveness": <int>,
             "Constructiveness": <int> },
  "score_explanation": <explanation for your given score> }
```

4. Citation

```bibtex
@inproceedings{he2025rebuttalagent,
  title       = {RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind},
  author      = {Zhitao He and Zongwei Lyu and Wuzhenhai Dai and Yi R. (May) Fung},
  year        = {2025},
  institution = {Hong Kong University of Science and Technology},
  url         = {https://arxiv.org/abs/YYMM.NNNNN}
}
```