Academic-Rebuttal-Agent-Gemini (RebuttalGenie)

Model Type: Agentic LLM Pipeline (LangChain Wrapper utilizing Google Gemini 2.5 Flash)
Language(s): English (Academic/Formal)
License: Apache 2.0 (Wrapper Code) / Gemini API Terms of Service
Base Model: google/gemini-2.5-flash

Model Details

Description

The Academic-Rebuttal-Agent-Gemini (RebuttalGenie) is a specialized agent pipeline designed to help researchers draft responses to peer-review comments. It takes a manuscript's context and individual reviewer critiques as input and produces structured, polite rebuttal drafts that stay consistent with the paper's technical content. Unlike a general-purpose chatbot, the agent uses Guardrail-Constrained Generation: fixed architectural facts about the paper are injected into the prompt so that the model does not contradict them with hallucinated technical claims. We present this as a novel approach to maintaining scientific integrity in AI-assisted academic writing.

Architecture

The agent is built on a LangChain pipeline with the following components:

Knowledge Injection Layer: Hardcoded paper context (abstract, key findings, limitations)
Technical Guardrails: Architectural constraints (e.g., softmax layer dependencies, threshold logic)
Prompt Template: Few-Shot + Chain-of-Thought (CoT) structured reasoning
Inference Engine: Google Gemini 2.5 Flash via API
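The four components above can be sketched as a simple prompt-assembly function. This is a minimal, dependency-free illustration, not the released wrapper code: the names PaperContext, AgentConfig, build_prompt, and draft_rebuttal are hypothetical, and the LLM is passed in as a plain callable so the sketch runs without network access or the LangChain packages.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PaperContext:
    """Knowledge Injection Layer: hardcoded manuscript facts (hypothetical structure)."""
    abstract: str
    key_findings: List[str]
    limitations: List[str]


@dataclass
class AgentConfig:
    context: PaperContext
    guardrails: List[str]                               # Technical Guardrails: immutable constraints
    examples: List[str] = field(default_factory=list)   # optional few-shot exemplars


def build_prompt(config: AgentConfig, comment: str) -> str:
    """Assemble context, guardrails, exemplars, and the reviewer comment into one prompt."""
    ctx = config.context
    sections = [
        "PAPER CONTEXT",
        f"Abstract: {ctx.abstract}",
        "Key findings: " + "; ".join(ctx.key_findings),
        "Limitations: " + "; ".join(ctx.limitations),
        "TECHNICAL GUARDRAILS (must never be contradicted)",
        *[f"- {rule}" for rule in config.guardrails],
    ]
    if config.examples:
        sections.append("EXAMPLES")
        sections.extend(config.examples)
    sections += ["REVIEWER COMMENT", comment]
    return "\n".join(sections)


def draft_rebuttal(config: AgentConfig, comment: str, llm: Callable[[str], str]) -> str:
    """Inference Engine: `llm` is a stand-in for a call to Gemini 2.5 Flash."""
    return llm(build_prompt(config, comment))
```

In a real LangChain setup, `llm` could wrap a chat model such as `ChatGoogleGenerativeAI(model="gemini-2.5-flash")` from the langchain_google_genai package, for example `lambda p: chat.invoke(p).content`. Keeping the model behind a callable makes the prompt layers testable offline.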

Intended Use

Primary Use Case

Automating the initial drafting of responses to academic peer-review comments. Designed for:

Graduate students drafting thesis defense responses
Conference authors managing multi-reviewer rebuttal deadlines
Researchers seeking standardized, polite academic tone in responses

Supported Response Types

Concession: Acknowledging limitations (e.g., small dataset, methodological oversights)
Defense: Justifying design choices with evidence from the manuscript
Structural Acceptance: Agreeing to formatting/readability improvements

Out-of-Scope Applications

Generating original research data or experimental results
Writing complete manuscripts or literature reviews
Making acceptance/rejection decisions on papers

Prompting Strategy & Prompt Engineering

Technique 1: Few-Shot Prompting

The agent receives exemplar rebuttals demonstrating correct tone and structure. For example:

Reviewer: "The dataset is too small."
Draft: "We thank the reviewer for this observation. While our dataset is limited, we frame this as a proof-of-concept. We have updated Section 4 to reflect this limitation."
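A few-shot block like this can be generated from (comment, draft) pairs. The helper below, format_few_shot, is hypothetical and shows one plausible layout; the exact exemplar format the agent uses is not specified in this card.

```python
from typing import List, Tuple

# Hypothetical exemplar store: (reviewer comment, model draft) pairs.
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "The dataset is too small.",
        "We thank the reviewer for this observation. While our dataset is limited, "
        "we frame this as a proof-of-concept. We have updated Section 4 to reflect "
        "this limitation.",
    ),
]


def format_few_shot(exemplars: List[Tuple[str, str]], new_comment: str) -> str:
    """Render exemplars in Reviewer/Draft form, leaving the final Draft open for the model."""
    blocks = [f'Reviewer: "{c}"\nDraft: "{d}"' for c, d in exemplars]
    blocks.append(f'Reviewer: "{new_comment}"\nDraft:')
    return "\n\n".join(blocks)
```

Ending the prompt with an open "Draft:" line nudges the model to continue in the same register as the exemplars.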

Technique 2: Chain-of-Thought (CoT) Reasoning

The system prompt enforces three explicit reasoning steps:

Critique Identification: Extract the core technical concern
Strategy Formulation: Decide whether to concede or defend, based on the paper context
Drafting: Generate the final academic response
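One way to enforce and consume the three reasoning steps is to ask the model to label each step and then parse the labels. The system prompt wording, the "STEP n" labels, and parse_cot are illustrative assumptions, not the agent's verbatim prompt.

```python
import re
from typing import Dict

# Illustrative system prompt; the agent's actual wording may differ.
SYSTEM_PROMPT = (
    "Reason in three labeled steps before answering.\n"
    "STEP 1 (Critique Identification): state the core technical concern.\n"
    "STEP 2 (Strategy Formulation): answer CONCEDE or DEFEND, citing the paper context.\n"
    "STEP 3 (Drafting): write the final academic response."
)

# Matches a line starting with "STEP n", an optional label, and a colon.
_STEP = re.compile(r"^STEP ([123])[^:\n]*:\s*", re.MULTILINE)
_NAMES = {"1": "critique", "2": "strategy", "3": "draft"}


def parse_cot(output: str) -> Dict[str, str]:
    """Split a labeled model response into its critique, strategy, and draft sections."""
    matches = list(_STEP.finditer(output))
    sections: Dict[str, str] = {}
    for i, m in enumerate(matches):
        # Each section runs until the next STEP label (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        sections[_NAMES[m.group(1)]] = output[m.end():end].strip()
    missing = set(_NAMES.values()) - set(sections)
    if missing:
        raise ValueError(f"model skipped step(s): {sorted(missing)}")
    return sections
```

Raising on a missing step lets the pipeline retry instead of silently passing on an unreasoned draft; only the "draft" section would be shown to the user.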

Technique 3: Technical Guardrail Prompting (Novel Contribution)

To prevent LLM hallucination in technical domains, we inject immutable architectural facts into the prompt context. For example:

"Strictly ensure that liveness threshold logic is described as occurring AFTER the softmax layer. Do not allow the AI to mention evaluating thresholds directly from raw logits."

In the evaluated test cases, this constraint kept the agent from fabricating mathematically incorrect justifications (such as evaluating thresholds on raw logits), a common failure mode of unconstrained LLMs.
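To show why the guardrail matters, the sketch below contrasts the required decision rule (threshold the softmax probability) with the one the guardrail forbids (threshold a raw logit). It assumes a two-class classifier whose index 1 is the "live" class and a threshold of 0.5; both are illustrative choices, not values taken from the paper.

```python
import math
from typing import List, Sequence

LIVE = 1  # assumed index of the "live" class in a two-class output


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_live(logits: Sequence[float], tau: float = 0.5) -> bool:
    """Guardrail-compliant rule: the threshold is applied AFTER the softmax layer."""
    return softmax(logits)[LIVE] >= tau


def is_live_raw_logit(logits: Sequence[float], tau: float = 0.5) -> bool:
    """Rule the guardrail forbids describing: thresholding the raw logit directly."""
    return logits[LIVE] >= tau


logits = [2.0, 1.5]  # [spoof, live]
# p_live = 1 / (1 + e^0.5), about 0.378, so the two rules disagree on this input.
print(is_live(logits), is_live_raw_logit(logits))  # False True
```

Because a logit is unbounded while a softmax output lies in [0, 1], a rebuttal that described thresholding raw logits would misstate the model's decision rule, which is exactly the error the guardrail blocks.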

Training & Evaluation

Training Data

The agent itself is not fine-tuned. It relies on Gemini 2.5 Flash's pre-training and is constrained via prompt engineering.

Evaluation Metrics

Metric	Description	Result
Processing Success Rate	Percentage of reviewer comments successfully processed	100% (4/4)
Architectural Fidelity	Whether generated responses respected technical guardrails (e.g., softmax constraint)	100% (no hallucinated justifications)
Tone Appropriateness	Qualitative assessment of academic politeness	Consistent across all responses
Strategy Accuracy	Whether agent correctly chose concede vs. defend for each critique	100% (matched human judgment)

Test Case Performance

Reviewer Verdict	Critique Type	Agent Strategy	Guardrail Compliance
1: Weak Accept	Small dataset	Concede (proof-of-concept framing)	✅
0: Borderline	LSTM early stopping	Concede (exploratory finding)	✅
-3: Strong Reject	NUAA accuracy drop	Defend (deployment stability trade-off)	✅ (softmax enforced)
1: Weak Accept	Structure improvement	Accept (committed to revision)	✅
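The percentages in the evaluation metrics table follow directly from these four test cases. The sketch below recomputes them; the ReviewCase record is a hypothetical structure, and the human-judgment labels are set equal to the agent's strategies because the card reports a 100% match.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReviewCase:
    verdict: int
    critique: str
    agent_strategy: str
    human_strategy: str   # reference label from human judgment
    processed: bool
    guardrail_ok: bool


CASES: List[ReviewCase] = [
    ReviewCase(1, "Small dataset", "concede", "concede", True, True),
    ReviewCase(0, "LSTM early stopping", "concede", "concede", True, True),
    ReviewCase(-3, "NUAA accuracy drop", "defend", "defend", True, True),
    ReviewCase(1, "Structure improvement", "accept", "accept", True, True),
]


def rate(cases: List[ReviewCase], ok: Callable[[ReviewCase], bool]) -> str:
    """Format the share of cases satisfying `ok` as 'P% (hits/total)'."""
    hits = sum(ok(c) for c in cases)
    return f"{100 * hits / len(cases):.0f}% ({hits}/{len(cases)})"


print("Processing success:", rate(CASES, lambda c: c.processed))
print("Architectural fidelity:", rate(CASES, lambda c: c.guardrail_ok))
print("Strategy accuracy:", rate(CASES, lambda c: c.agent_strategy == c.human_strategy))
```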

Limitations & Bias

Known Limitations

Context Dependency: Agent responses are only as accurate as the injected paper context. Incomplete or inaccurate context will produce poor rebuttals.
Over-Politeness Bias: The agent defaults to highly deferential academic tone. It may concede points that a human researcher would choose to defend more aggressively.
Single-Domain Testing: Currently evaluated only on face anti-spoofing research. Performance on other domains (NLP, theory, systems) is untested.
No Multi-Turn Dialogue: The agent handles one comment at a time and cannot yet maintain context across multiple rounds of reviewer-author exchange.

Bias Considerations

The agent inherits biases present in Gemini 2.5 Flash's training data. The generated academic language may reflect Western academic conventions more strongly than other scholarly traditions.

API Usage

Provider: Google Gemini API
Model: gemini-2.5-flash (free tier)
Average Tokens per Request: ~500 input, ~300 output
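These per-request figures allow a rough budget estimate. The sketch below multiplies the approximate averages by the number of reviewer comments, assuming one request per comment; actual counts vary with the length of the injected paper context and of each critique, and free-tier rate limits are not modeled.

```python
from typing import Dict

AVG_INPUT_TOKENS = 500   # approximate average from the figures above
AVG_OUTPUT_TOKENS = 300


def estimate_tokens(num_comments: int) -> Dict[str, int]:
    """Rough token budget, assuming one request per reviewer comment."""
    inp = num_comments * AVG_INPUT_TOKENS
    out = num_comments * AVG_OUTPUT_TOKENS
    return {"input": inp, "output": out, "total": inp + out}


print(estimate_tokens(4))  # {'input': 2000, 'output': 1200, 'total': 3200}
```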
