Academic-Rebuttal-Agent-Gemini (RebuttalGenie)
Model Type: Agentic LLM Pipeline (LangChain Wrapper utilizing Google Gemini 2.5 Flash)
Language(s): English (Academic/Formal)
License: Apache 2.0 (Wrapper Code) / Gemini API Terms of Service
Base Model: google/gemini-2.5-flash
Model Details
Description
The Academic-Rebuttal-Agent-Gemini (RebuttalGenie) is a specialized agent pipeline that helps researchers draft responses to peer-review comments. Given a manuscript's context and a specific reviewer critique, it produces a structured, polite, and technically accurate rebuttal draft. Unlike generic chatbots, the agent uses Guardrail-Constrained Generation to prevent architectural hallucinations, a novel approach to maintaining scientific integrity in AI-assisted academic writing.
Architecture
The agent is built on a LangChain pipeline with the following components:
- Knowledge Injection Layer: Hardcoded paper context (abstract, key findings, limitations)
- Technical Guardrails: Architectural constraints (e.g., softmax layer dependencies, threshold logic)
- Prompt Template: Few-Shot + Chain-of-Thought (CoT) structured reasoning
- Inference Engine: Google Gemini 2.5 Flash via API
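A minimal sketch of how these four components could be wired together is shown below. Every name in it (`PaperContext`, `RebuttalPipeline`, `build_prompt`, `gemini_llm`) is hypothetical rather than taken from the actual wrapper code. The inference engine is modelled as any callable that maps a prompt string to text, so the pipeline can be exercised offline with a stub; `gemini_llm` assumes the `langchain-google-genai` package is installed and a `GOOGLE_API_KEY` environment variable is set.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PaperContext:
    """Knowledge Injection Layer: hardcoded paper context (field names are hypothetical)."""
    abstract: str
    key_findings: List[str]
    limitations: List[str]


@dataclass
class RebuttalPipeline:
    context: PaperContext
    guardrails: List[str]        # Technical Guardrails: immutable architectural truths
    llm: Callable[[str], str]    # Inference Engine: any prompt -> text callable

    def build_prompt(self, comment: str) -> str:
        """Prompt Template stage; few-shot exemplars and CoT steps are omitted here."""
        def bullets(items: List[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return (
            f"PAPER ABSTRACT:\n{self.context.abstract}\n\n"
            f"KEY FINDINGS:\n{bullets(self.context.key_findings)}\n\n"
            f"LIMITATIONS:\n{bullets(self.context.limitations)}\n\n"
            f"TECHNICAL GUARDRAILS (never contradict):\n{bullets(self.guardrails)}\n\n"
            f"REVIEWER COMMENT:\n{comment}"
        )

    def draft(self, comment: str) -> str:
        # One prompt per reviewer comment, sent to whatever engine was supplied.
        return self.llm(self.build_prompt(comment))


def gemini_llm(prompt: str) -> str:
    """Assumes langchain-google-genai is installed and GOOGLE_API_KEY is set."""
    # Imported lazily so the rest of the sketch loads without the package.
    from langchain_google_genai import ChatGoogleGenerativeAI

    model = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
    return model.invoke(prompt).content
```

For example, `RebuttalPipeline(ctx, guardrails, llm=gemini_llm).draft(comment)` would send one assembled prompt for a single reviewer comment. Passing the model as a plain callable keeps the context and guardrail logic testable without API calls.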
Intended Use
Primary Use Case
Automating the initial drafting of responses to academic peer-review comments. Designed for:
- Graduate students drafting thesis defense responses
- Conference authors managing multi-reviewer rebuttal deadlines
- Researchers seeking standardized, polite academic tone in responses
Supported Response Types
- Concession: Acknowledging limitations (e.g., small dataset, methodological oversights)
- Defense: Justifying design choices with evidence from the manuscript
- Structural Acceptance: Agreeing to formatting/readability improvements
Out-of-Scope Applications
- Generating original research data or experimental results
- Writing complete manuscripts or literature reviews
- Making acceptance/rejection decisions on papers
Prompting Strategy & Prompt Engineering
Technique 1: Few-Shot Prompting
The agent receives exemplar rebuttals demonstrating correct tone and structure. For example:
Reviewer: "The dataset is too small."
Draft: "We thank the reviewer for this observation. While our dataset is limited, we frame this as a proof-of-concept. We have updated Section 4 to reflect this limitation."
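The exemplar above could be rendered into a prompt prefix along these lines. This is an illustrative sketch, not the agent's actual template; `EXEMPLARS` and `few_shot_block` are hypothetical names, and LangChain's own few-shot prompt templates could serve the same purpose.

```python
from typing import List, Tuple

# Hypothetical exemplar store: (reviewer comment, model draft) pairs.
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "The dataset is too small.",
        "We thank the reviewer for this observation. While our dataset is limited, "
        "we frame this as a proof-of-concept. We have updated Section 4 to reflect "
        "this limitation.",
    ),
]


def few_shot_block(exemplars: List[Tuple[str, str]], new_comment: str) -> str:
    """Render each exemplar, then leave an open 'Draft:' slot for the new comment."""
    shots = [f'Reviewer: "{comment}"\nDraft: "{draft}"' for comment, draft in exemplars]
    shots.append(f'Reviewer: "{new_comment}"\nDraft:')
    return "\n\n".join(shots)
```

Ending on an open `Draft:` slot is intended to make the model continue in the same format and tone as the exemplars.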
Technique 2: Chain-of-Thought (CoT) Reasoning
The system prompt enforces three explicit reasoning steps:
- Critique Identification: Extract the core technical concern
- Strategy Formulation: Decide whether to concede or defend, based on the paper context
- Drafting: Generate the final academic response
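One way to enforce these three steps and then consume them is to request labelled sections and parse them afterwards. The sketch below assumes the step labels, the `CONCEDE`/`DEFEND`/`ACCEPT` keywords (mapped loosely onto the three supported response types), and the `parse_cot` helper; none of these are documented parts of the agent.

```python
import re
from typing import Dict, Optional

# Hypothetical system prompt wording; the real agent's prompt is not published here.
COT_SYSTEM_PROMPT = """You are drafting a reply to one peer-review comment.
Work through three labelled steps:
Step 1 - Critique Identification: state the core technical concern.
Step 2 - Strategy Formulation: choose CONCEDE, DEFEND or ACCEPT, citing the paper context.
Step 3 - Drafting: write the final academic response."""

# Captures the text after each label; DOTALL lets sections span several lines.
_STEPS = re.compile(
    r"Step 1 - Critique Identification:\s*(?P<critique>.*?)\s*"
    r"Step 2 - Strategy Formulation:\s*(?P<strategy>.*?)\s*"
    r"Step 3 - Drafting:\s*(?P<draft>.*)",
    re.DOTALL,
)


def parse_cot(output: str) -> Dict[str, Optional[str]]:
    """Split a three-step model output and pull out the chosen strategy keyword."""
    match = _STEPS.search(output)
    if match is None:
        raise ValueError("model output did not follow the three-step format")
    parts: Dict[str, Optional[str]] = {k: v.strip() for k, v in match.groupdict().items()}
    # First strategy keyword found in step 2, or None if the model omitted it.
    keyword = re.search(r"\b(CONCEDE|DEFEND|ACCEPT)\b", parts["strategy"] or "", re.IGNORECASE)
    parts["decision"] = keyword.group(1).upper() if keyword else None
    return parts
```

Only the `draft` field would be shown to the author; keeping `critique` and `decision` makes the concede-or-defend choice auditable, which is what the Strategy Accuracy metric compares against human judgment.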
Technique 3: Technical Guardrail Prompting (Novel Contribution)
To prevent LLM hallucination in technical domains, we inject immutable architectural truths into the prompt context. For example:
"Strictly ensure that liveness threshold logic is described as occurring AFTER the softmax layer. Do not allow the AI to mention evaluating thresholds directly from raw logits."
In the evaluation, this constraint prevented the agent from fabricating mathematically incorrect justifications, a common failure mode of unconstrained LLMs.
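A sketch of guardrail injection, plus an optional output check, follows. `inject_guardrails` mirrors the prompt-side mechanism described above; `find_softmax_violations` is a hypothetical keyword heuristic that this card does not describe, included only to illustrate how the Architectural Fidelity criterion could be screened automatically. It will miss paraphrases and is no substitute for expert review.

```python
import re
from typing import List

# The guardrail text quoted above, stored verbatim.
SOFTMAX_GUARDRAIL = (
    "Strictly ensure that liveness threshold logic is described as occurring AFTER "
    "the softmax layer. Do not allow the AI to mention evaluating thresholds directly "
    "from raw logits."
)


def inject_guardrails(prompt: str, guardrails: List[str]) -> str:
    """Prepend immutable constraints so they precede every task instruction."""
    rules = "\n".join(f"- {rule}" for rule in guardrails)
    return f"IMMUTABLE ARCHITECTURAL CONSTRAINTS:\n{rules}\n\n{prompt}"


def find_softmax_violations(draft: str) -> List[str]:
    """Hypothetical post-hoc check: flag sentences that tie a threshold to
    logits without mentioning softmax. Crude keyword matching only."""
    sentences = re.split(r"(?<=[.!?])\s+", draft)
    flagged = []
    for sentence in sentences:
        lower = sentence.lower()
        if "threshold" in lower and "logit" in lower and "softmax" not in lower:
            flagged.append(sentence)
    return flagged
```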
Training & Evaluation
Training Data
The agent itself is not fine-tuned. It relies on Gemini 2.5 Flash's pre-training and is constrained via prompt engineering.
Evaluation Metrics
| Metric | Description | Result |
|---|---|---|
| Processing Success Rate | Percentage of reviewer comments successfully processed | 100% (4/4) |
| Architectural Fidelity | Whether generated responses respected technical guardrails (e.g., softmax constraint) | 100% (no hallucinated justifications) |
| Tone Appropriateness | Qualitative assessment of academic politeness | Consistent across all responses |
| Strategy Accuracy | Whether agent correctly chose concede vs. defend for each critique | 100% (matched human judgment) |
Test Case Performance
| Reviewer Verdict | Critique Type | Agent Strategy | Guardrail Compliance |
|---|---|---|---|
| 1: Weak Accept | Small dataset | Concede (proof-of-concept framing) | ✅ |
| 0: Borderline | LSTM early stopping | Concede (exploratory finding) | ✅ |
| -3: Strong Reject | NUAA accuracy drop | Defend (deployment stability trade-off) | ✅ (softmax enforced) |
| 1: Weak Accept | Structure improvement | Accept (committed to revision) | ✅ |
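The percentages in the metrics table are simple proportions over these four comments. The sketch below recomputes guardrail compliance and the strategy mix from the rows above; the `ReviewCase` record and `rate` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReviewCase:
    verdict: int        # reviewer score, e.g. 1 = Weak Accept, -3 = Strong Reject
    critique: str
    strategy: str
    guardrail_ok: bool


# Rows transcribed from the test case table.
CASES = [
    ReviewCase(1, "Small dataset", "Concede", True),
    ReviewCase(0, "LSTM early stopping", "Concede", True),
    ReviewCase(-3, "NUAA accuracy drop", "Defend", True),
    ReviewCase(1, "Structure improvement", "Accept", True),
]


def rate(flags: Iterable[bool]) -> float:
    """Percentage of True values."""
    values = list(flags)
    return 100.0 * sum(values) / len(values)


guardrail_compliance = rate(case.guardrail_ok for case in CASES)
strategy_mix = Counter(case.strategy for case in CASES)
```

With n = 4, each comment shifts a rate by 25 percentage points, so a single failure would turn 100% into 75%.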
Limitations & Bias
Known Limitations
- Context Dependency: Agent responses are only as accurate as the injected paper context. Incomplete or inaccurate context will produce poor rebuttals.
- Over-Politeness Bias: The agent defaults to highly deferential academic tone. It may concede points that a human researcher would choose to defend more aggressively.
- Single-Domain Testing: Currently evaluated only on face anti-spoofing research. Performance on other domains (NLP, theory, systems) is untested.
- No Multi-Turn Dialogue: Agent handles one comment at a time. It cannot yet maintain context across multiple rounds of reviewer-author exchange.
Bias Considerations
The agent inherits biases present in Gemini 2.5 Flash's training data. Academic language generated may reflect Western academic conventions more strongly than other scholarly traditions.
API Usage
- Provider: Google Gemini API
- Model: gemini-2.5-flash (free tier)
- Average Tokens per Request: ~500 input, ~300 output
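The per-request averages above translate into a rough budget for a full rebuttal round. This sketch assumes one request per reviewer comment (consistent with the single-turn limitation noted earlier) and treats the ~500/~300 figures as exact; `measured_tokens` assumes a LangChain `AIMessage`, whose `usage_metadata` field, when populated, holds `input_tokens` and `output_tokens`. Both helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

AVG_INPUT_TOKENS = 500   # approximate per-request figures from API Usage
AVG_OUTPUT_TOKENS = 300


def estimate_tokens(n_comments: int) -> Dict[str, int]:
    """Rough budget assuming one request per reviewer comment."""
    return {
        "input": n_comments * AVG_INPUT_TOKENS,
        "output": n_comments * AVG_OUTPUT_TOKENS,
        "total": n_comments * (AVG_INPUT_TOKENS + AVG_OUTPUT_TOKENS),
    }


def measured_tokens(message) -> Optional[Tuple[int, int]]:
    """Read actual counts from a LangChain AIMessage, if the provider reported them."""
    usage = getattr(message, "usage_metadata", None)
    if not usage:
        return None
    return usage["input_tokens"], usage["output_tokens"]
```

For the four test comments, this gives roughly 2,000 input and 1,200 output tokens (about 3,200 in total).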