AI Research Paper Analyst – Complete Project Documentation
Table of Contents
- Project Overview
- System Architecture Flowchart
- Pipeline Flow
- Agents
- Tools
- Pydantic Schemas
- Gradio UI
- Safety & Guardrails
- Tech Stack & Dependencies
- Project Structure
- How to Run
1. Project Overview
AI Research Paper Analyst is an automated peer-review system powered by a multi-agent AI pipeline. A user uploads a research paper (PDF), and the system produces a comprehensive, publication-ready peer review – including methodology critique, novelty assessment, rubric scoring, and a final Accept/Revise/Reject recommendation.
| Property | Value |
|---|---|
| Framework | CrewAI (multi-agent orchestration) |
| LLM Backend | OpenAI GPT-4o (extraction) + GPT-4o-mini (all other agents) |
| Frontend | Gradio 5.x |
| Safety | Programmatic (regex/logic-based) – no LLM in the safety gate |
| Output Format | Structured JSON (Pydantic) rendered as Markdown |
2. System Architecture Flowchart
```mermaid
flowchart TD
A["User Uploads PDF via Gradio UI"] --> B["File Validation"]
B -->|Invalid| B_ERR["Return Error to UI"]
B -->|Valid .pdf| C["GATE 1: Safety Guardian (Programmatic)"]
subgraph SAFETY_GATE["Safety Gate – No LLM"]
C --> C1["PDF Parser Tool – Extract raw text"]
C1 --> C2["PII Detector Tool – Scan & redact PII"]
C2 --> C3["Injection Scanner Tool – Check for prompt injections"]
C3 --> C4["URL Validator Tool – Flag malicious URLs"]
C4 --> C5{"is_safe?"}
end
C5 -->|UNSAFE| BLOCK["Block Document – Show Safety Report"]
C5 -->|SAFE| D["Sanitized Text passed to Analysis Pipeline"]
subgraph ANALYSIS_PIPELINE["Analysis Pipeline – CrewAI Sequential"]
D --> E["STEP 1: Paper Extractor Agent (GPT-4o)"]
E -->|PaperExtraction JSON| F["STEP 2a: Methodology Critic Agent (GPT-4o-mini)"]
E -->|PaperExtraction JSON| G["STEP 2b: Relevance Researcher Agent (GPT-4o-mini)"]
F -->|MethodologyCritique JSON| H["STEP 3: Review Synthesizer Agent (GPT-4o-mini)"]
G -->|RelevanceReport JSON| H
E -->|PaperExtraction JSON| H
H -->|ReviewDraft JSON| I["STEP 4: Rubric Evaluator Agent (GPT-4o-mini)"]
I -->|RubricEvaluation JSON| J["STEP 5: Enhancer Agent (GPT-4o-mini)"]
H -->|ReviewDraft JSON| J
E -->|PaperExtraction JSON| J
end
J -->|FinalReview JSON| K["Output Formatting"]
subgraph OUTPUT["Gradio UI – 6 Tabs"]
K --> K1["Executive Summary Tab"]
K --> K2["Full Review Tab"]
K --> K3["Rubric Scorecard Tab"]
K --> K4["Safety Report Tab"]
K --> K5["Agent Outputs Tab"]
K --> K6["Pipeline Logs Tab"]
end
K2 --> DL["Download Full Report (.md)"]
```
Simplified Agent Pipeline Flow
```mermaid
flowchart LR
PDF["PDF Upload"] --> SG["Safety\nGuardian"]
SG --> PE["Paper\nExtractor"]
PE --> MC["Methodology\nCritic"]
PE --> RR["Relevance\nResearcher"]
MC --> RS["Review\nSynthesizer"]
RR --> RS
RS --> RE["Rubric\nEvaluator"]
RE --> EN["Enhancer"]
EN --> OUT["Final\nReport"]
style SG fill:#ff6b6b,stroke:#c0392b,color:#fff
style PE fill:#74b9ff,stroke:#2980b9,color:#fff
style MC fill:#a29bfe,stroke:#6c5ce7,color:#fff
style RR fill:#a29bfe,stroke:#6c5ce7,color:#fff
style RS fill:#55efc4,stroke:#00b894,color:#fff
style RE fill:#ffeaa7,stroke:#fdcb6e,color:#333
style EN fill:#fd79a8,stroke:#e84393,color:#fff
```
Data Flow Diagram
```mermaid
flowchart TD
subgraph TOOLS["Tools Layer"]
T1["pdf_parser_tool"]
T2["pii_detector_tool"]
T3["injection_scanner_tool"]
T4["url_validator_tool"]
T5["citation_search_tool"]
end
subgraph AGENTS["Agent Layer"]
A1["Safety Guardian\n(Programmatic)"]
A2["Paper Extractor\n(GPT-4o)"]
A3["Methodology Critic\n(GPT-4o-mini)"]
A4["Relevance Researcher\n(GPT-4o-mini)"]
A5["Review Synthesizer\n(GPT-4o-mini)"]
A6["Rubric Evaluator\n(GPT-4o-mini)"]
A7["Enhancer\n(GPT-4o-mini)"]
end
subgraph SCHEMAS["Schema Layer (Pydantic)"]
S1["SafetyReport"]
S2["PaperExtraction"]
S3["MethodologyCritique"]
S4["RelevanceReport"]
S5["ReviewDraft"]
S6["RubricEvaluation"]
S7["FinalReview"]
end
A1 -.->|uses| T1 & T2 & T3 & T4
A2 -.->|uses| T1
A4 -.->|uses| T5
A1 -->|outputs| S1
A2 -->|outputs| S2
A3 -->|outputs| S3
A4 -->|outputs| S4
A5 -->|outputs| S5
A6 -->|outputs| S6
A7 -->|outputs| S7
```
3. Pipeline Flow
The system runs as a sequential pipeline with one safety gate and six analysis steps:
| Stage | Agent | LLM | Input | Output Schema | Tools Used |
|---|---|---|---|---|---|
| Gate 1 | Safety Guardian | None (programmatic) | Raw PDF file | `SafetyReport` | `pdf_parser`, `pii_detector`, `injection_scanner`, `url_validator` |
| Step 1 | Paper Extractor | GPT-4o | Sanitized text | `PaperExtraction` | `pdf_parser` |
| Step 2a | Methodology Critic | GPT-4o-mini | `PaperExtraction` JSON | `MethodologyCritique` | None |
| Step 2b | Relevance Researcher | GPT-4o-mini | `PaperExtraction` JSON | `RelevanceReport` | `citation_search` |
| Step 3 | Review Synthesizer | GPT-4o-mini | Paper + Critique + Research | `ReviewDraft` | None |
| Step 4 | Rubric Evaluator | GPT-4o-mini | Draft + Paper + Critique + Research | `RubricEvaluation` | None |
| Step 5 | Enhancer | GPT-4o-mini | Draft + Rubric + Paper | `FinalReview` | None |
Pipeline Error Handling
- Each agent step is wrapped in `try/except` – a failure in one agent does not crash the pipeline.
- If an agent fails, its output defaults to `{"error": "..."}` and downstream agents work with the available data.
- The Safety Gate blocks the entire pipeline if `is_safe=False` (prompt injection or malicious URLs detected).
- PII is always redacted before analysis, even for "safe" documents.
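The per-agent error handling can be sketched as follows; `run_step` and the failing extractor are illustrative names for this sketch, not the actual `app.py` API:

```python
# Hypothetical sketch of per-agent error handling: a failure in one step
# yields an error dict instead of crashing the pipeline.
def run_step(step_name, fn, *args):
    """Run one agent step; on failure return an error dict instead of raising."""
    try:
        return fn(*args)
    except Exception as exc:
        # Downstream agents receive this placeholder and degrade gracefully.
        return {"error": f"{step_name} failed: {exc}"}

def flaky_extractor(text):
    raise RuntimeError("LLM timeout")

result = run_step("paper_extractor", flaky_extractor, "...")
# result == {"error": "paper_extractor failed: LLM timeout"}
```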
4. Agents
Agent 1: Safety Guardian (Programmatic)
| Property | Value |
|---|---|
| File | app.py – run_safety_check() |
| LLM | None – fully programmatic |
| Purpose | Gate that blocks unsafe documents before LLM analysis |
| Tools | pdf_parser, pii_detector, injection_scanner, url_validator |
| Output | SafetyReport |
Runs all 4 safety tools as Python functions directly (no CrewAI agent overhead). This is deterministic, fast (<1 second), and avoids LLM hallucinations in safety-critical decisions.
Decision Logic:
- `is_safe = (not injection_detected) AND (no malicious URLs)`
- Risk level: `high` if injection or malicious URLs, `medium` if PII found, `low` otherwise
- If `is_safe=False`, the pipeline is blocked and the user sees the Safety Report
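A minimal sketch of this decision logic, using the field names from the `SafetyReport` schema (this is not the actual `run_safety_check()` implementation):

```python
# Sketch of the programmatic safety decision; deterministic, no LLM involved.
def decide_safety(injection_detected: bool, malicious_urls: list, pii_found: list) -> dict:
    is_safe = (not injection_detected) and (len(malicious_urls) == 0)
    if injection_detected or malicious_urls:
        risk_level = "high"
    elif pii_found:
        risk_level = "medium"   # PII alone is redacted, not blocked
    else:
        risk_level = "low"
    return {"is_safe": is_safe, "risk_level": risk_level}

print(decide_safety(False, [], ["email"]))  # safe, but medium risk due to PII
print(decide_safety(True, [], []))          # blocked, high risk
```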
Agent 2: Paper Extractor
| Property | Value |
|---|---|
| File | agents/paper_extractor.py |
| LLM | GPT-4o (temperature=0.1, seed=42) |
| Role | Research Paper Data Extractor |
| Tools | pdf_parser_tool |
| Output | PaperExtraction |
| Max Iterations | 3 |
Extracts structured metadata from the raw paper text: title, authors, abstract, methodology, key findings, contributions, limitations, references count, paper type, and extraction confidence level.
Uses GPT-4o (not mini) because extraction requires deep comprehension of the full paper.
Agent 3: Methodology Critic
| Property | Value |
|---|---|
| File | agents/methodology_critic.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Research Methodology Evaluator |
| Tools | None (pure LLM reasoning) |
| Output | MethodologyCritique |
| Max Iterations | 5 |
Critically evaluates study design, statistical methods, sample sizes, reproducibility, and logical consistency. For theoretical papers, adapts criteria to assess logical rigor and proof completeness. Produces scores for methodology (1-10) and reproducibility (1-10).
Agent 4: Relevance Researcher
| Property | Value |
|---|---|
| File | agents/relevance_researcher.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Related Work Analyst |
| Tools | citation_search_tool |
| Output | RelevanceReport |
| Max Iterations | 5 |
Searches for real related papers using Semantic Scholar / OpenAlex APIs. Assesses novelty by comparing against existing work. Produces a novelty score (1-10), field context, gaps addressed, and overlaps.
Critical Rule: Must NOT hallucinate citations. Only uses papers found by the search tool.
Agent 5: Review Synthesizer
| Property | Value |
|---|---|
| File | agents/review_synthesizer.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42, max_tokens=4000) |
| Role | Peer Review Report Writer |
| Tools | None (synthesis only) |
| Output | ReviewDraft |
| Max Iterations | 3 |
Combines insights from Paper Extractor, Methodology Critic, and Relevance Researcher into a coherent peer-review draft with summary, strengths, weaknesses, assessments, recommendation (Accept/Revise/Reject), and questions for authors.
Agent 6: Rubric Evaluator
| Property | Value |
|---|---|
| File | agents/rubric_evaluator.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Objective Quality Scorer |
| Tools | None (evaluation logic only) |
| Output | RubricEvaluation |
| Max Iterations | 3 |
Scores the review draft on 15 strict binary criteria (0 or 1 each). Pass threshold: >= 11/15.
15 Rubric Criteria:
| # | Category | Criterion |
|---|---|---|
| 1 | Content Completeness | Title & authors correctly identified |
| 2 | Content Completeness | Abstract accurately summarized |
| 3 | Content Completeness | Methodology clearly described |
| 4 | Content Completeness | At least 3 distinct strengths |
| 5 | Content Completeness | At least 3 distinct weaknesses |
| 6 | Content Completeness | Limitations acknowledged |
| 7 | Content Completeness | Related work present (2+ papers) |
| 8 | Analytical Depth | Novelty assessed with justification |
| 9 | Analytical Depth | Reproducibility discussed |
| 10 | Analytical Depth | Evidence quality evaluated |
| 11 | Analytical Depth | Contribution to field stated |
| 12 | Review Quality | Recommendation justified with evidence |
| 13 | Review Quality | At least 3 actionable questions |
| 14 | Review Quality | No hallucinated citations |
| 15 | Review Quality | Professional tone and coherent structure |
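The binary scoring and pass threshold can be illustrated with a short sketch; the criterion key names are made up for this example:

```python
# Illustrative sketch of the pass/fail logic over the 15 binary criteria.
def evaluate_rubric(scores: dict) -> dict:
    total = sum(scores.values())  # each criterion scores 0 or 1
    failed = [name for name, s in scores.items() if s == 0]
    return {"total_score": total, "failed_criteria": failed, "passed": total >= 11}

scores = {f"criterion_{i}": 1 for i in range(1, 16)}
scores["criterion_14"] = 0  # e.g. a hallucinated citation was found
result = evaluate_rubric(scores)
# result: total_score 14, failed_criteria ["criterion_14"], passed True
```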
Agent 7: Enhancer
| Property | Value |
|---|---|
| File | agents/enhancer.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Review Report Enhancer |
| Tools | None (writing/synthesis only) |
| Output | FinalReview |
| Max Iterations | 3 |
Takes the draft review + rubric feedback and produces a complete, publication-ready peer review report (800-1500 words). Fixes all rubric criteria that scored 0 while keeping content that passed. Produces the final executive summary, recommendation, confidence score, and improvement log.
5. Tools
Tool 1: PDF Parser (tools/pdf_parser.py)
| Property | Value |
|---|---|
| Library | pdfplumber |
| Assigned To | Safety Guardian, Paper Extractor |
| Input | File path (string) |
| Output | Extracted text (string) or "ERROR: ..." |
Guardrails:
- File must be `.pdf`
- File must exist on disk
- File size max: 20 MB
- Minimum extractable text: 100 chars
- Never raises exceptions – returns error strings instead
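A sketch of the validation guardrails only, assuming the error-string convention above; the actual text extraction via pdfplumber is omitted here:

```python
import os

MAX_SIZE = 20 * 1024 * 1024   # 20 MB cap
MIN_TEXT_CHARS = 100          # minimum extractable text

def validate_pdf(path: str) -> str:
    """Return "" if the file passes, else an 'ERROR: ...' string (never raises)."""
    if not path.lower().endswith(".pdf"):
        return "ERROR: file must be a .pdf"
    if not os.path.exists(path):
        return "ERROR: file not found"
    if os.path.getsize(path) > MAX_SIZE:
        return "ERROR: file exceeds 20 MB"
    return ""

def check_extracted_text(text: str) -> str:
    """Applied after pdfplumber extraction (not shown) to catch empty scans."""
    if len(text.strip()) < MIN_TEXT_CHARS:
        return "ERROR: fewer than 100 extractable characters"
    return ""
```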
Tool 2: PII Detector (tools/pii_detector.py)
| Property | Value |
|---|---|
| Approach | Regex pattern matching |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with findings, redacted_text, pii_count |
Patterns Detected:
- Email addresses
- Phone numbers (US format)
- Social Security Numbers
- Credit card numbers
All matches are replaced with `[REDACTED_TYPE]` tokens.
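A simplified sketch of this redact-and-report behavior; the patterns below are deliberately minimal stand-ins, not the production regexes:

```python
import re

# Simplified PII patterns (illustrative, far from exhaustive).
PII_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "PHONE": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
}

def redact_pii(text: str):
    """Replace each match with a [REDACTED_TYPE] token and report which types hit."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        text, n = re.subn(pattern, f"[REDACTED_{label}]", text)
        if n:
            findings.append(label)
    return text, findings

redacted, found = redact_pii("Contact alice@example.com or 555-123-4567.")
# redacted == "Contact [REDACTED_EMAIL] or [REDACTED_PHONE]."
```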
Tool 3: Prompt Injection Scanner (tools/injection_scanner.py)
| Property | Value |
|---|---|
| Approach | Regex pattern matching (9 patterns) |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with is_safe, suspicious_patterns, patterns_checked |
Patterns Checked:
- "ignore previous instructions"
- "disregard above/previous"
- "forget everything/all/your instructions"
- "new instructions:"
- `[INST]` token
- `<|im_start|>` token
- `<|system|>` token
- "override safety"
- "jailbreak"
Fail-safe: If scanning itself fails, the document is treated as unsafe.
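The scan, including the fail-safe branch, might look roughly like this; the list mirrors the nine documented patterns in simplified regex form:

```python
import re

# Simplified versions of the nine documented injection patterns.
INJECTION_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"disregard\s+(above|previous)",
    r"forget\s+(everything|all|your\s+instructions)",
    r"new\s+instructions:",
    r"\[INST\]",
    r"<\|im_start\|>",
    r"<\|system\|>",
    r"override\s+safety",
    r"jailbreak",
]

def scan_for_injection(text: str) -> dict:
    try:
        hits = [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
        return {"is_safe": not hits, "suspicious_patterns": hits,
                "patterns_checked": len(INJECTION_PATTERNS)}
    except Exception:
        # Fail-safe: if scanning itself fails, treat the document as unsafe.
        return {"is_safe": False, "suspicious_patterns": [], "patterns_checked": 0}

print(scan_for_injection("Please ignore previous instructions and...")["is_safe"])  # False
```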
Tool 4: URL Validator (tools/url_validator.py)
| Property | Value |
|---|---|
| Approach | Regex extraction + blocklist matching |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with total_urls, malicious_urls, is_safe |
Suspicious Indicators:
- URL shorteners: bit.ly, tinyurl, t.co, goo.gl
- Dangerous protocols: data:, javascript:, file://
- Keywords: malware, phishing
Max 50 URLs checked per scan.
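A condensed sketch of the extraction-plus-blocklist check; the regex and blocklists are illustrative simplifications of the indicators listed above:

```python
import re

SHORTENERS = ("bit.ly", "tinyurl", "t.co", "goo.gl")
BAD_SCHEMES = ("data:", "javascript:", "file://")
BAD_KEYWORDS = ("malware", "phishing")
MAX_URLS = 50  # cap per scan

def validate_urls(text: str) -> dict:
    """Extract URLs, flag any matching the blocklists, and report overall safety."""
    urls = re.findall(r"(?:https?://|data:|javascript:|file://)\S+", text)[:MAX_URLS]
    malicious = [u for u in urls
                 if any(s in u for s in SHORTENERS)
                 or any(u.startswith(p) for p in BAD_SCHEMES)
                 or any(k in u.lower() for k in BAD_KEYWORDS)]
    return {"total_urls": len(urls), "malicious_urls": malicious,
            "is_safe": not malicious}

print(validate_urls("See https://arxiv.org/abs/1234 and https://bit.ly/x"))
```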
Tool 5: Citation Search (tools/citation_search.py)
| Property | Value |
|---|---|
| Primary API | Semantic Scholar (with retry for HTTP 429) |
| Fallback API | OpenAlex (free, no rate limits) |
| Assigned To | Relevance Researcher |
| Input | Search query (string, max 200 chars) |
| Output | Formatted text list of papers with title, authors, year, citations, abstract |
Rate Limiting:
- Max 3 API calls per analysis run (tracked globally)
- 10-second timeout per API call
- Exponential backoff for rate limits (1s, 2s, 4s)
Fallback Chain: Semantic Scholar -> OpenAlex -> "Search unavailable" message
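The retry-and-fallback behavior can be sketched as below; the fetch callables are injected so the real Semantic Scholar / OpenAlex HTTP calls stay out of scope, and `base_delay` is a knob added here for illustration:

```python
import time

def search_with_fallback(query, primary, fallback, retries=3, base_delay=1.0):
    """Try the primary API with exponential backoff (1s, 2s, 4s by default),
    then fall back to the secondary API, then to an unavailable message."""
    delay = base_delay
    for _ in range(retries):
        try:
            return primary(query)
        except RuntimeError:        # e.g. HTTP 429 rate limit
            time.sleep(delay)
            delay *= 2
    try:
        return fallback(query)
    except RuntimeError:
        return "Search unavailable"
```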
6. Pydantic Schemas
All schemas inherit from `BaseAgentOutput`, which enforces `extra="ignore"` for Gradio compatibility.
File: `schemas/models.py`
SafetyReport

```
is_safe: bool (default=False, fail-safe)
pii_found: list[str]
injection_detected: bool
malicious_urls: list[str]
sanitized_text: str
risk_level: "low" | "medium" | "high"
```

PaperExtraction

```
title: str
authors: list[str]
abstract: str
methodology: str
key_findings: list[str]
contributions: list[str]
limitations_stated: list[str]
references_count: int
paper_type: "empirical" | "theoretical" | "survey" | "system" | "mixed"
extraction_confidence: "high" | "medium" | "low"
```

MethodologyCritique

```
strengths: list[str]
weaknesses: list[str]
limitations: list[str]
methodology_score: int (1-10)
reproducibility_score: int (1-10)
suggestions: list[str]
bias_risks: list[str]
```

RelevanceReport

```
related_papers: list[RelatedPaper]
novelty_score: int (1-10)
field_context: str
gaps_addressed: list[str]
overlaps_with_existing: list[str]
```

RelatedPaper

```
title: str
authors: str
year: int
citation_count: int
relevance: str
```

ReviewDraft

```
summary: str
strengths_section: str
weaknesses_section: str
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence: int (1-5)
detailed_review: str
```

RubricEvaluation

```
scores: dict[str, int] (15 criteria, each 0 or 1)
total_score: int (0-15)
failed_criteria: list[str]
feedback_per_criterion: dict[str, str]
passed: bool (True if total_score >= 11)
```

FinalReview

```
executive_summary: str
paper_metadata: dict
strengths: list[str]
weaknesses: list[str]
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence_score: int (1-5)
rubric_scores: dict[str, int]
rubric_total: int
improvement_log: list[str]
```
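A minimal sketch of the shared base class and one schema, assuming Pydantic v2; only `SafetyReport` is shown, and the defaults beyond the fail-safe `is_safe=False` are illustrative:

```python
from pydantic import BaseModel, ConfigDict

class BaseAgentOutput(BaseModel):
    # Unknown keys are dropped instead of raising (Gradio compatibility).
    model_config = ConfigDict(extra="ignore")

class SafetyReport(BaseAgentOutput):
    is_safe: bool = False            # fail-safe default
    pii_found: list[str] = []
    injection_detected: bool = False
    malicious_urls: list[str] = []
    sanitized_text: str = ""
    risk_level: str = "high"         # illustrative default

# Extra keys are silently ignored rather than causing a validation error:
report = SafetyReport(is_safe=True, risk_level="low", debug_blob="dropped")
```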
7. Gradio UI
The UI is a single-page Gradio Blocks application with 6 tabs:
| Tab | Content | Component |
|---|---|---|
| Executive Summary | Recommendation, confidence, rubric score, paper info | Markdown + Download button |
| Full Review | Strengths, weaknesses, methodology, novelty, questions | Markdown |
| Rubric Scorecard | 15 criteria scores in 3 categories with feedback | Markdown (table) |
| Safety Report | PII findings, injection status, URL analysis | Markdown |
| Agent Outputs | Raw structured output from each of the 7 agents | Markdown |
| Pipeline Logs | Timestamped execution log + JSON summary | Textbox + Code |
UI Features
- Progress bar with real-time status updates (e.g., "Agent 3/6: Searching Related Work...")
- Download button to export the full review as a `.md` file
- File validation – only accepts `.pdf` files
8. Safety & Guardrails
Layered Safety Architecture
```mermaid
flowchart TD
subgraph LAYER1["Layer 1: Input Validation"]
IV1["File type check (.pdf only)"]
IV2["File size check (max 20MB)"]
IV3["Minimum text check (100+ chars)"]
end
subgraph LAYER2["Layer 2: Content Safety"]
CS1["PII Detection & Redaction"]
CS2["Prompt Injection Scanning"]
CS3["URL Blocklist Validation"]
end
subgraph LAYER3["Layer 3: LLM Configuration"]
LC1["Low temperature (0.1)"]
LC2["Deterministic seed (42)"]
LC3["Max iterations per agent"]
LC4["Structured output (Pydantic)"]
end
subgraph LAYER4["Layer 4: Pipeline Resilience"]
PR1["Per-agent try/except"]
PR2["Graceful degradation"]
PR3["API rate limiting (3 calls max)"]
PR4["Timeout enforcement (10s)"]
end
subgraph LAYER5["Layer 5: Observability"]
OB1["PipelineLogger – every step logged"]
OB2["API key redaction in logs"]
OB3["Execution summary with timing"]
end
LAYER1 --> LAYER2 --> LAYER3 --> LAYER4 --> LAYER5
```
Key Principles
- Fail-safe defaults: `is_safe=False`; risk defaults to unsafe
- No LLM in the safety gate: all safety checks are deterministic regex/logic
- PII always redacted: Even for safe documents, PII is stripped before LLM analysis
- Structured outputs: Every agent uses Pydantic schemas enforced by CrewAI
- No secrets in logs: API keys are regex-redacted from all log output
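A sketch of the kind of regex redaction described; the actual `PipelineLogger` pattern may differ (this one just targets the `sk-` key prefix format):

```python
import re

# Illustrative pattern for OpenAI-style API keys in log lines.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")

def redact_secrets(line: str) -> str:
    """Strip anything that looks like an API key before the line is logged."""
    return KEY_PATTERN.sub("[REDACTED_API_KEY]", line)

print(redact_secrets("Using key sk-abc123def456ghi789"))
# -> "Using key [REDACTED_API_KEY]"
```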
9. Tech Stack & Dependencies
| Package | Version | Purpose |
|---|---|---|
| `crewai` | >= 0.86.0 | Multi-agent orchestration framework |
| `crewai-tools` | >= 0.17.0 | Tool wrapper utilities |
| `openai` | >= 1.0.0 | LLM API client (GPT-4o, GPT-4o-mini) |
| `pdfplumber` | >= 0.11.0 | PDF text extraction |
| `pydantic` | >= 2.0.0 | Structured output validation |
| `gradio` | >= 5.0.0 | Web UI framework |
| `python-dotenv` | >= 1.0.0 | Environment variable loading |
| `requests` | >= 2.31.0 | HTTP client for citation APIs |
Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API access (GPT-4o required) |
10. Project Structure
```
Homework5_agentincAI/
|-- app.py                      # Main application (pipeline + Gradio UI)
|-- requirements.txt            # Python dependencies
|-- README.md                   # HuggingFace Space metadata
|-- .env                        # Environment variables (API keys)
|-- .gitignore
|
|-- agents/                     # CrewAI agent definitions
|   |-- __init__.py
|   |-- paper_extractor.py      # Agent 2: Structured data extraction
|   |-- methodology_critic.py   # Agent 3: Methodology evaluation
|   |-- relevance_researcher.py # Agent 4: Related work search
|   |-- review_synthesizer.py   # Agent 5: Draft review writer
|   |-- rubric_evaluator.py     # Agent 6: 15-criteria quality scorer
|   |-- enhancer.py             # Agent 7: Final report polisher
|
|-- tools/                      # CrewAI tool definitions
|   |-- __init__.py
|   |-- pdf_parser.py           # PDF text extraction
|   |-- pii_detector.py         # PII detection & redaction
|   |-- injection_scanner.py    # Prompt injection detection
|   |-- url_validator.py        # URL blocklist validation
|   |-- citation_search.py      # Semantic Scholar / OpenAlex search
|
|-- schemas/                    # Pydantic output models
|   |-- __init__.py
|   |-- models.py               # All 8 schema definitions
|
|-- test_components.py          # Component tests
|-- tests/                      # Test directory
```
11. How to Run
Prerequisites
- Python 3.10+
- OpenAI API key with GPT-4o access
Setup
```shell
# 1. Install dependencies
pip install -r requirements.txt

# 2. Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env

# 3. Run the application
python app.py
```
The Gradio UI listens on 0.0.0.0:7860; open http://localhost:7860 in your browser.
Usage
- Open the UI in your browser
- Upload a research paper PDF (max 20 MB)
- Click "Analyze Paper"
- Wait 1-3 minutes for the pipeline to complete
- Review results across all 6 tabs
- Download the full report as Markdown
Generated for AI Research Paper Analyst – Homework 5, Agentic AI Bootcamp