
# AI Research Paper Analyst — Complete Project Documentation

## Table of Contents

  1. Project Overview
  2. System Architecture Flowchart
  3. Pipeline Flow
  4. Agents
  5. Tools
  6. Pydantic Schemas
  7. Gradio UI
  8. Safety & Guardrails
  9. Tech Stack & Dependencies
  10. Project Structure
  11. How to Run

## 1. Project Overview

AI Research Paper Analyst is an automated peer-review system powered by a multi-agent AI pipeline. A user uploads a research paper (PDF), and the system produces a comprehensive, publication-ready peer review — including methodology critique, novelty assessment, rubric scoring, and a final Accept/Revise/Reject recommendation.

| Property | Value |
|---|---|
| Framework | CrewAI (multi-agent orchestration) |
| LLM Backend | OpenAI GPT-4o (extraction) + GPT-4o-mini (all other agents) |
| Frontend | Gradio 5.x |
| Safety | Programmatic (regex/logic-based) — no LLM in the safety gate |
| Output Format | Structured JSON (Pydantic) rendered as Markdown |

## 2. System Architecture Flowchart

```mermaid
flowchart TD
    A["User Uploads PDF via Gradio UI"] --> B["File Validation"]
    B -->|Invalid| B_ERR["Return Error to UI"]
    B -->|Valid .pdf| C["GATE 1: Safety Guardian (Programmatic)"]

    subgraph SAFETY_GATE["Safety Gate — No LLM"]
        C --> C1["PDF Parser Tool — Extract raw text"]
        C1 --> C2["PII Detector Tool — Scan & redact PII"]
        C2 --> C3["Injection Scanner Tool — Check for prompt injections"]
        C3 --> C4["URL Validator Tool — Flag malicious URLs"]
        C4 --> C5{"is_safe?"}
    end

    C5 -->|UNSAFE| BLOCK["Block Document — Show Safety Report"]
    C5 -->|SAFE| D["Sanitized Text passed to Analysis Pipeline"]

    subgraph ANALYSIS_PIPELINE["Analysis Pipeline — CrewAI Sequential"]
        D --> E["STEP 1: Paper Extractor Agent (GPT-4o)"]
        E -->|PaperExtraction JSON| F["STEP 2a: Methodology Critic Agent (GPT-4o-mini)"]
        E -->|PaperExtraction JSON| G["STEP 2b: Relevance Researcher Agent (GPT-4o-mini)"]
        F -->|MethodologyCritique JSON| H["STEP 3: Review Synthesizer Agent (GPT-4o-mini)"]
        G -->|RelevanceReport JSON| H
        E -->|PaperExtraction JSON| H
        H -->|ReviewDraft JSON| I["STEP 4: Rubric Evaluator Agent (GPT-4o-mini)"]
        I -->|RubricEvaluation JSON| J["STEP 5: Enhancer Agent (GPT-4o-mini)"]
        H -->|ReviewDraft JSON| J
        E -->|PaperExtraction JSON| J
    end

    J -->|FinalReview JSON| K["Output Formatting"]

    subgraph OUTPUT["Gradio UI — 6 Tabs"]
        K --> K1["Executive Summary Tab"]
        K --> K2["Full Review Tab"]
        K --> K3["Rubric Scorecard Tab"]
        K --> K4["Safety Report Tab"]
        K --> K5["Agent Outputs Tab"]
        K --> K6["Pipeline Logs Tab"]
    end

    K2 --> DL["Download Full Report (.md)"]
```

### Simplified Agent Pipeline Flow

```mermaid
flowchart LR
    PDF["PDF Upload"] --> SG["Safety\nGuardian"]
    SG --> PE["Paper\nExtractor"]
    PE --> MC["Methodology\nCritic"]
    PE --> RR["Relevance\nResearcher"]
    MC --> RS["Review\nSynthesizer"]
    RR --> RS
    RS --> RE["Rubric\nEvaluator"]
    RE --> EN["Enhancer"]
    EN --> OUT["Final\nReport"]

    style SG fill:#ff6b6b,stroke:#c0392b,color:#fff
    style PE fill:#74b9ff,stroke:#2980b9,color:#fff
    style MC fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RR fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RS fill:#55efc4,stroke:#00b894,color:#fff
    style RE fill:#ffeaa7,stroke:#fdcb6e,color:#333
    style EN fill:#fd79a8,stroke:#e84393,color:#fff
```

### Data Flow Diagram

```mermaid
flowchart TD
    subgraph TOOLS["Tools Layer"]
        T1["pdf_parser_tool"]
        T2["pii_detector_tool"]
        T3["injection_scanner_tool"]
        T4["url_validator_tool"]
        T5["citation_search_tool"]
    end

    subgraph AGENTS["Agent Layer"]
        A1["Safety Guardian\n(Programmatic)"]
        A2["Paper Extractor\n(GPT-4o)"]
        A3["Methodology Critic\n(GPT-4o-mini)"]
        A4["Relevance Researcher\n(GPT-4o-mini)"]
        A5["Review Synthesizer\n(GPT-4o-mini)"]
        A6["Rubric Evaluator\n(GPT-4o-mini)"]
        A7["Enhancer\n(GPT-4o-mini)"]
    end

    subgraph SCHEMAS["Schema Layer (Pydantic)"]
        S1["SafetyReport"]
        S2["PaperExtraction"]
        S3["MethodologyCritique"]
        S4["RelevanceReport"]
        S5["ReviewDraft"]
        S6["RubricEvaluation"]
        S7["FinalReview"]
    end

    A1 -.->|uses| T1 & T2 & T3 & T4
    A2 -.->|uses| T1
    A4 -.->|uses| T5

    A1 -->|outputs| S1
    A2 -->|outputs| S2
    A3 -->|outputs| S3
    A4 -->|outputs| S4
    A5 -->|outputs| S5
    A6 -->|outputs| S6
    A7 -->|outputs| S7
```

## 3. Pipeline Flow

The system runs as a sequential pipeline with one safety gate and six analysis steps:

| Stage | Agent | LLM | Input | Output Schema | Tools Used |
|---|---|---|---|---|---|
| Gate 1 | Safety Guardian | None (programmatic) | Raw PDF file | SafetyReport | pdf_parser, pii_detector, injection_scanner, url_validator |
| Step 1 | Paper Extractor | GPT-4o | Sanitized text | PaperExtraction | pdf_parser |
| Step 2a | Methodology Critic | GPT-4o-mini | PaperExtraction JSON | MethodologyCritique | None |
| Step 2b | Relevance Researcher | GPT-4o-mini | PaperExtraction JSON | RelevanceReport | citation_search |
| Step 3 | Review Synthesizer | GPT-4o-mini | Paper + Critique + Research | ReviewDraft | None |
| Step 4 | Rubric Evaluator | GPT-4o-mini | Draft + Paper + Critique + Research | RubricEvaluation | None |
| Step 5 | Enhancer | GPT-4o-mini | Draft + Rubric + Paper | FinalReview | None |

### Pipeline Error Handling

- Each agent step is wrapped in try/except — a failure in one agent does not crash the pipeline.
- If an agent fails, its output defaults to {"error": "..."} and downstream agents work with the available data.
- The Safety Gate blocks the entire pipeline if is_safe=False (prompt injection or malicious URLs detected).
- PII is always redacted before analysis, even for "safe" documents.
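The per-agent wrapping can be sketched as follows; `run_agent_step` and `flaky_agent` are illustrative names, not the actual helpers in app.py:

```python
# Sketch of the per-agent error handling described above.
def run_agent_step(name, fn, *args):
    """Run one agent step; on failure return an error dict instead of raising."""
    try:
        return fn(*args)
    except Exception as exc:  # a failing agent must not crash the pipeline
        return {"error": f"{name} failed: {exc}"}

def flaky_agent():
    raise RuntimeError("LLM timeout")

# Downstream agents receive whatever data is available:
critique = run_agent_step("Methodology Critic", flaky_agent)
# critique == {"error": "Methodology Critic failed: LLM timeout"}
```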

## 4. Agents

### Agent 1: Safety Guardian (Programmatic)

| Property | Value |
|---|---|
| File | app.py — run_safety_check() |
| LLM | None — fully programmatic |
| Purpose | Gate that blocks unsafe documents before LLM analysis |
| Tools | pdf_parser, pii_detector, injection_scanner, url_validator |
| Output | SafetyReport |

Runs all 4 safety tools as Python functions directly (no CrewAI agent overhead). This is deterministic, fast (<1 second), and avoids LLM hallucinations in safety-critical decisions.

Decision Logic:

- is_safe = (not injection_detected) AND (no malicious URLs)
- Risk level: high if injection or malicious URLs are found, medium if PII is found, low otherwise
- If is_safe=False, the pipeline is blocked and the user sees the Safety Report
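The decision logic above amounts to a few lines of Python; `decide` is an illustrative stand-in for the logic inside run_safety_check(), with field names following the SafetyReport schema:

```python
# Minimal sketch of the programmatic safety decision described above.
def decide(injection_detected, malicious_urls, pii_found):
    is_safe = not injection_detected and not malicious_urls
    if injection_detected or malicious_urls:
        risk_level = "high"
    elif pii_found:
        risk_level = "medium"
    else:
        risk_level = "low"
    return {"is_safe": is_safe, "risk_level": risk_level}
```

Because no LLM is involved, the same inputs always produce the same verdict.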

### Agent 2: Paper Extractor

| Property | Value |
|---|---|
| File | agents/paper_extractor.py |
| LLM | GPT-4o (temperature=0.1, seed=42) |
| Role | Research Paper Data Extractor |
| Tools | pdf_parser_tool |
| Output | PaperExtraction |
| Max Iterations | 3 |

Extracts structured metadata from the raw paper text: title, authors, abstract, methodology, key findings, contributions, limitations, references count, paper type, and extraction confidence level.

Uses GPT-4o (not mini) because extraction requires deep comprehension of the full paper.


### Agent 3: Methodology Critic

| Property | Value |
|---|---|
| File | agents/methodology_critic.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Research Methodology Evaluator |
| Tools | None (pure LLM reasoning) |
| Output | MethodologyCritique |
| Max Iterations | 5 |

Critically evaluates study design, statistical methods, sample sizes, reproducibility, and logical consistency. For theoretical papers, adapts criteria to assess logical rigor and proof completeness. Produces scores for methodology (1-10) and reproducibility (1-10).


### Agent 4: Relevance Researcher

| Property | Value |
|---|---|
| File | agents/relevance_researcher.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Related Work Analyst |
| Tools | citation_search_tool |
| Output | RelevanceReport |
| Max Iterations | 5 |

Searches for real related papers using Semantic Scholar / OpenAlex APIs. Assesses novelty by comparing against existing work. Produces a novelty score (1-10), field context, gaps addressed, and overlaps.

Critical Rule: Must NOT hallucinate citations. Only uses papers found by the search tool.


### Agent 5: Review Synthesizer

| Property | Value |
|---|---|
| File | agents/review_synthesizer.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42, max_tokens=4000) |
| Role | Peer Review Report Writer |
| Tools | None (synthesis only) |
| Output | ReviewDraft |
| Max Iterations | 3 |

Combines insights from Paper Extractor, Methodology Critic, and Relevance Researcher into a coherent peer-review draft with summary, strengths, weaknesses, assessments, recommendation (Accept/Revise/Reject), and questions for authors.


### Agent 6: Rubric Evaluator

| Property | Value |
|---|---|
| File | agents/rubric_evaluator.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Objective Quality Scorer |
| Tools | None (evaluation logic only) |
| Output | RubricEvaluation |
| Max Iterations | 3 |

Scores the review draft on 15 strict binary criteria (0 or 1 each). Pass threshold: >= 11/15.

15 Rubric Criteria:

| # | Category | Criterion |
|---|---|---|
| 1 | Content Completeness | Title & authors correctly identified |
| 2 | Content Completeness | Abstract accurately summarized |
| 3 | Content Completeness | Methodology clearly described |
| 4 | Content Completeness | At least 3 distinct strengths |
| 5 | Content Completeness | At least 3 distinct weaknesses |
| 6 | Content Completeness | Limitations acknowledged |
| 7 | Content Completeness | Related work present (2+ papers) |
| 8 | Analytical Depth | Novelty assessed with justification |
| 9 | Analytical Depth | Reproducibility discussed |
| 10 | Analytical Depth | Evidence quality evaluated |
| 11 | Analytical Depth | Contribution to field stated |
| 12 | Review Quality | Recommendation justified with evidence |
| 13 | Review Quality | At least 3 actionable questions |
| 14 | Review Quality | No hallucinated citations |
| 15 | Review Quality | Professional tone and coherent structure |
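The scoring arithmetic is simple enough to sketch directly; the criterion names below are placeholders, while the field names mirror the RubricEvaluation schema:

```python
# Sketch of the binary rubric scoring: 15 criteria scored 0 or 1,
# pass threshold 11/15.
def evaluate_rubric(scores):
    total = sum(scores.values())
    return {
        "total_score": total,
        "failed_criteria": [name for name, s in scores.items() if s == 0],
        "passed": total >= 11,  # pass threshold
    }

scores = {f"criterion_{i}": 1 for i in range(1, 16)}
scores["criterion_5"] = 0  # e.g. fewer than 3 distinct weaknesses
result = evaluate_rubric(scores)
# result["total_score"] == 14, result["passed"] is True
```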

### Agent 7: Enhancer

| Property | Value |
|---|---|
| File | agents/enhancer.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Review Report Enhancer |
| Tools | None (writing/synthesis only) |
| Output | FinalReview |
| Max Iterations | 3 |

Takes the draft review + rubric feedback and produces a complete, publication-ready peer review report (800-1500 words). Fixes all rubric criteria that scored 0 while keeping content that passed. Produces the final executive summary, recommendation, confidence score, and improvement log.


## 5. Tools

### Tool 1: PDF Parser (tools/pdf_parser.py)

| Property | Value |
|---|---|
| Library | pdfplumber |
| Assigned To | Safety Guardian, Paper Extractor |
| Input | File path (string) |
| Output | Extracted text (string) or "ERROR: ..." |

Guardrails:

- File must be a .pdf
- File must exist on disk
- Maximum file size: 20 MB
- Minimum extractable text: 100 characters
- Never raises exceptions — returns error strings instead
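The pre-flight guardrails can be sketched as follows; the error strings are illustrative and the pdfplumber extraction step itself is elided:

```python
# Sketch of the guardrail checks above: validate before parsing,
# return error strings rather than raising.
import os

MAX_BYTES = 20 * 1024 * 1024  # 20 MB limit

def validate_pdf(path):
    if not path.lower().endswith(".pdf"):
        return "ERROR: only .pdf files are accepted"
    if not os.path.exists(path):
        return "ERROR: file not found"
    if os.path.getsize(path) > MAX_BYTES:
        return "ERROR: file exceeds 20 MB limit"
    return "OK"  # safe to hand the file to pdfplumber
```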

### Tool 2: PII Detector (tools/pii_detector.py)

| Property | Value |
|---|---|
| Approach | Regex pattern matching |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with findings, redacted_text, pii_count |

Patterns Detected:

- Email addresses
- Phone numbers (US format)
- Social Security Numbers
- Credit card numbers

All matches are replaced with [REDACTED_TYPE] tokens.
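A redaction sketch for two of the patterns above; these regexes are illustrative simplifications, not the exact ones in pii_detector.py:

```python
# Scan-and-replace redaction: each match becomes a [REDACTED_TYPE] token.
import re

PII_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.-]+",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
}

def redact(text):
    findings = []
    for label, pattern in PII_PATTERNS.items():
        text, n = re.subn(pattern, f"[REDACTED_{label}]", text)
        if n:
            findings.append(label)
    return text, findings
```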


### Tool 3: Prompt Injection Scanner (tools/injection_scanner.py)

| Property | Value |
|---|---|
| Approach | Regex pattern matching (9 patterns) |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with is_safe, suspicious_patterns, patterns_checked |

Patterns Checked:

- "ignore previous instructions"
- "disregard above/previous"
- "forget everything/all/your instructions"
- "new instructions:"
- `[INST]` token
- `<|im_start|>` token
- `<|system|>` token
- "override safety"
- "jailbreak"

Fail-safe: If scanning itself fails, the document is treated as unsafe.
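The scan and its fail-safe default can be sketched as follows; the pattern list is abbreviated to 4 of the 9:

```python
# Case-insensitive pattern scan with a fail-safe default.
import re

INJECTION_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"new\s+instructions\s*:",
    r"\[INST\]",
    r"<\|im_start\|>",
]

def scan(text):
    try:
        hits = [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
        return {"is_safe": not hits, "suspicious_patterns": hits}
    except Exception:
        # If scanning itself fails, treat the document as unsafe.
        return {"is_safe": False, "suspicious_patterns": ["scan failed"]}
```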


### Tool 4: URL Validator (tools/url_validator.py)

| Property | Value |
|---|---|
| Approach | Regex extraction + blocklist matching |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with total_urls, malicious_urls, is_safe |

Suspicious Indicators:

- URL shorteners: bit.ly, tinyurl, t.co, goo.gl
- Dangerous protocols: data:, javascript:, file://
- Keywords: malware, phishing

Max 50 URLs checked per scan.
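A blocklist-matching sketch of the checks above; the substring test for shorteners is a naive simplification of the real url_validator logic, which would need proper hostname parsing:

```python
# Extract URLs by regex, then match against the blocklists.
import re

SHORTENERS = ("bit.ly", "tinyurl", "t.co", "goo.gl")
BAD_SCHEMES = ("data:", "javascript:", "file://")

def validate_urls(text):
    urls = re.findall(r"https?://\S+", text)[:50]  # at most 50 URLs per scan
    malicious = [u for u in urls if any(s in u for s in SHORTENERS)]
    # Dangerous schemes are not http(s), so scan raw tokens for them too.
    malicious += [t for t in text.split() if t.startswith(BAD_SCHEMES)]
    return {"total_urls": len(urls), "malicious_urls": malicious, "is_safe": not malicious}
```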


### Tool 5: Citation Search (tools/citation_search.py)

| Property | Value |
|---|---|
| Primary API | Semantic Scholar (with retry for HTTP 429) |
| Fallback API | OpenAlex (free, no rate limits) |
| Assigned To | Relevance Researcher |
| Input | Search query (string, max 200 chars) |
| Output | Formatted text list of papers with title, authors, year, citations, abstract |

Rate Limiting:

- Max 3 API calls per analysis run (tracked globally)
- 10-second timeout per API call
- Exponential backoff for rate limits (1 s, 2 s, 4 s)

Fallback Chain: Semantic Scholar -> OpenAlex -> "Search unavailable" message
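The retry schedule can be sketched as follows; `fetch` is a stand-in for the real Semantic Scholar request, and the injectable `sleep` parameter is purely for testability:

```python
# Exponential backoff on HTTP 429: wait 1 s, 2 s, 4 s between retries.
import time

def with_backoff(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    for attempt in range(retries):
        status, body = fetch()
        if status != 429:
            return body
        sleep(base_delay * (2 ** attempt))  # 1 s, then 2 s, then 4 s
    return None  # still rate-limited: caller falls back to OpenAlex
```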


## 6. Pydantic Schemas

All schemas inherit from BaseAgentOutput which enforces extra="ignore" for Gradio compatibility.

File: schemas/models.py
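A minimal sketch of the shared base class and one schema, assuming Pydantic v2; the field defaults follow the fail-safe principle described in section 8, though the actual models.py may differ in detail:

```python
# Base class with extra="ignore": unknown keys from the LLM are dropped
# instead of raising, which keeps outputs Gradio-friendly.
from pydantic import BaseModel, ConfigDict

class BaseAgentOutput(BaseModel):
    model_config = ConfigDict(extra="ignore")

class SafetyReport(BaseAgentOutput):
    is_safe: bool = False            # fail-safe default
    pii_found: list[str] = []
    risk_level: str = "high"         # assumption: default to unsafe
```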

### SafetyReport

```
is_safe: bool (default=False, fail-safe)
pii_found: list[str]
injection_detected: bool
malicious_urls: list[str]
sanitized_text: str
risk_level: "low" | "medium" | "high"
```

### PaperExtraction

```
title: str
authors: list[str]
abstract: str
methodology: str
key_findings: list[str]
contributions: list[str]
limitations_stated: list[str]
references_count: int
paper_type: "empirical" | "theoretical" | "survey" | "system" | "mixed"
extraction_confidence: "high" | "medium" | "low"
```

### MethodologyCritique

```
strengths: list[str]
weaknesses: list[str]
limitations: list[str]
methodology_score: int (1-10)
reproducibility_score: int (1-10)
suggestions: list[str]
bias_risks: list[str]
```

### RelevanceReport

```
related_papers: list[RelatedPaper]
novelty_score: int (1-10)
field_context: str
gaps_addressed: list[str]
overlaps_with_existing: list[str]
```

### RelatedPaper

```
title: str
authors: str
year: int
citation_count: int
relevance: str
```

### ReviewDraft

```
summary: str
strengths_section: str
weaknesses_section: str
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence: int (1-5)
detailed_review: str
```

### RubricEvaluation

```
scores: dict[str, int]        (15 criteria, each 0 or 1)
total_score: int              (0-15)
failed_criteria: list[str]
feedback_per_criterion: dict[str, str]
passed: bool                  (True if total_score >= 11)
```

### FinalReview

```
executive_summary: str
paper_metadata: dict
strengths: list[str]
weaknesses: list[str]
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence_score: int (1-5)
rubric_scores: dict[str, int]
rubric_total: int
improvement_log: list[str]
```

## 7. Gradio UI

The UI is a single-page Gradio Blocks application with 6 tabs:

| Tab | Content | Component |
|---|---|---|
| Executive Summary | Recommendation, confidence, rubric score, paper info | Markdown + Download button |
| Full Review | Strengths, weaknesses, methodology, novelty, questions | Markdown |
| Rubric Scorecard | 15 criteria scores in 3 categories with feedback | Markdown (table) |
| Safety Report | PII findings, injection status, URL analysis | Markdown |
| Agent Outputs | Raw structured output from each of the 7 agents | Markdown |
| Pipeline Logs | Timestamped execution log + JSON summary | Textbox + Code |

### UI Features

- Progress bar with real-time status updates (e.g., "Agent 3/6: Searching Related Work...")
- Download button to export the full review as a .md file
- File validation — only .pdf files are accepted

## 8. Safety & Guardrails

### Layered Safety Architecture

```mermaid
flowchart TD
    subgraph LAYER1["Layer 1: Input Validation"]
        IV1["File type check (.pdf only)"]
        IV2["File size check (max 20MB)"]
        IV3["Minimum text check (100+ chars)"]
    end

    subgraph LAYER2["Layer 2: Content Safety"]
        CS1["PII Detection & Redaction"]
        CS2["Prompt Injection Scanning"]
        CS3["URL Blocklist Validation"]
    end

    subgraph LAYER3["Layer 3: LLM Configuration"]
        LC1["Low temperature (0.1)"]
        LC2["Deterministic seed (42)"]
        LC3["Max iterations per agent"]
        LC4["Structured output (Pydantic)"]
    end

    subgraph LAYER4["Layer 4: Pipeline Resilience"]
        PR1["Per-agent try/except"]
        PR2["Graceful degradation"]
        PR3["API rate limiting (3 calls max)"]
        PR4["Timeout enforcement (10s)"]
    end

    subgraph LAYER5["Layer 5: Observability"]
        OB1["PipelineLogger — every step logged"]
        OB2["API key redaction in logs"]
        OB3["Execution summary with timing"]
    end

    LAYER1 --> LAYER2 --> LAYER3 --> LAYER4 --> LAYER5
```

### Key Principles

- Fail-safe defaults: is_safe=False; risk defaults to unsafe
- No LLM in the safety gate: all safety checks are deterministic regex/logic
- PII always redacted: even for safe documents, PII is stripped before LLM analysis
- Structured outputs: every agent uses Pydantic schemas enforced by CrewAI
- No secrets in logs: API keys are regex-redacted from all log output
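The last principle can be sketched as a regex filter over each log line; the `sk-` prefix pattern is an assumption about OpenAI key format, not necessarily the exact PipelineLogger regex:

```python
# Redact anything that looks like an API key before a line is logged.
import re

def redact_secrets(line):
    # Assumed key shape: "sk-" followed by a long alphanumeric tail.
    return re.sub(r"sk-[A-Za-z0-9_-]{10,}", "[REDACTED_KEY]", line)
```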

## 9. Tech Stack & Dependencies

| Package | Version | Purpose |
|---|---|---|
| crewai | >= 0.86.0 | Multi-agent orchestration framework |
| crewai-tools | >= 0.17.0 | Tool wrapper utilities |
| openai | >= 1.0.0 | LLM API client (GPT-4o, GPT-4o-mini) |
| pdfplumber | >= 0.11.0 | PDF text extraction |
| pydantic | >= 2.0.0 | Structured output validation |
| gradio | >= 5.0.0 | Web UI framework |
| python-dotenv | >= 1.0.0 | Environment variable loading |
| requests | >= 2.31.0 | HTTP client for citation APIs |

### Environment Variables

| Variable | Required | Purpose |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API access (GPT-4o required) |

## 10. Project Structure

```text
Homework5_agentincAI/
|-- app.py                          # Main application (pipeline + Gradio UI)
|-- requirements.txt                # Python dependencies
|-- README.md                       # HuggingFace Space metadata
|-- .env                            # Environment variables (API keys)
|-- .gitignore
|
|-- agents/                         # CrewAI agent definitions
|   |-- __init__.py
|   |-- paper_extractor.py          # Agent 2: Structured data extraction
|   |-- methodology_critic.py       # Agent 3: Methodology evaluation
|   |-- relevance_researcher.py     # Agent 4: Related work search
|   |-- review_synthesizer.py       # Agent 5: Draft review writer
|   |-- rubric_evaluator.py         # Agent 6: 15-criteria quality scorer
|   |-- enhancer.py                 # Agent 7: Final report polisher
|
|-- tools/                          # CrewAI tool definitions
|   |-- __init__.py
|   |-- pdf_parser.py               # PDF text extraction
|   |-- pii_detector.py             # PII detection & redaction
|   |-- injection_scanner.py        # Prompt injection detection
|   |-- url_validator.py            # URL blocklist validation
|   |-- citation_search.py          # Semantic Scholar / OpenAlex search
|
|-- schemas/                        # Pydantic output models
|   |-- __init__.py
|   |-- models.py                   # All 8 schema definitions
|
|-- test_components.py              # Component tests
|-- tests/                          # Test directory
```

## 11. How to Run

### Prerequisites

- Python 3.10+
- An OpenAI API key with GPT-4o access

### Setup

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env

# 3. Run the application
python app.py
```

The Gradio UI listens on http://0.0.0.0:7860; open http://localhost:7860 in your browser.

### Usage

  1. Open the UI in your browser
  2. Upload a research paper PDF (max 20 MB)
  3. Click "Analyze Paper"
  4. Wait 1-3 minutes for the pipeline to complete
  5. Review results across all 6 tabs
  6. Download the full report as Markdown

*Generated for AI Research Paper Analyst — Homework 5, Agentic AI Bootcamp*