AI Research Paper Analyst – Complete Project Documentation
Table of Contents
- Project Overview
- System Architecture Flowchart
- Pipeline Flow
- Agents
- Tools
- Pydantic Schemas
- Gradio UI
- Safety & Guardrails
- Tech Stack & Dependencies
- Project Structure
- How to Run
1. Project Overview
AI Research Paper Analyst is an automated peer-review system powered by a multi-agent AI pipeline. A user uploads a research paper (PDF), and the system produces a comprehensive, publication-ready peer review – including methodology critique, novelty assessment, rubric scoring, and a final Accept/Revise/Reject recommendation.
| Property | Value |
|---|---|
| Framework | CrewAI (multi-agent orchestration) |
| LLM Backend | OpenAI GPT-4o (extraction) + GPT-4o-mini (all other agents) |
| Frontend | Gradio 5.x |
| Safety | Programmatic (regex/logic-based) – no LLM in the safety gate |
| Output Format | Structured JSON (Pydantic) rendered as Markdown |
2. System Architecture Flowchart
```mermaid
flowchart TD
A["User Uploads PDF via Gradio UI"] --> B["File Validation"]
B -->|Invalid| B_ERR["Return Error to UI"]
B -->|Valid .pdf| C["GATE 1: Safety Guardian (Programmatic)"]
subgraph SAFETY_GATE["Safety Gate – No LLM"]
C --> C1["PDF Parser Tool – Extract raw text"]
C1 --> C2["PII Detector Tool – Scan & redact PII"]
C2 --> C3["Injection Scanner Tool – Check for prompt injections"]
C3 --> C4["URL Validator Tool – Flag malicious URLs"]
C4 --> C5{"is_safe?"}
end
C5 -->|UNSAFE| BLOCK["Block Document – Show Safety Report"]
C5 -->|SAFE| D["Sanitized Text passed to Analysis Pipeline"]
subgraph ANALYSIS_PIPELINE["Analysis Pipeline – CrewAI Sequential"]
D --> E["STEP 1: Paper Extractor Agent (GPT-4o)"]
E -->|PaperExtraction JSON| F["STEP 2a: Methodology Critic Agent (GPT-4o-mini)"]
E -->|PaperExtraction JSON| G["STEP 2b: Relevance Researcher Agent (GPT-4o-mini)"]
F -->|MethodologyCritique JSON| H["STEP 3: Review Synthesizer Agent (GPT-4o-mini)"]
G -->|RelevanceReport JSON| H
E -->|PaperExtraction JSON| H
H -->|ReviewDraft JSON| I["STEP 4: Rubric Evaluator Agent (GPT-4o-mini)"]
I -->|RubricEvaluation JSON| J["STEP 5: Enhancer Agent (GPT-4o-mini)"]
H -->|ReviewDraft JSON| J
E -->|PaperExtraction JSON| J
end
J -->|FinalReview JSON| K["Output Formatting"]
subgraph OUTPUT["Gradio UI – 6 Tabs"]
K --> K1["Executive Summary Tab"]
K --> K2["Full Review Tab"]
K --> K3["Rubric Scorecard Tab"]
K --> K4["Safety Report Tab"]
K --> K5["Agent Outputs Tab"]
K --> K6["Pipeline Logs Tab"]
end
K2 --> DL["Download Full Report (.md)"]
```
Simplified Agent Pipeline Flow
```mermaid
flowchart LR
PDF["PDF Upload"] --> SG["Safety\nGuardian"]
SG --> PE["Paper\nExtractor"]
PE --> MC["Methodology\nCritic"]
PE --> RR["Relevance\nResearcher"]
MC --> RS["Review\nSynthesizer"]
RR --> RS
RS --> RE["Rubric\nEvaluator"]
RE --> EN["Enhancer"]
EN --> OUT["Final\nReport"]
style SG fill:#ff6b6b,stroke:#c0392b,color:#fff
style PE fill:#74b9ff,stroke:#2980b9,color:#fff
style MC fill:#a29bfe,stroke:#6c5ce7,color:#fff
style RR fill:#a29bfe,stroke:#6c5ce7,color:#fff
style RS fill:#55efc4,stroke:#00b894,color:#fff
style RE fill:#ffeaa7,stroke:#fdcb6e,color:#333
style EN fill:#fd79a8,stroke:#e84393,color:#fff
```
Data Flow Diagram
```mermaid
flowchart TD
subgraph TOOLS["Tools Layer"]
T1["pdf_parser_tool"]
T2["pii_detector_tool"]
T3["injection_scanner_tool"]
T4["url_validator_tool"]
T5["citation_search_tool"]
end
subgraph AGENTS["Agent Layer"]
A1["Safety Guardian\n(Programmatic)"]
A2["Paper Extractor\n(GPT-4o)"]
A3["Methodology Critic\n(GPT-4o-mini)"]
A4["Relevance Researcher\n(GPT-4o-mini)"]
A5["Review Synthesizer\n(GPT-4o-mini)"]
A6["Rubric Evaluator\n(GPT-4o-mini)"]
A7["Enhancer\n(GPT-4o-mini)"]
end
subgraph SCHEMAS["Schema Layer (Pydantic)"]
S1["SafetyReport"]
S2["PaperExtraction"]
S3["MethodologyCritique"]
S4["RelevanceReport"]
S5["ReviewDraft"]
S6["RubricEvaluation"]
S7["FinalReview"]
end
A1 -.->|uses| T1 & T2 & T3 & T4
A2 -.->|uses| T1
A4 -.->|uses| T5
A1 -->|outputs| S1
A2 -->|outputs| S2
A3 -->|outputs| S3
A4 -->|outputs| S4
A5 -->|outputs| S5
A6 -->|outputs| S6
A7 -->|outputs| S7
```
3. Pipeline Flow
The system runs as a sequential pipeline with one safety gate and six analysis steps:
| Stage | Agent | LLM | Input | Output Schema | Tools Used |
|---|---|---|---|---|---|
| Gate 1 | Safety Guardian | None (programmatic) | Raw PDF file | `SafetyReport` | `pdf_parser`, `pii_detector`, `injection_scanner`, `url_validator` |
| Step 1 | Paper Extractor | GPT-4o | Sanitized text | `PaperExtraction` | `pdf_parser` |
| Step 2a | Methodology Critic | GPT-4o-mini | `PaperExtraction` JSON | `MethodologyCritique` | None |
| Step 2b | Relevance Researcher | GPT-4o-mini | `PaperExtraction` JSON | `RelevanceReport` | `citation_search` |
| Step 3 | Review Synthesizer | GPT-4o-mini | Paper + Critique + Research | `ReviewDraft` | None |
| Step 4 | Rubric Evaluator | GPT-4o-mini | Draft + Paper + Critique + Research | `RubricEvaluation` | None |
| Step 5 | Enhancer | GPT-4o-mini | Draft + Rubric + Paper | `FinalReview` | None |
Pipeline Error Handling
- Each agent step is wrapped in `try/except` – a failure in one agent does not crash the pipeline.
- If an agent fails, its output defaults to `{"error": "..."}` and downstream agents work with the available data.
- The Safety Gate blocks the entire pipeline if `is_safe=False` (prompt injection or malicious URLs detected).
- PII is always redacted before analysis, even for "safe" documents.
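The per-agent error handling can be sketched as follows; `run_step` and the failing extractor are illustrative names for this sketch, not the actual `app.py` API:

```python
# Hypothetical sketch of per-agent error handling: a failure in one step
# yields an error dict instead of crashing the pipeline.
def run_step(step_name, fn, *args):
    """Run one agent step; on failure return an error dict instead of raising."""
    try:
        return fn(*args)
    except Exception as exc:
        # Downstream agents receive this placeholder and degrade gracefully.
        return {"error": f"{step_name} failed: {exc}"}

def flaky_extractor(text):
    raise RuntimeError("LLM timeout")

result = run_step("paper_extractor", flaky_extractor, "...")
# result == {"error": "paper_extractor failed: LLM timeout"}
```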
4. Agents
Agent 1: Safety Guardian (Programmatic)
| Property | Value |
|---|---|
| File | app.py – run_safety_check() |
| LLM | None – fully programmatic |
| Purpose | Gate that blocks unsafe documents before LLM analysis |
| Tools | pdf_parser, pii_detector, injection_scanner, url_validator |
| Output | SafetyReport |
Runs all 4 safety tools as Python functions directly (no CrewAI agent overhead). This is deterministic, fast (<1 second), and avoids LLM hallucinations in safety-critical decisions.
Decision Logic:
- `is_safe = (not injection_detected) AND (no malicious URLs)`
- Risk level: `high` if injection or malicious URLs, `medium` if PII found, `low` otherwise
- If `is_safe=False`, the pipeline is blocked and the user sees the Safety Report
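A minimal sketch of this decision logic, using the field names from the `SafetyReport` schema (this is not the actual `run_safety_check()` implementation):

```python
# Sketch of the programmatic safety decision; deterministic, no LLM involved.
def decide_safety(injection_detected: bool, malicious_urls: list, pii_found: list) -> dict:
    is_safe = (not injection_detected) and (len(malicious_urls) == 0)
    if injection_detected or malicious_urls:
        risk_level = "high"
    elif pii_found:
        risk_level = "medium"   # PII alone is redacted, not blocked
    else:
        risk_level = "low"
    return {"is_safe": is_safe, "risk_level": risk_level}

print(decide_safety(False, [], ["email"]))  # safe, but medium risk due to PII
print(decide_safety(True, [], []))          # blocked, high risk
```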
Agent 2: Paper Extractor
| Property | Value |
|---|---|
| File | agents/paper_extractor.py |
| LLM | GPT-4o (temperature=0.1, seed=42) |
| Role | Research Paper Data Extractor |
| Tools | pdf_parser_tool |
| Output | PaperExtraction |
| Max Iterations | 3 |
Extracts structured metadata from the raw paper text: title, authors, abstract, methodology, key findings, contributions, limitations, references count, paper type, and extraction confidence level.
Uses GPT-4o (not mini) because extraction requires deep comprehension of the full paper.
Agent 3: Methodology Critic
| Property | Value |
|---|---|
| File | agents/methodology_critic.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Research Methodology Evaluator |
| Tools | None (pure LLM reasoning) |
| Output | MethodologyCritique |
| Max Iterations | 5 |
Critically evaluates study design, statistical methods, sample sizes, reproducibility, and logical consistency. For theoretical papers, adapts criteria to assess logical rigor and proof completeness. Produces scores for methodology (1-10) and reproducibility (1-10).
Agent 4: Relevance Researcher
| Property | Value |
|---|---|
| File | agents/relevance_researcher.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Related Work Analyst |
| Tools | citation_search_tool |
| Output | RelevanceReport |
| Max Iterations | 5 |
Searches for real related papers using Semantic Scholar / OpenAlex APIs. Assesses novelty by comparing against existing work. Produces a novelty score (1-10), field context, gaps addressed, and overlaps.
Critical Rule: Must NOT hallucinate citations. Only uses papers found by the search tool.
Agent 5: Review Synthesizer
| Property | Value |
|---|---|
| File | agents/review_synthesizer.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42, max_tokens=4000) |
| Role | Peer Review Report Writer |
| Tools | None (synthesis only) |
| Output | ReviewDraft |
| Max Iterations | 3 |
Combines insights from Paper Extractor, Methodology Critic, and Relevance Researcher into a coherent peer-review draft with summary, strengths, weaknesses, assessments, recommendation (Accept/Revise/Reject), and questions for authors.
Agent 6: Rubric Evaluator
| Property | Value |
|---|---|
| File | agents/rubric_evaluator.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Objective Quality Scorer |
| Tools | None (evaluation logic only) |
| Output | RubricEvaluation |
| Max Iterations | 3 |
Scores the review draft on 15 strict binary criteria (0 or 1 each). Pass threshold: >= 11/15.
15 Rubric Criteria:
| # | Category | Criterion |
|---|---|---|
| 1 | Content Completeness | Title & authors correctly identified |
| 2 | Content Completeness | Abstract accurately summarized |
| 3 | Content Completeness | Methodology clearly described |
| 4 | Content Completeness | At least 3 distinct strengths |
| 5 | Content Completeness | At least 3 distinct weaknesses |
| 6 | Content Completeness | Limitations acknowledged |
| 7 | Content Completeness | Related work present (2+ papers) |
| 8 | Analytical Depth | Novelty assessed with justification |
| 9 | Analytical Depth | Reproducibility discussed |
| 10 | Analytical Depth | Evidence quality evaluated |
| 11 | Analytical Depth | Contribution to field stated |
| 12 | Review Quality | Recommendation justified with evidence |
| 13 | Review Quality | At least 3 actionable questions |
| 14 | Review Quality | No hallucinated citations |
| 15 | Review Quality | Professional tone and coherent structure |
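The binary scoring and pass threshold can be illustrated with a short sketch; the criterion key names are made up for this example:

```python
# Illustrative sketch of the pass/fail logic over the 15 binary criteria.
def evaluate_rubric(scores: dict) -> dict:
    total = sum(scores.values())  # each criterion scores 0 or 1
    failed = [name for name, s in scores.items() if s == 0]
    return {"total_score": total, "failed_criteria": failed, "passed": total >= 11}

scores = {f"criterion_{i}": 1 for i in range(1, 16)}
scores["criterion_14"] = 0  # e.g. a hallucinated citation was found
result = evaluate_rubric(scores)
# result: total_score 14, failed_criteria ["criterion_14"], passed True
```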
Agent 7: Enhancer
| Property | Value |
|---|---|
| File | agents/enhancer.py |
| LLM | GPT-4o-mini (temperature=0.1, seed=42) |
| Role | Review Report Enhancer |
| Tools | None (writing/synthesis only) |
| Output | FinalReview |
| Max Iterations | 3 |
Takes the draft review + rubric feedback and produces a complete, publication-ready peer review report (800-1500 words). Fixes all rubric criteria that scored 0 while keeping content that passed. Produces the final executive summary, recommendation, confidence score, and improvement log.
5. Tools
Tool 1: PDF Parser (tools/pdf_parser.py)
| Property | Value |
|---|---|
| Library | pdfplumber |
| Assigned To | Safety Guardian, Paper Extractor |
| Input | File path (string) |
| Output | Extracted text (string) or "ERROR: ..." |
Guardrails:
- File must be `.pdf`
- File must exist on disk
- File size max: 20 MB
- Minimum extractable text: 100 chars
- Never raises exceptions – returns error strings instead
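A sketch of the validation guardrails only, assuming the error-string convention above; the actual text extraction via pdfplumber is omitted here:

```python
import os

MAX_SIZE = 20 * 1024 * 1024   # 20 MB cap
MIN_TEXT_CHARS = 100          # minimum extractable text

def validate_pdf(path: str) -> str:
    """Return "" if the file passes, else an 'ERROR: ...' string (never raises)."""
    if not path.lower().endswith(".pdf"):
        return "ERROR: file must be a .pdf"
    if not os.path.exists(path):
        return "ERROR: file not found"
    if os.path.getsize(path) > MAX_SIZE:
        return "ERROR: file exceeds 20 MB"
    return ""

def check_extracted_text(text: str) -> str:
    """Applied after pdfplumber extraction (not shown) to catch empty scans."""
    if len(text.strip()) < MIN_TEXT_CHARS:
        return "ERROR: fewer than 100 extractable characters"
    return ""
```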
Tool 2: PII Detector (tools/pii_detector.py)
| Property | Value |
|---|---|
| Approach | Regex pattern matching |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with findings, redacted_text, pii_count |
Patterns Detected:
- Email addresses
- Phone numbers (US format)
- Social Security Numbers
- Credit card numbers
All matches are replaced with `[REDACTED_TYPE]` tokens.
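A simplified sketch of this redact-and-report behavior; the patterns below are deliberately minimal stand-ins, not the production regexes:

```python
import re

# Simplified PII patterns (illustrative, far from exhaustive).
PII_PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "PHONE": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
}

def redact_pii(text: str):
    """Replace each match with a [REDACTED_TYPE] token and report which types hit."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        text, n = re.subn(pattern, f"[REDACTED_{label}]", text)
        if n:
            findings.append(label)
    return text, findings

redacted, found = redact_pii("Contact alice@example.com or 555-123-4567.")
# redacted == "Contact [REDACTED_EMAIL] or [REDACTED_PHONE]."
```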
Tool 3: Prompt Injection Scanner (tools/injection_scanner.py)
| Property | Value |
|---|---|
| Approach | Regex pattern matching (9 patterns) |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with is_safe, suspicious_patterns, patterns_checked |
Patterns Checked:
- "ignore previous instructions"
- "disregard above/previous"
- "forget everything/all/your instructions"
- "new instructions:"
- `[INST]` token
- `<|im_start|>` token
- `<|system|>` token
- "override safety"
- "jailbreak"
Fail-safe: If scanning itself fails, the document is treated as unsafe.
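The scan, including the fail-safe branch, might look roughly like this; the list mirrors the nine documented patterns in simplified regex form:

```python
import re

# Simplified versions of the nine documented injection patterns.
INJECTION_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"disregard\s+(above|previous)",
    r"forget\s+(everything|all|your\s+instructions)",
    r"new\s+instructions:",
    r"\[INST\]",
    r"<\|im_start\|>",
    r"<\|system\|>",
    r"override\s+safety",
    r"jailbreak",
]

def scan_for_injection(text: str) -> dict:
    try:
        hits = [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
        return {"is_safe": not hits, "suspicious_patterns": hits,
                "patterns_checked": len(INJECTION_PATTERNS)}
    except Exception:
        # Fail-safe: if scanning itself fails, treat the document as unsafe.
        return {"is_safe": False, "suspicious_patterns": [], "patterns_checked": 0}

print(scan_for_injection("Please ignore previous instructions and...")["is_safe"])  # False
```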
Tool 4: URL Validator (tools/url_validator.py)
| Property | Value |
|---|---|
| Approach | Regex extraction + blocklist matching |
| Assigned To | Safety Guardian |
| Input | Text to scan |
| Output | JSON with total_urls, malicious_urls, is_safe |
Suspicious Indicators:
- URL shorteners: bit.ly, tinyurl, t.co, goo.gl
- Dangerous protocols: data:, javascript:, file://
- Keywords: malware, phishing
Max 50 URLs checked per scan.
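A condensed sketch of the extraction-plus-blocklist check; the regex and blocklists are illustrative simplifications of the indicators listed above:

```python
import re

SHORTENERS = ("bit.ly", "tinyurl", "t.co", "goo.gl")
BAD_SCHEMES = ("data:", "javascript:", "file://")
BAD_KEYWORDS = ("malware", "phishing")
MAX_URLS = 50  # cap per scan

def validate_urls(text: str) -> dict:
    """Extract URLs, flag any matching the blocklists, and report overall safety."""
    urls = re.findall(r"(?:https?://|data:|javascript:|file://)\S+", text)[:MAX_URLS]
    malicious = [u for u in urls
                 if any(s in u for s in SHORTENERS)
                 or any(u.startswith(p) for p in BAD_SCHEMES)
                 or any(k in u.lower() for k in BAD_KEYWORDS)]
    return {"total_urls": len(urls), "malicious_urls": malicious,
            "is_safe": not malicious}

print(validate_urls("See https://arxiv.org/abs/1234 and https://bit.ly/x"))
```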
Tool 5: Citation Search (tools/citation_search.py)
| Property | Value |
|---|---|
| Primary API | Semantic Scholar (with retry for HTTP 429) |
| Fallback API | OpenAlex (free, no rate limits) |
| Assigned To | Relevance Researcher |
| Input | Search query (string, max 200 chars) |
| Output | Formatted text list of papers with title, authors, year, citations, abstract |
Rate Limiting:
- Max 3 API calls per analysis run (tracked globally)
- 10-second timeout per API call
- Exponential backoff for rate limits (1s, 2s, 4s)
Fallback Chain: Semantic Scholar -> OpenAlex -> "Search unavailable" message
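The retry-and-fallback behavior can be sketched as below; the fetch callables are injected so the real Semantic Scholar / OpenAlex HTTP calls stay out of scope, and `base_delay` is a knob added here for illustration:

```python
import time

def search_with_fallback(query, primary, fallback, retries=3, base_delay=1.0):
    """Try the primary API with exponential backoff (1s, 2s, 4s by default),
    then fall back to the secondary API, then to an unavailable message."""
    delay = base_delay
    for _ in range(retries):
        try:
            return primary(query)
        except RuntimeError:        # e.g. HTTP 429 rate limit
            time.sleep(delay)
            delay *= 2
    try:
        return fallback(query)
    except RuntimeError:
        return "Search unavailable"
```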
6. Pydantic Schemas
All schemas inherit from `BaseAgentOutput`, which enforces `extra="ignore"` for Gradio compatibility.
File: `schemas/models.py`
SafetyReport

```
is_safe: bool (default=False, fail-safe)
pii_found: list[str]
injection_detected: bool
malicious_urls: list[str]
sanitized_text: str
risk_level: "low" | "medium" | "high"
```

PaperExtraction

```
title: str
authors: list[str]
abstract: str
methodology: str
key_findings: list[str]
contributions: list[str]
limitations_stated: list[str]
references_count: int
paper_type: "empirical" | "theoretical" | "survey" | "system" | "mixed"
extraction_confidence: "high" | "medium" | "low"
```

MethodologyCritique

```
strengths: list[str]
weaknesses: list[str]
limitations: list[str]
methodology_score: int (1-10)
reproducibility_score: int (1-10)
suggestions: list[str]
bias_risks: list[str]
```

RelevanceReport

```
related_papers: list[RelatedPaper]
novelty_score: int (1-10)
field_context: str
gaps_addressed: list[str]
overlaps_with_existing: list[str]
```

RelatedPaper

```
title: str
authors: str
year: int
citation_count: int
relevance: str
```

ReviewDraft

```
summary: str
strengths_section: str
weaknesses_section: str
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence: int (1-5)
detailed_review: str
```

RubricEvaluation

```
scores: dict[str, int] (15 criteria, each 0 or 1)
total_score: int (0-15)
failed_criteria: list[str]
feedback_per_criterion: dict[str, str]
passed: bool (True if total_score >= 11)
```

FinalReview

```
executive_summary: str
paper_metadata: dict
strengths: list[str]
weaknesses: list[str]
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence_score: int (1-5)
rubric_scores: dict[str, int]
rubric_total: int
improvement_log: list[str]
```
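A minimal sketch of the shared base class and one schema, assuming Pydantic v2; only `SafetyReport` is shown, and the defaults beyond the fail-safe `is_safe=False` are illustrative:

```python
from pydantic import BaseModel, ConfigDict

class BaseAgentOutput(BaseModel):
    # Unknown keys are dropped instead of raising (Gradio compatibility).
    model_config = ConfigDict(extra="ignore")

class SafetyReport(BaseAgentOutput):
    is_safe: bool = False            # fail-safe default
    pii_found: list[str] = []
    injection_detected: bool = False
    malicious_urls: list[str] = []
    sanitized_text: str = ""
    risk_level: str = "high"         # illustrative default

# Extra keys are silently ignored rather than causing a validation error:
report = SafetyReport(is_safe=True, risk_level="low", debug_blob="dropped")
```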
7. Gradio UI
The UI is a single-page Gradio Blocks application with 6 tabs:
| Tab | Content | Component |
|---|---|---|
| Executive Summary | Recommendation, confidence, rubric score, paper info | Markdown + Download button |
| Full Review | Strengths, weaknesses, methodology, novelty, questions | Markdown |
| Rubric Scorecard | 15 criteria scores in 3 categories with feedback | Markdown (table) |
| Safety Report | PII findings, injection status, URL analysis | Markdown |
| Agent Outputs | Raw structured output from each of the 7 agents | Markdown |
| Pipeline Logs | Timestamped execution log + JSON summary | Textbox + Code |
UI Features
- Progress bar with real-time status updates (e.g., "Agent 3/6: Searching Related Work...")
- Download button to export the full review as a `.md` file
- File validation – only accepts `.pdf` files
8. Safety & Guardrails
Layered Safety Architecture
```mermaid
flowchart TD
subgraph LAYER1["Layer 1: Input Validation"]
IV1["File type check (.pdf only)"]
IV2["File size check (max 20MB)"]
IV3["Minimum text check (100+ chars)"]
end
subgraph LAYER2["Layer 2: Content Safety"]
CS1["PII Detection & Redaction"]
CS2["Prompt Injection Scanning"]
CS3["URL Blocklist Validation"]
end
subgraph LAYER3["Layer 3: LLM Configuration"]
LC1["Low temperature (0.1)"]
LC2["Deterministic seed (42)"]
LC3["Max iterations per agent"]
LC4["Structured output (Pydantic)"]
end
subgraph LAYER4["Layer 4: Pipeline Resilience"]
PR1["Per-agent try/except"]
PR2["Graceful degradation"]
PR3["API rate limiting (3 calls max)"]
PR4["Timeout enforcement (10s)"]
end
subgraph LAYER5["Layer 5: Observability"]
OB1["PipelineLogger – every step logged"]
OB2["API key redaction in logs"]
OB3["Execution summary with timing"]
end
LAYER1 --> LAYER2 --> LAYER3 --> LAYER4 --> LAYER5
```
Key Principles
- Fail-safe defaults: `is_safe=False`; risk defaults to unsafe
- No LLM in the safety gate: all safety checks are deterministic regex/logic
- PII always redacted: Even for safe documents, PII is stripped before LLM analysis
- Structured outputs: Every agent uses Pydantic schemas enforced by CrewAI
- No secrets in logs: API keys are regex-redacted from all log output
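A sketch of the kind of regex redaction described; the actual `PipelineLogger` pattern may differ (this one just targets the `sk-` key prefix format):

```python
import re

# Illustrative pattern for OpenAI-style API keys in log lines.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")

def redact_secrets(line: str) -> str:
    """Strip anything that looks like an API key before the line is logged."""
    return KEY_PATTERN.sub("[REDACTED_API_KEY]", line)

print(redact_secrets("Using key sk-abc123def456ghi789"))
# -> "Using key [REDACTED_API_KEY]"
```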
9. Tech Stack & Dependencies
| Package | Version | Purpose |
|---|---|---|
| `crewai` | >= 0.86.0 | Multi-agent orchestration framework |
| `crewai-tools` | >= 0.17.0 | Tool wrapper utilities |
| `openai` | >= 1.0.0 | LLM API client (GPT-4o, GPT-4o-mini) |
| `pdfplumber` | >= 0.11.0 | PDF text extraction |
| `pydantic` | >= 2.0.0 | Structured output validation |
| `gradio` | >= 5.0.0 | Web UI framework |
| `python-dotenv` | >= 1.0.0 | Environment variable loading |
| `requests` | >= 2.31.0 | HTTP client for citation APIs |
Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API access (GPT-4o required) |
10. Project Structure
```
Homework5_agentincAI/
|-- app.py                      # Main application (pipeline + Gradio UI)
|-- requirements.txt            # Python dependencies
|-- README.md                   # HuggingFace Space metadata
|-- .env                        # Environment variables (API keys)
|-- .gitignore
|
|-- agents/                     # CrewAI agent definitions
|   |-- __init__.py
|   |-- paper_extractor.py      # Agent 2: Structured data extraction
|   |-- methodology_critic.py   # Agent 3: Methodology evaluation
|   |-- relevance_researcher.py # Agent 4: Related work search
|   |-- review_synthesizer.py   # Agent 5: Draft review writer
|   |-- rubric_evaluator.py     # Agent 6: 15-criteria quality scorer
|   |-- enhancer.py             # Agent 7: Final report polisher
|
|-- tools/                      # CrewAI tool definitions
|   |-- __init__.py
|   |-- pdf_parser.py           # PDF text extraction
|   |-- pii_detector.py         # PII detection & redaction
|   |-- injection_scanner.py    # Prompt injection detection
|   |-- url_validator.py        # URL blocklist validation
|   |-- citation_search.py      # Semantic Scholar / OpenAlex search
|
|-- schemas/                    # Pydantic output models
|   |-- __init__.py
|   |-- models.py               # All 8 schema definitions
|
|-- test_components.py          # Component tests
|-- tests/                      # Test directory
```
11. How to Run
Prerequisites
- Python 3.10+
- OpenAI API key with GPT-4o access
Setup
```shell
# 1. Install dependencies
pip install -r requirements.txt

# 2. Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env

# 3. Run the application
python app.py
```
The Gradio UI listens on 0.0.0.0:7860; open http://localhost:7860 in your browser.
Usage
- Open the UI in your browser
- Upload a research paper PDF (max 20 MB)
- Click "Analyze Paper"
- Wait 1-3 minutes for the pipeline to complete
- Review results across all 6 tabs
- Download the full report as Markdown
Generated for AI Research Paper Analyst – Homework 5, Agentic AI Bootcamp