# AI Research Paper Analyst — Complete Project Documentation

## Table of Contents

1. [Project Overview](#1-project-overview)
2. [System Architecture Flowchart](#2-system-architecture-flowchart)
3. [Pipeline Flow](#3-pipeline-flow)
4. [Agents](#4-agents)
5. [Tools](#5-tools)
6. [Pydantic Schemas](#6-pydantic-schemas)
7. [Gradio UI](#7-gradio-ui)
8. [Safety & Guardrails](#8-safety--guardrails)
9. [Tech Stack & Dependencies](#9-tech-stack--dependencies)
10. [Project Structure](#10-project-structure)
11. [How to Run](#11-how-to-run)

---

## 1. Project Overview

**AI Research Paper Analyst** is an automated peer-review system powered by a multi-agent AI pipeline. A user uploads a research paper (PDF), and the system produces a comprehensive, publication-ready peer review — including methodology critique, novelty assessment, rubric scoring, and a final Accept/Revise/Reject recommendation.

| Property | Value |
|---|---|
| **Framework** | CrewAI (multi-agent orchestration) |
| **LLM Backend** | OpenAI GPT-4o (extraction) + GPT-4o-mini (all other agents) |
| **Frontend** | Gradio 5.x |
| **Safety** | Programmatic (regex/logic-based) — no LLM in the safety gate |
| **Output Format** | Structured JSON (Pydantic) rendered as Markdown |

---

## 2. System Architecture Flowchart

```mermaid
flowchart TD
    A["User Uploads PDF via Gradio UI"] --> B["File Validation"]
    B -->|Invalid| B_ERR["Return Error to UI"]
    B -->|Valid .pdf| C["GATE 1: Safety Guardian (Programmatic)"]

    subgraph SAFETY_GATE["Safety Gate — No LLM"]
        C --> C1["PDF Parser Tool — Extract raw text"]
        C1 --> C2["PII Detector Tool — Scan & redact PII"]
        C2 --> C3["Injection Scanner Tool — Check for prompt injections"]
        C3 --> C4["URL Validator Tool — Flag malicious URLs"]
        C4 --> C5{"is_safe?"}
    end

    C5 -->|UNSAFE| BLOCK["Block Document — Show Safety Report"]
    C5 -->|SAFE| D["Sanitized Text passed to Analysis Pipeline"]

    subgraph ANALYSIS_PIPELINE["Analysis Pipeline — CrewAI Sequential"]
        D --> E["STEP 1: Paper Extractor Agent (GPT-4o)"]
        E -->|PaperExtraction JSON| F["STEP 2a: Methodology Critic Agent (GPT-4o-mini)"]
        E -->|PaperExtraction JSON| G["STEP 2b: Relevance Researcher Agent (GPT-4o-mini)"]
        F -->|MethodologyCritique JSON| H["STEP 3: Review Synthesizer Agent (GPT-4o-mini)"]
        G -->|RelevanceReport JSON| H
        E -->|PaperExtraction JSON| H
        H -->|ReviewDraft JSON| I["STEP 4: Rubric Evaluator Agent (GPT-4o-mini)"]
        I -->|RubricEvaluation JSON| J["STEP 5: Enhancer Agent (GPT-4o-mini)"]
        H -->|ReviewDraft JSON| J
        E -->|PaperExtraction JSON| J
    end

    J -->|FinalReview JSON| K["Output Formatting"]

    subgraph OUTPUT["Gradio UI — 6 Tabs"]
        K --> K1["Executive Summary Tab"]
        K --> K2["Full Review Tab"]
        K --> K3["Rubric Scorecard Tab"]
        K --> K4["Safety Report Tab"]
        K --> K5["Agent Outputs Tab"]
        K --> K6["Pipeline Logs Tab"]
    end

    K2 --> DL["Download Full Report (.md)"]
```

### Simplified Agent Pipeline Flow

```mermaid
flowchart LR
    PDF["PDF Upload"] --> SG["Safety\nGuardian"]
    SG --> PE["Paper\nExtractor"]
    PE --> MC["Methodology\nCritic"]
    PE --> RR["Relevance\nResearcher"]
    MC --> RS["Review\nSynthesizer"]
    RR --> RS
    RS --> RE["Rubric\nEvaluator"]
    RE --> EN["Enhancer"]
    EN --> OUT["Final\nReport"]

    style SG fill:#ff6b6b,stroke:#c0392b,color:#fff
    style PE fill:#74b9ff,stroke:#2980b9,color:#fff
    style MC fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RR fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RS fill:#55efc4,stroke:#00b894,color:#fff
    style RE fill:#ffeaa7,stroke:#fdcb6e,color:#333
    style EN fill:#fd79a8,stroke:#e84393,color:#fff
```

### Data Flow Diagram

```mermaid
flowchart TD
    subgraph TOOLS["Tools Layer"]
        T1["pdf_parser_tool"]
        T2["pii_detector_tool"]
        T3["injection_scanner_tool"]
        T4["url_validator_tool"]
        T5["citation_search_tool"]
    end

    subgraph AGENTS["Agent Layer"]
        A1["Safety Guardian\n(Programmatic)"]
        A2["Paper Extractor\n(GPT-4o)"]
        A3["Methodology Critic\n(GPT-4o-mini)"]
        A4["Relevance Researcher\n(GPT-4o-mini)"]
        A5["Review Synthesizer\n(GPT-4o-mini)"]
        A6["Rubric Evaluator\n(GPT-4o-mini)"]
        A7["Enhancer\n(GPT-4o-mini)"]
    end

    subgraph SCHEMAS["Schema Layer (Pydantic)"]
        S1["SafetyReport"]
        S2["PaperExtraction"]
        S3["MethodologyCritique"]
        S4["RelevanceReport"]
        S5["ReviewDraft"]
        S6["RubricEvaluation"]
        S7["FinalReview"]
    end

    A1 -.->|uses| T1 & T2 & T3 & T4
    A2 -.->|uses| T1
    A4 -.->|uses| T5

    A1 -->|outputs| S1
    A2 -->|outputs| S2
    A3 -->|outputs| S3
    A4 -->|outputs| S4
    A5 -->|outputs| S5
    A6 -->|outputs| S6
    A7 -->|outputs| S7
```

---

## 3. Pipeline Flow

The system runs as a **sequential pipeline**: one programmatic safety gate followed by six LLM agents in five steps. Steps 2a and 2b both consume the Paper Extractor's output.

| Stage | Agent | LLM | Input | Output Schema | Tools Used |
|---|---|---|---|---|---|
| **Gate 1** | Safety Guardian | None (programmatic) | Raw PDF file | `SafetyReport` | pdf_parser, pii_detector, injection_scanner, url_validator |
| **Step 1** | Paper Extractor | GPT-4o | Sanitized text | `PaperExtraction` | pdf_parser |
| **Step 2a** | Methodology Critic | GPT-4o-mini | PaperExtraction JSON | `MethodologyCritique` | None |
| **Step 2b** | Relevance Researcher | GPT-4o-mini | PaperExtraction JSON | `RelevanceReport` | citation_search |
| **Step 3** | Review Synthesizer | GPT-4o-mini | Paper + Critique + Research | `ReviewDraft` | None |
| **Step 4** | Rubric Evaluator | GPT-4o-mini | Draft + Paper + Critique + Research | `RubricEvaluation` | None |
| **Step 5** | Enhancer | GPT-4o-mini | Draft + Rubric + Paper | `FinalReview` | None |

### Pipeline Error Handling

- Each agent step is wrapped in `try/except` — a failure in one agent does not crash the pipeline.
- If an agent fails, its output defaults to `{"error": "..."}` and downstream agents work with available data.
- The Safety Gate blocks the entire pipeline if `is_safe=False` (prompt injection or malicious URLs detected).
- PII is always redacted before analysis, even for "safe" documents.
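The following is a minimal sketch of the per-step `try/except` wrapper described above. It assumes each agent step can be called as a plain Python function that returns a dict. The names `run_step` and `run_pipeline` and the step keys are hypothetical; the real wiring lives in `app.py` and goes through CrewAI. The inputs each step receives follow the table above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("pipeline")

StepFn = Callable[..., dict]


def run_step(name: str, step: StepFn, *inputs: Any) -> dict:
    """Run one agent step; on any failure, log it and return an error dict instead of raising."""
    try:
        return step(*inputs)
    except Exception as exc:  # deliberately broad: one failing agent must not crash the run
        logger.warning("step %s failed: %s", name, exc)
        return {"error": f"{name} failed: {exc}"}


def run_pipeline(sanitized_text: str, steps: dict[str, StepFn]) -> dict[str, dict]:
    """Hypothetical wiring of Steps 1-5; downstream steps receive whatever upstream produced, errors included."""
    out: dict[str, dict] = {}
    out["extraction"] = run_step("extractor", steps["extractor"], sanitized_text)
    out["critique"] = run_step("critic", steps["critic"], out["extraction"])
    out["relevance"] = run_step("researcher", steps["researcher"], out["extraction"])
    out["draft"] = run_step(
        "synthesizer", steps["synthesizer"], out["extraction"], out["critique"], out["relevance"]
    )
    out["rubric"] = run_step(
        "evaluator", steps["evaluator"], out["draft"], out["extraction"], out["critique"], out["relevance"]
    )
    out["final"] = run_step("enhancer", steps["enhancer"], out["draft"], out["rubric"], out["extraction"])
    return out
```

Because a failed step returns `{"error": ...}` instead of raising, the Review Synthesizer still runs when, for example, the Methodology Critic fails. It simply sees an error dict in place of the critique.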

---

## 4. Agents

### Agent 1: Safety Guardian (Programmatic)

| Property | Value |
|---|---|
| **File** | `app.py` — `run_safety_check()` |
| **LLM** | None — fully programmatic |
| **Purpose** | Gate that blocks unsafe documents before LLM analysis |
| **Tools** | pdf_parser, pii_detector, injection_scanner, url_validator |
| **Output** | `SafetyReport` |

Calls all four safety tools directly as Python functions rather than through a CrewAI agent. This keeps the gate deterministic and fast (<1 second), and it keeps LLM hallucinations out of safety-critical decisions.

**Decision Logic:**
- `is_safe = (not injection_detected) AND (no malicious URLs)`
- Risk level: `high` if injection or malicious URLs, `medium` if PII found, `low` otherwise
- If `is_safe=False` → pipeline is blocked, user sees the Safety Report
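The decision logic above fits in a few lines of Python. This is a sketch: `decide_safety` is a hypothetical helper, and the real `run_safety_check()` also runs the tools and assembles the full `SafetyReport`.

```python
def decide_safety(injection_detected: bool, malicious_urls: list[str], pii_found: list[str]) -> tuple[bool, str]:
    """Return (is_safe, risk_level) following the documented decision rules."""
    # Only injection attempts and malicious URLs block the document; PII is redacted, not blocked.
    is_safe = not injection_detected and not malicious_urls
    if not is_safe:
        risk_level = "high"
    elif pii_found:
        risk_level = "medium"
    else:
        risk_level = "low"
    return is_safe, risk_level
```

For example, a paper that only contains an author email address is still analyzed. It is marked `medium` risk, and the address is redacted first.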

---

### Agent 2: Paper Extractor

| Property | Value |
|---|---|
| **File** | `agents/paper_extractor.py` |
| **LLM** | GPT-4o (temperature=0.1, seed=42) |
| **Role** | Research Paper Data Extractor |
| **Tools** | pdf_parser_tool |
| **Output** | `PaperExtraction` |
| **Max Iterations** | 3 |

Extracts structured metadata from the raw paper text: title, authors, abstract, methodology, key findings, contributions, limitations, references count, paper type, and extraction confidence level.

Uses GPT-4o (not mini) because extraction requires deep comprehension of the full paper.

---

### Agent 3: Methodology Critic

| Property | Value |
|---|---|
| **File** | `agents/methodology_critic.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Research Methodology Evaluator |
| **Tools** | None (pure LLM reasoning) |
| **Output** | `MethodologyCritique` |
| **Max Iterations** | 5 |

Critically evaluates study design, statistical methods, sample sizes, reproducibility, and logical consistency. For theoretical papers, adapts criteria to assess logical rigor and proof completeness. Produces scores for methodology (1-10) and reproducibility (1-10).

---

### Agent 4: Relevance Researcher

| Property | Value |
|---|---|
| **File** | `agents/relevance_researcher.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Related Work Analyst |
| **Tools** | citation_search_tool |
| **Output** | `RelevanceReport` |
| **Max Iterations** | 5 |

Searches for real related papers using Semantic Scholar / OpenAlex APIs. Assesses novelty by comparing against existing work. Produces a novelty score (1-10), field context, gaps addressed, and overlaps.

**Critical Rule:** Must NOT hallucinate citations. Only uses papers found by the search tool.

---

### Agent 5: Review Synthesizer

| Property | Value |
|---|---|
| **File** | `agents/review_synthesizer.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42, max_tokens=4000) |
| **Role** | Peer Review Report Writer |
| **Tools** | None (synthesis only) |
| **Output** | `ReviewDraft` |
| **Max Iterations** | 3 |

Combines insights from Paper Extractor, Methodology Critic, and Relevance Researcher into a coherent peer-review draft with summary, strengths, weaknesses, assessments, recommendation (Accept/Revise/Reject), and questions for authors.

---

### Agent 6: Rubric Evaluator

| Property | Value |
|---|---|
| **File** | `agents/rubric_evaluator.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Objective Quality Scorer |
| **Tools** | None (evaluation logic only) |
| **Output** | `RubricEvaluation` |
| **Max Iterations** | 3 |

Scores the review draft on **15 strict binary criteria** (0 or 1 each). Pass threshold: >= 11/15.

**15 Rubric Criteria:**

| # | Category | Criterion |
|---|---|---|
| 1 | Content Completeness | Title & authors correctly identified |
| 2 | Content Completeness | Abstract accurately summarized |
| 3 | Content Completeness | Methodology clearly described |
| 4 | Content Completeness | At least 3 distinct strengths |
| 5 | Content Completeness | At least 3 distinct weaknesses |
| 6 | Content Completeness | Limitations acknowledged |
| 7 | Content Completeness | Related work present (2+ papers) |
| 8 | Analytical Depth | Novelty assessed with justification |
| 9 | Analytical Depth | Reproducibility discussed |
| 10 | Analytical Depth | Evidence quality evaluated |
| 11 | Analytical Depth | Contribution to field stated |
| 12 | Review Quality | Recommendation justified with evidence |
| 13 | Review Quality | At least 3 actionable questions |
| 14 | Review Quality | No hallucinated citations |
| 15 | Review Quality | Professional tone and coherent structure |
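The sketch below computes the derived fields of `RubricEvaluation` (`total_score`, `failed_criteria`, `passed`) in code from the 15 binary scores, so the LLM is not trusted to add up its own scores. The short criterion keys and the `summarize_rubric` name are hypothetical. This illustrates a design option, not necessarily how the project computes these fields.

```python
# Hypothetical short keys for the 15 criteria, in table order.
CRITERIA = [
    "title_authors", "abstract_summary", "methodology_described", "three_strengths",
    "three_weaknesses", "limitations", "related_work", "novelty_justified",
    "reproducibility", "evidence_quality", "contribution", "recommendation_justified",
    "actionable_questions", "no_hallucinated_citations", "professional_tone",
]
PASS_THRESHOLD = 11  # pass if total >= 11 out of 15


def summarize_rubric(scores: dict[str, int]) -> dict:
    """Derive total_score, failed_criteria and passed from 15 binary scores."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    non_binary = [c for c in CRITERIA if scores[c] not in (0, 1)]
    if non_binary:
        raise ValueError(f"scores must be 0 or 1: {non_binary}")
    total = sum(scores[c] for c in CRITERIA)
    return {
        "scores": {c: scores[c] for c in CRITERIA},
        "total_score": total,
        "failed_criteria": [c for c in CRITERIA if scores[c] == 0],
        "passed": total >= PASS_THRESHOLD,
    }
```

A draft that fails four criteria (11/15) passes. A draft that fails five (10/15) does not, and its `failed_criteria` list tells the Enhancer what to fix.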

---

### Agent 7: Enhancer

| Property | Value |
|---|---|
| **File** | `agents/enhancer.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Review Report Enhancer |
| **Tools** | None (writing/synthesis only) |
| **Output** | `FinalReview` |
| **Max Iterations** | 3 |

Takes the draft review + rubric feedback and produces a **complete, publication-ready** peer review report (800-1500 words). Fixes all rubric criteria that scored 0 while keeping content that passed. Produces the final executive summary, recommendation, confidence score, and improvement log.

---

## 5. Tools

### Tool 1: PDF Parser (`tools/pdf_parser.py`)

| Property | Value |
|---|---|
| **Library** | pdfplumber |
| **Assigned To** | Safety Guardian, Paper Extractor |
| **Input** | File path (string) |
| **Output** | Extracted text (string) or `"ERROR: ..."` |

**Guardrails:**
- File must be `.pdf`
- File must exist on disk
- File size max: 20 MB
- Minimum extractable text: 100 chars
- Never raises exceptions — returns error strings
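A minimal sketch of these guardrails follows. It uses pdfplumber's `pdfplumber.open()`, `pdf.pages` and `page.extract_text()`. The function name `parse_pdf` and the exact error messages are hypothetical, and "20 MB" is interpreted here as 20 × 1024 × 1024 bytes (an assumption). pdfplumber is imported only after the cheap checks, so those checks work even without it installed.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # "20 MB", assumed to mean MiB
MIN_CHARS = 100


def parse_pdf(path: str) -> str:
    """Return extracted text, or an "ERROR: ..." string; never raises."""
    try:
        if not path.lower().endswith(".pdf"):
            return "ERROR: file must be a .pdf"
        if not os.path.isfile(path):
            return "ERROR: file not found"
        if os.path.getsize(path) > MAX_BYTES:
            return "ERROR: file exceeds 20 MB"
        import pdfplumber  # deferred import: validation above does not need it

        with pdfplumber.open(path) as pdf:
            # extract_text() returns None for pages without a text layer
            text = "\n".join(page.extract_text() or "" for page in pdf.pages)
        if len(text.strip()) < MIN_CHARS:
            return "ERROR: fewer than 100 characters of extractable text"
        return text
    except Exception as exc:  # any parser failure becomes an error string
        return f"ERROR: {exc}"
```

Scanned PDFs without a text layer fall under the minimum-text check rather than reaching the LLM with an empty prompt.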

---

### Tool 2: PII Detector (`tools/pii_detector.py`)

| Property | Value |
|---|---|
| **Approach** | Regex pattern matching |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `findings`, `redacted_text`, `pii_count` |

**Patterns Detected:**
- Email addresses
- Phone numbers (US format)
- Social Security Numbers
- Credit card numbers

Each match is replaced with a `[REDACTED_TYPE]` token, where `TYPE` names the kind of PII found.
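Here is a sketch of a regex-based detector that returns the documented JSON fields. The patterns, the `detect_pii` name and the shape of each `findings` entry are assumptions, not the project's exact code. Pattern order matters: credit cards are redacted before phone numbers, so the phone pattern never consumes part of a card number.

```python
import json
import re

# Illustrative patterns (assumptions); the project's exact regexes are not documented here.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13-16 digits, optionally separated by spaces or hyphens, ending on a digit
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
    # US format, e.g. 555-123-4567, (555) 123-4567, +1 555 123 4567
    "PHONE": re.compile(r"(?:\+1[\s.-]?)?\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}


def detect_pii(text: str) -> str:
    """Return JSON with findings, redacted_text and pii_count."""
    findings = []
    redacted = text
    for pii_type, pattern in PII_PATTERNS.items():
        count = len(pattern.findall(redacted))
        if count:
            findings.append({"type": pii_type, "count": count})
            redacted = pattern.sub(f"[REDACTED_{pii_type}]", redacted)
    return json.dumps({
        "findings": findings,
        "redacted_text": redacted,
        "pii_count": sum(f["count"] for f in findings),
    })
```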

---

### Tool 3: Prompt Injection Scanner (`tools/injection_scanner.py`)

| Property | Value |
|---|---|
| **Approach** | Regex pattern matching (9 patterns) |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `is_safe`, `suspicious_patterns`, `patterns_checked` |

**Patterns Checked:**
- "ignore previous instructions"
- "disregard above/previous"
- "forget everything/all/your instructions"
- "new instructions:"
- `[INST]` token
- `<|im_start|>` token
- `<|system|>` token
- "override safety"
- "jailbreak"

**Fail-safe:** If scanning itself fails, the document is treated as **unsafe**.
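The nine patterns and the fail-safe rule can be sketched as follows. The exact regex forms and the `scan_for_injection` name are assumptions; the output keys match the table above. Any exception during scanning yields `is_safe: false`.

```python
import json
import re

# The nine patterns listed above as case-insensitive regexes (exact forms are assumptions).
INJECTION_PATTERNS = [
    r"ignore\s+(?:all\s+)?previous\s+instructions",
    r"disregard\s+(?:the\s+)?(?:above|previous)",
    r"forget\s+(?:everything|all|your\s+instructions)",
    r"new\s+instructions\s*:",
    r"\[INST\]",
    r"<\|im_start\|>",
    r"<\|system\|>",
    r"override\s+safety",
    r"jailbreak",
]


def scan_for_injection(text: str) -> str:
    """Return JSON with is_safe, suspicious_patterns and patterns_checked; any error means unsafe."""
    try:
        hits = [p for p in INJECTION_PATTERNS if re.search(p, text, re.IGNORECASE)]
        return json.dumps({
            "is_safe": not hits,
            "suspicious_patterns": hits,
            "patterns_checked": len(INJECTION_PATTERNS),
        })
    except Exception as exc:
        # Fail-safe: if scanning itself breaks (e.g. text is not a string), treat the document as unsafe.
        return json.dumps({
            "is_safe": False,
            "suspicious_patterns": [f"scanner error: {exc}"],
            "patterns_checked": 0,
        })
```

One trade-off of keyword matching: a bare `jailbreak` pattern also blocks papers that legitimately study jailbreak attacks on LLMs.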

---

### Tool 4: URL Validator (`tools/url_validator.py`)

| Property | Value |
|---|---|
| **Approach** | Regex extraction + blocklist matching |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `total_urls`, `malicious_urls`, `is_safe` |

**Suspicious Indicators:**
- URL shorteners: bit.ly, tinyurl, t.co, goo.gl
- Dangerous protocols: data:, javascript:, file://
- Keywords: malware, phishing

Max 50 URLs checked per scan.
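A sketch of the extraction and blocklist check follows. The indicator lists come from the bullets above; treating "tinyurl" as the host `tinyurl.com`, the extraction regex and the function names are assumptions. Shorteners are matched on the parsed hostname rather than by substring, so `t.co` does not flag hosts such as `microsoft.com`.

```python
import json
import re
from urllib.parse import urlparse

MAX_URLS = 50
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
DANGEROUS_SCHEMES = ("data:", "javascript:", "file://")
KEYWORDS = ("malware", "phishing")

# Capture http(s) links plus the dangerous schemes so they can be flagged.
URL_RE = re.compile(r"\b(?:https?://|file://|javascript:|data:)[^\s<>\"')\]]+", re.IGNORECASE)


def is_suspicious(url: str) -> bool:
    lower = url.lower()
    if lower.startswith(DANGEROUS_SCHEMES):
        return True
    host = urlparse(lower).hostname or ""
    if host in SHORTENERS:
        return True
    return any(k in lower for k in KEYWORDS)


def validate_urls(text: str) -> str:
    """Return JSON with total_urls, malicious_urls and is_safe (at most MAX_URLS are checked)."""
    # Strip sentence punctuation that the regex picks up at the end of a URL.
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)][:MAX_URLS]
    malicious = [u for u in urls if is_suspicious(u)]
    return json.dumps({"total_urls": len(urls), "malicious_urls": malicious, "is_safe": not malicious})
```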

---

### Tool 5: Citation Search (`tools/citation_search.py`)

| Property | Value |
|---|---|
| **Primary API** | Semantic Scholar (with retry for HTTP 429) |
| **Fallback API** | OpenAlex (free, no rate limits) |
| **Assigned To** | Relevance Researcher |
| **Input** | Search query (string, max 200 chars) |
| **Output** | Formatted text list of papers with title, authors, year, citations, abstract |

**Rate Limiting:**
- Max 3 API calls per analysis run (tracked globally)
- 10-second timeout per API call
- Exponential backoff for rate limits (1s, 2s, 4s)

**Fallback Chain:** Semantic Scholar -> OpenAlex -> "Search unavailable" message
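The rate-limiting and fallback rules can be sketched without network access by treating each API as an injected callable. This is illustrative only: `CitationSearcher` and `RateLimited` are hypothetical names, and each provider callable is assumed to perform the HTTP request itself (e.g. with `requests.get(..., timeout=10)`) and raise `RateLimited` on HTTP 429. The sketch assumes that retries count against the 3-call budget; the project does not document whether they do.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Raised by a provider callable when its API answers HTTP 429."""


Fetch = Callable[[str], str]


class CitationSearcher:
    """One instance per analysis run, so the call budget is shared by every search in that run."""

    def __init__(self, providers: list[tuple[str, Fetch]], max_calls: int = 3,
                 backoff: tuple[float, ...] = (1, 2, 4), sleep: Callable[[float], None] = time.sleep):
        self.providers = providers  # priority order, e.g. Semantic Scholar then OpenAlex
        self.max_calls = max_calls
        self.backoff = backoff
        self.sleep = sleep
        self.calls = 0  # every attempt, retries included, counts against the budget

    def search(self, query: str) -> str:
        query = query[:200]  # documented 200-character cap on queries
        for _name, fetch in self.providers:
            attempt = 0
            while True:
                if self.calls >= self.max_calls:
                    return "Search unavailable (API call budget exhausted)"
                self.calls += 1
                try:
                    return fetch(query)  # fetch is expected to enforce the 10 s timeout itself
                except RateLimited:
                    if attempt >= len(self.backoff) or self.calls >= self.max_calls:
                        break  # no retries or budget left for this provider
                    self.sleep(self.backoff[attempt])  # exponential backoff: 1 s, 2 s, 4 s
                    attempt += 1
                except Exception:
                    break  # timeout or other error: move on to the next provider immediately
        return "Search unavailable"
```

Under this counting rule, a persistently rate-limited Semantic Scholar can consume the whole 3-call budget before OpenAlex is tried. Exempting retries from the budget, or capping retries at one, would let the fallback run.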

---

## 6. Pydantic Schemas

All schemas inherit from `BaseAgentOutput` which enforces `extra="ignore"` for Gradio compatibility.

**File:** `schemas/models.py`

### SafetyReport
```
is_safe: bool (default=False, fail-safe)
pii_found: list[str]
injection_detected: bool
malicious_urls: list[str]
sanitized_text: str
risk_level: "low" | "medium" | "high"
```

### PaperExtraction
```
title: str
authors: list[str]
abstract: str
methodology: str
key_findings: list[str]
contributions: list[str]
limitations_stated: list[str]
references_count: int
paper_type: "empirical" | "theoretical" | "survey" | "system" | "mixed"
extraction_confidence: "high" | "medium" | "low"
```

### MethodologyCritique
```
strengths: list[str]
weaknesses: list[str]
limitations: list[str]
methodology_score: int (1-10)
reproducibility_score: int (1-10)
suggestions: list[str]
bias_risks: list[str]
```

### RelevanceReport
```
related_papers: list[RelatedPaper]
novelty_score: int (1-10)
field_context: str
gaps_addressed: list[str]
overlaps_with_existing: list[str]
```

### RelatedPaper
```
title: str
authors: str
year: int
citation_count: int
relevance: str
```

### ReviewDraft
```
summary: str
strengths_section: str
weaknesses_section: str
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence: int (1-5)
detailed_review: str
```

### RubricEvaluation
```
scores: dict[str, int]        (15 criteria, each 0 or 1)
total_score: int              (0-15)
failed_criteria: list[str]
feedback_per_criterion: dict[str, str]
passed: bool                  (True if total_score >= 11)
```

### FinalReview
```
executive_summary: str
paper_metadata: dict
strengths: list[str]
weaknesses: list[str]
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence_score: int (1-5)
rubric_scores: dict[str, int]
rubric_total: int
improvement_log: list[str]
```

---

## 7. Gradio UI

The UI is a single-page Gradio Blocks application with **6 tabs**:

| Tab | Content | Component |
|---|---|---|
| Executive Summary | Recommendation, confidence, rubric score, paper info | Markdown + Download button |
| Full Review | Strengths, weaknesses, methodology, novelty, questions | Markdown |
| Rubric Scorecard | 15 criteria scores in 3 categories with feedback | Markdown (table) |
| Safety Report | PII findings, injection status, URL analysis | Markdown |
| Agent Outputs | Raw structured output from each of the 7 agents | Markdown |
| Pipeline Logs | Timestamped execution log + JSON summary | Textbox + Code |

### UI Features
- **Progress bar** with real-time status updates that count the six LLM agents (e.g., "Agent 3/6: Searching Related Work...")
- **Download button** to export the full review as a `.md` file
- **File validation** — only accepts `.pdf` files

---

## 8. Safety & Guardrails

### Layered Safety Architecture

```mermaid
flowchart TD
    subgraph LAYER1["Layer 1: Input Validation"]
        IV1["File type check (.pdf only)"]
        IV2["File size check (max 20MB)"]
        IV3["Minimum text check (100+ chars)"]
    end

    subgraph LAYER2["Layer 2: Content Safety"]
        CS1["PII Detection & Redaction"]
        CS2["Prompt Injection Scanning"]
        CS3["URL Blocklist Validation"]
    end

    subgraph LAYER3["Layer 3: LLM Configuration"]
        LC1["Low temperature (0.1)"]
        LC2["Deterministic seed (42)"]
        LC3["Max iterations per agent"]
        LC4["Structured output (Pydantic)"]
    end

    subgraph LAYER4["Layer 4: Pipeline Resilience"]
        PR1["Per-agent try/except"]
        PR2["Graceful degradation"]
        PR3["API rate limiting (3 calls max)"]
        PR4["Timeout enforcement (10s)"]
    end

    subgraph LAYER5["Layer 5: Observability"]
        OB1["PipelineLogger — every step logged"]
        OB2["API key redaction in logs"]
        OB3["Execution summary with timing"]
    end

    LAYER1 --> LAYER2 --> LAYER3 --> LAYER4 --> LAYER5
```

### Key Principles
- **Fail-safe defaults**: `is_safe` defaults to `False`, and if a scanner itself fails, the document is treated as unsafe
- **No LLM in the safety gate**: All safety checks are deterministic regex/logic
- **PII always redacted**: Even for safe documents, PII is stripped before LLM analysis
- **Structured outputs**: Every LLM agent's output is enforced by CrewAI against a Pydantic schema; the programmatic Safety Guardian builds its `SafetyReport` directly
- **No secrets in logs**: API keys are regex-redacted from all log output
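The internals of `PipelineLogger` are not shown here. One way to get the same effect with the standard library is a `logging.Filter` that rewrites each record's formatted message. `RedactingFilter` and the key pattern (OpenAI-style keys beginning with `sk-`) are assumptions for illustration.

```python
import logging
import re

# Assumed key shape: OpenAI secret keys start with "sk-" (this also covers "sk-proj-..." keys).
API_KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{16,}")


class RedactingFilter(logging.Filter):
    """Scrub API keys from each record's fully formatted message before the handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = API_KEY_RE.sub("[REDACTED_API_KEY]", record.getMessage())
        record.args = ()  # message is already formatted; clear args so it is not formatted twice
        return True
```

Attach the filter to each handler rather than to a logger. Logger-level filters are not applied to records that propagate up from child loggers, while handler-level filters see every record the handler emits.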

---

## 9. Tech Stack & Dependencies

| Package | Version | Purpose |
|---|---|---|
| `crewai` | >= 0.86.0 | Multi-agent orchestration framework |
| `crewai-tools` | >= 0.17.0 | Tool wrapper utilities |
| `openai` | >= 1.0.0 | LLM API client (GPT-4o, GPT-4o-mini) |
| `pdfplumber` | >= 0.11.0 | PDF text extraction |
| `pydantic` | >= 2.0.0 | Structured output validation |
| `gradio` | >= 5.0.0 | Web UI framework |
| `python-dotenv` | >= 1.0.0 | Environment variable loading |
| `requests` | >= 2.31.0 | HTTP client for citation APIs |

### Environment Variables

| Variable | Required | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API access (GPT-4o required) |

---

## 10. Project Structure

```
Homework5_agentincAI/
|-- app.py                          # Main application (pipeline + Gradio UI)
|-- requirements.txt                # Python dependencies
|-- README.md                       # HuggingFace Space metadata
|-- .env                            # Environment variables (API keys)
|-- .gitignore
|
|-- agents/                         # CrewAI agent definitions
|   |-- __init__.py
|   |-- paper_extractor.py          # Agent 2: Structured data extraction
|   |-- methodology_critic.py       # Agent 3: Methodology evaluation
|   |-- relevance_researcher.py     # Agent 4: Related work search
|   |-- review_synthesizer.py       # Agent 5: Draft review writer
|   |-- rubric_evaluator.py         # Agent 6: 15-criteria quality scorer
|   |-- enhancer.py                 # Agent 7: Final report polisher
|
|-- tools/                          # CrewAI tool definitions
|   |-- __init__.py
|   |-- pdf_parser.py               # PDF text extraction
|   |-- pii_detector.py             # PII detection & redaction
|   |-- injection_scanner.py        # Prompt injection detection
|   |-- url_validator.py            # URL blocklist validation
|   |-- citation_search.py          # Semantic Scholar / OpenAlex search
|
|-- schemas/                        # Pydantic output models
|   |-- __init__.py
|   |-- models.py                   # All 8 schema definitions
|
|-- test_components.py              # Component tests
|-- tests/                          # Test directory
```

---

## 11. How to Run

### Prerequisites
- Python 3.10+
- OpenAI API key with GPT-4o access

### Setup

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env

# 3. Run the application
python app.py
```

The Gradio server listens on `0.0.0.0:7860` (all network interfaces); when running locally, open `http://localhost:7860` in a browser.

### Usage
1. Open the UI in your browser
2. Upload a research paper PDF (max 20 MB)
3. Click "Analyze Paper"
4. Wait 1-3 minutes for the pipeline to complete
5. Review results across all 6 tabs
6. Download the full report as Markdown

---

*Generated for AI Research Paper Analyst — Homework 5, Agentic AI Bootcamp*