# AI Research Paper Analyst — Complete Project Documentation

## Table of Contents

1. [Project Overview](#1-project-overview)
2. [System Architecture Flowchart](#2-system-architecture-flowchart)
3. [Pipeline Flow](#3-pipeline-flow)
4. [Agents](#4-agents)
5. [Tools](#5-tools)
6. [Pydantic Schemas](#6-pydantic-schemas)
7. [Gradio UI](#7-gradio-ui)
8. [Safety & Guardrails](#8-safety--guardrails)
9. [Tech Stack & Dependencies](#9-tech-stack--dependencies)
10. [Project Structure](#10-project-structure)
11. [How to Run](#11-how-to-run)

---

## 1. Project Overview

**AI Research Paper Analyst** is an automated peer-review system powered by a multi-agent AI pipeline. A user uploads a research paper (PDF), and the system produces a comprehensive, publication-ready peer review — including a methodology critique, a novelty assessment, rubric scoring, and a final Accept/Revise/Reject recommendation.

| Property | Value |
|---|---|
| **Framework** | CrewAI (multi-agent orchestration) |
| **LLM Backend** | OpenAI GPT-4o (extraction) + GPT-4o-mini (all other agents) |
| **Frontend** | Gradio 5.x |
| **Safety** | Programmatic (regex/logic-based) — no LLM in the safety gate |
| **Output Format** | Structured JSON (Pydantic) rendered as Markdown |

---

## 2. System Architecture Flowchart

```mermaid
flowchart TD
    A["User Uploads PDF via Gradio UI"] --> B["File Validation"]
    B -->|Invalid| B_ERR["Return Error to UI"]
    B -->|Valid .pdf| C["GATE 1: Safety Guardian (Programmatic)"]

    subgraph SAFETY_GATE["Safety Gate — No LLM"]
        C --> C1["PDF Parser Tool — Extract raw text"]
        C1 --> C2["PII Detector Tool — Scan & redact PII"]
        C2 --> C3["Injection Scanner Tool — Check for prompt injections"]
        C3 --> C4["URL Validator Tool — Flag malicious URLs"]
        C4 --> C5{"is_safe?"}
    end

    C5 -->|UNSAFE| BLOCK["Block Document — Show Safety Report"]
    C5 -->|SAFE| D["Sanitized Text passed to Analysis Pipeline"]

    subgraph ANALYSIS_PIPELINE["Analysis Pipeline — CrewAI Sequential"]
        D --> E["STEP 1: Paper Extractor Agent (GPT-4o)"]
        E -->|PaperExtraction JSON| F["STEP 2a: Methodology Critic Agent (GPT-4o-mini)"]
        E -->|PaperExtraction JSON| G["STEP 2b: Relevance Researcher Agent (GPT-4o-mini)"]
        F -->|MethodologyCritique JSON| H["STEP 3: Review Synthesizer Agent (GPT-4o-mini)"]
        G -->|RelevanceReport JSON| H
        E -->|PaperExtraction JSON| H
        H -->|ReviewDraft JSON| I["STEP 4: Rubric Evaluator Agent (GPT-4o-mini)"]
        I -->|RubricEvaluation JSON| J["STEP 5: Enhancer Agent (GPT-4o-mini)"]
        H -->|ReviewDraft JSON| J
        E -->|PaperExtraction JSON| J
    end

    J -->|FinalReview JSON| K["Output Formatting"]

    subgraph OUTPUT["Gradio UI — 6 Tabs"]
        K --> K1["Executive Summary Tab"]
        K --> K2["Full Review Tab"]
        K --> K3["Rubric Scorecard Tab"]
        K --> K4["Safety Report Tab"]
        K --> K5["Agent Outputs Tab"]
        K --> K6["Pipeline Logs Tab"]
    end

    K2 --> DL["Download Full Report (.md)"]
```

### Simplified Agent Pipeline Flow

```mermaid
flowchart LR
    PDF["PDF Upload"] --> SG["Safety\nGuardian"]
    SG --> PE["Paper\nExtractor"]
    PE --> MC["Methodology\nCritic"]
    PE --> RR["Relevance\nResearcher"]
    MC --> RS["Review\nSynthesizer"]
    RR --> RS
    RS --> RE["Rubric\nEvaluator"]
    RE --> EN["Enhancer"]
    EN --> OUT["Final\nReport"]

    style SG fill:#ff6b6b,stroke:#c0392b,color:#fff
    style PE fill:#74b9ff,stroke:#2980b9,color:#fff
    style MC fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RR fill:#a29bfe,stroke:#6c5ce7,color:#fff
    style RS fill:#55efc4,stroke:#00b894,color:#fff
    style RE fill:#ffeaa7,stroke:#fdcb6e,color:#333
    style EN fill:#fd79a8,stroke:#e84393,color:#fff
```

### Data Flow Diagram

```mermaid
flowchart TD
    subgraph TOOLS["Tools Layer"]
        T1["pdf_parser_tool"]
        T2["pii_detector_tool"]
        T3["injection_scanner_tool"]
        T4["url_validator_tool"]
        T5["citation_search_tool"]
    end

    subgraph AGENTS["Agent Layer"]
        A1["Safety Guardian\n(Programmatic)"]
        A2["Paper Extractor\n(GPT-4o)"]
        A3["Methodology Critic\n(GPT-4o-mini)"]
        A4["Relevance Researcher\n(GPT-4o-mini)"]
        A5["Review Synthesizer\n(GPT-4o-mini)"]
        A6["Rubric Evaluator\n(GPT-4o-mini)"]
        A7["Enhancer\n(GPT-4o-mini)"]
    end

    subgraph SCHEMAS["Schema Layer (Pydantic)"]
        S1["SafetyReport"]
        S2["PaperExtraction"]
        S3["MethodologyCritique"]
        S4["RelevanceReport"]
        S5["ReviewDraft"]
        S6["RubricEvaluation"]
        S7["FinalReview"]
    end

    A1 -.->|uses| T1 & T2 & T3 & T4
    A2 -.->|uses| T1
    A4 -.->|uses| T5

    A1 -->|outputs| S1
    A2 -->|outputs| S2
    A3 -->|outputs| S3
    A4 -->|outputs| S4
    A5 -->|outputs| S5
    A6 -->|outputs| S6
    A7 -->|outputs| S7
```

---

## 3. Pipeline Flow

The system runs as a **sequential pipeline** with one safety gate and six analysis steps:

| Stage | Agent | LLM | Input | Output Schema | Tools Used |
|---|---|---|---|---|---|
| **Gate 1** | Safety Guardian | None (programmatic) | Raw PDF file | `SafetyReport` | pdf_parser, pii_detector, injection_scanner, url_validator |
| **Step 1** | Paper Extractor | GPT-4o | Sanitized text | `PaperExtraction` | pdf_parser |
| **Step 2a** | Methodology Critic | GPT-4o-mini | PaperExtraction JSON | `MethodologyCritique` | None |
| **Step 2b** | Relevance Researcher | GPT-4o-mini | PaperExtraction JSON | `RelevanceReport` | citation_search |
| **Step 3** | Review Synthesizer | GPT-4o-mini | Paper + Critique + Research | `ReviewDraft` | None |
| **Step 4** | Rubric Evaluator | GPT-4o-mini | Draft + Paper + Critique + Research | `RubricEvaluation` | None |
| **Step 5** | Enhancer | GPT-4o-mini | Draft + Rubric + Paper | `FinalReview` | None |

### Pipeline Error Handling

- Each agent step is wrapped in `try/except` — a failure in one agent does not crash the pipeline.
- If an agent fails, its output defaults to `{"error": "..."}` and downstream agents work with the available data.
- The Safety Gate blocks the entire pipeline if `is_safe=False` (prompt injection or malicious URLs detected).
- PII is always redacted before analysis, even for "safe" documents.

---

## 4. Agents

### Agent 1: Safety Guardian (Programmatic)

| Property | Value |
|---|---|
| **File** | `app.py` — `run_safety_check()` |
| **LLM** | None — fully programmatic |
| **Purpose** | Gate that blocks unsafe documents before LLM analysis |
| **Tools** | pdf_parser, pii_detector, injection_scanner, url_validator |
| **Output** | `SafetyReport` |

Runs all 4 safety tools as plain Python function calls (no CrewAI agent overhead). This is deterministic, fast (<1 second), and keeps LLM hallucinations out of safety-critical decisions.
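The gate described above can be sketched in a few lines. The following is a minimal, illustrative version — the function names, the regexes, and the single-pattern PII check are assumptions for illustration, not the actual `app.py` implementation:

```python
import re

# Hypothetical patterns; the real tools check many more (see the Tools section).
EMAIL_RE = re.compile(r"[A-Za-z0-9.+-]+@[A-Za-z0-9-]+\.[A-Za-z]{2,}")

def detect_pii(text: str) -> tuple[str, list[str]]:
    """Redact email addresses (one of several PII patterns) and report findings."""
    findings = EMAIL_RE.findall(text)
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text), findings

def scan_injections(text: str) -> bool:
    """Return True if a known prompt-injection phrase appears."""
    patterns = [r"ignore\s+previous\s+instructions", r"\[INST\]", r"jailbreak"]
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)

def find_malicious_urls(text: str) -> list[str]:
    """Flag URLs that use shorteners or dangerous protocols (naive substring check)."""
    urls = re.findall(r"(?:https?://|data:|javascript:)\S+", text)
    blocklist = ("bit.ly", "tinyurl", "data:", "javascript:")
    return [u for u in urls if any(b in u for b in blocklist)]

def run_safety_check(raw_text: str) -> dict:
    """Deterministic gate: chain the tools, fail toward 'unsafe'."""
    sanitized, pii = detect_pii(raw_text)
    injection = scan_injections(sanitized)
    bad_urls = find_malicious_urls(sanitized)
    is_safe = (not injection) and (not bad_urls)
    risk = "high" if not is_safe else ("medium" if pii else "low")
    return {
        "is_safe": is_safe,
        "pii_found": pii,
        "injection_detected": injection,
        "malicious_urls": bad_urls,
        "sanitized_text": sanitized,
        "risk_level": risk,
    }
```

Because every step is a pure function over the text, the gate's verdict is reproducible — the same document always produces the same `SafetyReport`.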
**Decision Logic:**

- `is_safe = (not injection_detected) AND (no malicious URLs)`
- Risk level: `high` if injection or malicious URLs, `medium` if PII found, `low` otherwise
- If `is_safe=False` → the pipeline is blocked and the user sees the Safety Report

---

### Agent 2: Paper Extractor

| Property | Value |
|---|---|
| **File** | `agents/paper_extractor.py` |
| **LLM** | GPT-4o (temperature=0.1, seed=42) |
| **Role** | Research Paper Data Extractor |
| **Tools** | pdf_parser_tool |
| **Output** | `PaperExtraction` |
| **Max Iterations** | 3 |

Extracts structured metadata from the raw paper text: title, authors, abstract, methodology, key findings, contributions, limitations, reference count, paper type, and extraction confidence level. Uses GPT-4o (not mini) because extraction requires deep comprehension of the full paper.

---

### Agent 3: Methodology Critic

| Property | Value |
|---|---|
| **File** | `agents/methodology_critic.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Research Methodology Evaluator |
| **Tools** | None (pure LLM reasoning) |
| **Output** | `MethodologyCritique` |
| **Max Iterations** | 5 |

Critically evaluates study design, statistical methods, sample sizes, reproducibility, and logical consistency. For theoretical papers, it adapts its criteria to assess logical rigor and proof completeness. Produces a methodology score (1-10) and a reproducibility score (1-10).

---

### Agent 4: Relevance Researcher

| Property | Value |
|---|---|
| **File** | `agents/relevance_researcher.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Related Work Analyst |
| **Tools** | citation_search_tool |
| **Output** | `RelevanceReport` |
| **Max Iterations** | 5 |

Searches for real related papers via the Semantic Scholar and OpenAlex APIs. Assesses novelty by comparing the paper against existing work. Produces a novelty score (1-10), field context, gaps addressed, and overlaps.

**Critical Rule:** Must NOT hallucinate citations; only papers found by the search tool may be used.

---

### Agent 5: Review Synthesizer

| Property | Value |
|---|---|
| **File** | `agents/review_synthesizer.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42, max_tokens=4000) |
| **Role** | Peer Review Report Writer |
| **Tools** | None (synthesis only) |
| **Output** | `ReviewDraft` |
| **Max Iterations** | 3 |

Combines the outputs of the Paper Extractor, Methodology Critic, and Relevance Researcher into a coherent peer-review draft with a summary, strengths, weaknesses, assessments, a recommendation (Accept/Revise/Reject), and questions for the authors.

---

### Agent 6: Rubric Evaluator

| Property | Value |
|---|---|
| **File** | `agents/rubric_evaluator.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Objective Quality Scorer |
| **Tools** | None (evaluation logic only) |
| **Output** | `RubricEvaluation` |
| **Max Iterations** | 3 |

Scores the review draft on **15 strict binary criteria** (0 or 1 each). Pass threshold: >= 11/15.
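The pass/fail aggregation is simple to express in code. Here is a hedged sketch of how such a `RubricEvaluation` could be assembled from binary scores — the `evaluate_rubric` helper and the criterion names are illustrative, not the project's implementation:

```python
# Threshold taken from the documentation: a review passes at >= 11 of 15 criteria.
PASS_THRESHOLD = 11

def evaluate_rubric(scores: dict[str, int]) -> dict:
    """Aggregate 15 binary criterion scores into a RubricEvaluation-style dict."""
    assert all(v in (0, 1) for v in scores.values()), "criteria are strictly 0 or 1"
    total = sum(scores.values())
    failed = [name for name, v in scores.items() if v == 0]
    return {
        "scores": scores,
        "total_score": total,
        "failed_criteria": failed,
        "passed": total >= PASS_THRESHOLD,
    }
```

Keeping each criterion binary (rather than, say, 1-5) makes the evaluator's job more objective: the LLM only decides "present or not", and the arithmetic above is deterministic.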
**15 Rubric Criteria:**

| # | Category | Criterion |
|---|---|---|
| 1 | Content Completeness | Title & authors correctly identified |
| 2 | Content Completeness | Abstract accurately summarized |
| 3 | Content Completeness | Methodology clearly described |
| 4 | Content Completeness | At least 3 distinct strengths |
| 5 | Content Completeness | At least 3 distinct weaknesses |
| 6 | Content Completeness | Limitations acknowledged |
| 7 | Content Completeness | Related work present (2+ papers) |
| 8 | Analytical Depth | Novelty assessed with justification |
| 9 | Analytical Depth | Reproducibility discussed |
| 10 | Analytical Depth | Evidence quality evaluated |
| 11 | Analytical Depth | Contribution to field stated |
| 12 | Review Quality | Recommendation justified with evidence |
| 13 | Review Quality | At least 3 actionable questions |
| 14 | Review Quality | No hallucinated citations |
| 15 | Review Quality | Professional tone and coherent structure |

---

### Agent 7: Enhancer

| Property | Value |
|---|---|
| **File** | `agents/enhancer.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Review Report Enhancer |
| **Tools** | None (writing/synthesis only) |
| **Output** | `FinalReview` |
| **Max Iterations** | 3 |

Takes the draft review plus the rubric feedback and produces a **complete, publication-ready** peer-review report (800-1500 words). It fixes every rubric criterion that scored 0 while preserving content that passed, and produces the final executive summary, recommendation, confidence score, and improvement log.

---

## 5. Tools

### Tool 1: PDF Parser (`tools/pdf_parser.py`)

| Property | Value |
|---|---|
| **Library** | pdfplumber |
| **Assigned To** | Safety Guardian, Paper Extractor |
| **Input** | File path (string) |
| **Output** | Extracted text (string) or `"ERROR: ..."` |

**Guardrails:**

- File must be `.pdf`
- File must exist on disk
- Maximum file size: 20 MB
- Minimum extractable text: 100 chars
- Never raises exceptions — returns error strings instead

---

### Tool 2: PII Detector (`tools/pii_detector.py`)

| Property | Value |
|---|---|
| **Approach** | Regex pattern matching |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `findings`, `redacted_text`, `pii_count` |

**Patterns Detected:**

- Email addresses
- Phone numbers (US format)
- Social Security Numbers
- Credit card numbers

All matches are replaced with `[REDACTED_TYPE]` tokens.

---

### Tool 3: Prompt Injection Scanner (`tools/injection_scanner.py`)

| Property | Value |
|---|---|
| **Approach** | Regex pattern matching (9 patterns) |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `is_safe`, `suspicious_patterns`, `patterns_checked` |

**Patterns Checked:**

- "ignore previous instructions"
- "disregard above/previous"
- "forget everything/all/your instructions"
- "new instructions:"
- `[INST]` token
- `<|im_start|>` token
- `<|system|>` token
- "override safety"
- "jailbreak"

**Fail-safe:** If scanning itself fails, the document is treated as **unsafe**.

---

### Tool 4: URL Validator (`tools/url_validator.py`)

| Property | Value |
|---|---|
| **Approach** | Regex extraction + blocklist matching |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `total_urls`, `malicious_urls`, `is_safe` |

**Suspicious Indicators:**

- URL shorteners: bit.ly, tinyurl, t.co, goo.gl
- Dangerous protocols: data:, javascript:, file://
- Keywords: malware, phishing

Max 50 URLs checked per scan.
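A tool with the contract described above might look roughly like this sketch — the regex, the blocklists, and the function name are illustrative assumptions, and the real `tools/url_validator.py` may differ:

```python
import json
import re

# Illustrative blocklists built from the indicators listed above (not the exact lists).
SHORTENERS = ("bit.ly", "tinyurl", "t.co", "goo.gl")
BAD_PROTOCOLS = ("data:", "javascript:", "file://")
BAD_KEYWORDS = ("malware", "phishing")
MAX_URLS = 50  # cap per scan, as documented

def validate_urls(text: str) -> str:
    """Extract URLs, flag suspicious ones, and fail safe on any internal error."""
    try:
        urls = re.findall(r"(?:https?://|data:|javascript:|file://)\S+", text)[:MAX_URLS]
        # Naive substring matching for brevity; a production check would parse hosts.
        malicious = [
            u for u in urls
            if any(bad in u for bad in SHORTENERS + BAD_PROTOCOLS + BAD_KEYWORDS)
        ]
        return json.dumps({
            "total_urls": len(urls),
            "malicious_urls": malicious,
            "is_safe": not malicious,
        })
    except Exception:
        # Fail-safe default: if scanning itself breaks, treat the document as unsafe.
        return json.dumps({"total_urls": 0, "malicious_urls": [], "is_safe": False})
```

Returning a JSON string (rather than raising) matches the other tools' contract: callers always get a parseable result, and any internal failure collapses to `is_safe=False`.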
---

### Tool 5: Citation Search (`tools/citation_search.py`)

| Property | Value |
|---|---|
| **Primary API** | Semantic Scholar (with retry for HTTP 429) |
| **Fallback API** | OpenAlex (free, no rate limits) |
| **Assigned To** | Relevance Researcher |
| **Input** | Search query (string, max 200 chars) |
| **Output** | Formatted text list of papers with title, authors, year, citations, abstract |

**Rate Limiting:**

- Max 3 API calls per analysis run (tracked globally)
- 10-second timeout per API call
- Exponential backoff for rate limits (1s, 2s, 4s)

**Fallback Chain:** Semantic Scholar -> OpenAlex -> "Search unavailable" message

---

## 6. Pydantic Schemas

All schemas inherit from `BaseAgentOutput`, which enforces `extra="ignore"` for Gradio compatibility.

**File:** `schemas/models.py`

### SafetyReport

```
is_safe: bool (default=False, fail-safe)
pii_found: list[str]
injection_detected: bool
malicious_urls: list[str]
sanitized_text: str
risk_level: "low" | "medium" | "high"
```

### PaperExtraction

```
title: str
authors: list[str]
abstract: str
methodology: str
key_findings: list[str]
contributions: list[str]
limitations_stated: list[str]
references_count: int
paper_type: "empirical" | "theoretical" | "survey" | "system" | "mixed"
extraction_confidence: "high" | "medium" | "low"
```

### MethodologyCritique

```
strengths: list[str]
weaknesses: list[str]
limitations: list[str]
methodology_score: int (1-10)
reproducibility_score: int (1-10)
suggestions: list[str]
bias_risks: list[str]
```

### RelevanceReport

```
related_papers: list[RelatedPaper]
novelty_score: int (1-10)
field_context: str
gaps_addressed: list[str]
overlaps_with_existing: list[str]
```

### RelatedPaper

```
title: str
authors: str
year: int
citation_count: int
relevance: str
```

### ReviewDraft

```
summary: str
strengths_section: str
weaknesses_section: str
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence: int (1-5)
detailed_review: str
```

### RubricEvaluation

```
scores: dict[str, int] (15 criteria, each 0 or 1)
total_score: int (0-15)
failed_criteria: list[str]
feedback_per_criterion: dict[str, str]
passed: bool (True if total_score >= 11)
```

### FinalReview

```
executive_summary: str
paper_metadata: dict
strengths: list[str]
weaknesses: list[str]
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence_score: int (1-5)
rubric_scores: dict[str, int]
rubric_total: int
improvement_log: list[str]
```

---

## 7. Gradio UI

The UI is a single-page Gradio Blocks application with **6 tabs**:

| Tab | Content | Component |
|---|---|---|
| Executive Summary | Recommendation, confidence, rubric score, paper info | Markdown + Download button |
| Full Review | Strengths, weaknesses, methodology, novelty, questions | Markdown |
| Rubric Scorecard | 15 criteria scores in 3 categories with feedback | Markdown (table) |
| Safety Report | PII findings, injection status, URL analysis | Markdown |
| Agent Outputs | Raw structured output from each of the 7 agents | Markdown |
| Pipeline Logs | Timestamped execution log + JSON summary | Textbox + Code |

### UI Features

- **Progress bar** with real-time status updates (e.g., "Agent 3/6: Searching Related Work...")
- **Download button** to export the full review as a `.md` file
- **File validation** — only `.pdf` files are accepted

---

## 8. Safety & Guardrails

### Layered Safety Architecture

```mermaid
flowchart TD
    subgraph LAYER1["Layer 1: Input Validation"]
        IV1["File type check (.pdf only)"]
        IV2["File size check (max 20MB)"]
        IV3["Minimum text check (100+ chars)"]
    end

    subgraph LAYER2["Layer 2: Content Safety"]
        CS1["PII Detection & Redaction"]
        CS2["Prompt Injection Scanning"]
        CS3["URL Blocklist Validation"]
    end

    subgraph LAYER3["Layer 3: LLM Configuration"]
        LC1["Low temperature (0.1)"]
        LC2["Deterministic seed (42)"]
        LC3["Max iterations per agent"]
        LC4["Structured output (Pydantic)"]
    end

    subgraph LAYER4["Layer 4: Pipeline Resilience"]
        PR1["Per-agent try/except"]
        PR2["Graceful degradation"]
        PR3["API rate limiting (3 calls max)"]
        PR4["Timeout enforcement (10s)"]
    end

    subgraph LAYER5["Layer 5: Observability"]
        OB1["PipelineLogger — every step logged"]
        OB2["API key redaction in logs"]
        OB3["Execution summary with timing"]
    end

    LAYER1 --> LAYER2 --> LAYER3 --> LAYER4 --> LAYER5
```

### Key Principles

- **Fail-safe defaults**: `is_safe=False`; risk defaults to unsafe
- **No LLM in the safety gate**: all safety checks are deterministic regex/logic
- **PII always redacted**: even for safe documents, PII is stripped before LLM analysis
- **Structured outputs**: every agent uses Pydantic schemas enforced by CrewAI
- **No secrets in logs**: API keys are regex-redacted from all log output

---

## 9. Tech Stack & Dependencies

| Package | Version | Purpose |
|---|---|---|
| `crewai` | >= 0.86.0 | Multi-agent orchestration framework |
| `crewai-tools` | >= 0.17.0 | Tool wrapper utilities |
| `openai` | >= 1.0.0 | LLM API client (GPT-4o, GPT-4o-mini) |
| `pdfplumber` | >= 0.11.0 | PDF text extraction |
| `pydantic` | >= 2.0.0 | Structured output validation |
| `gradio` | >= 5.0.0 | Web UI framework |
| `python-dotenv` | >= 1.0.0 | Environment variable loading |
| `requests` | >= 2.31.0 | HTTP client for citation APIs |

### Environment Variables

| Variable | Required | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API access (GPT-4o required) |

---

## 10. Project Structure

```
Homework5_agentincAI/
|-- app.py                       # Main application (pipeline + Gradio UI)
|-- requirements.txt             # Python dependencies
|-- README.md                    # HuggingFace Space metadata
|-- .env                         # Environment variables (API keys)
|-- .gitignore
|
|-- agents/                      # CrewAI agent definitions
|   |-- __init__.py
|   |-- paper_extractor.py       # Agent 2: Structured data extraction
|   |-- methodology_critic.py    # Agent 3: Methodology evaluation
|   |-- relevance_researcher.py  # Agent 4: Related work search
|   |-- review_synthesizer.py    # Agent 5: Draft review writer
|   |-- rubric_evaluator.py      # Agent 6: 15-criteria quality scorer
|   |-- enhancer.py              # Agent 7: Final report polisher
|
|-- tools/                       # CrewAI tool definitions
|   |-- __init__.py
|   |-- pdf_parser.py            # PDF text extraction
|   |-- pii_detector.py          # PII detection & redaction
|   |-- injection_scanner.py     # Prompt injection detection
|   |-- url_validator.py         # URL blocklist validation
|   |-- citation_search.py       # Semantic Scholar / OpenAlex search
|
|-- schemas/                     # Pydantic output models
|   |-- __init__.py
|   |-- models.py                # All 8 schema definitions
|
|-- test_components.py           # Component tests
|-- tests/                       # Test directory
```

---

## 11. How to Run

### Prerequisites

- Python 3.10+
- OpenAI API key with GPT-4o access

### Setup

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env

# 3. Run the application
python app.py
```

The Gradio UI launches at `http://0.0.0.0:7860` (open `http://localhost:7860` in your browser).

### Usage

1. Open the UI in your browser
2. Upload a research paper PDF (max 20 MB)
3. Click "Analyze Paper"
4. Wait 1-3 minutes for the pipeline to complete
5. Review the results across all 6 tabs
6. Download the full report as Markdown

---

*Generated for AI Research Paper Analyst — Homework 5, Agentic AI Bootcamp*