AI-Research-Paper-Analyst / PROJECT_DOCUMENTATION.md
Saleh
Clean deployment to HuggingFace Space
2447eba
# AI Research Paper Analyst β€” Complete Project Documentation
## Table of Contents
1. [Project Overview](#1-project-overview)
2. [System Architecture Flowchart](#2-system-architecture-flowchart)
3. [Pipeline Flow](#3-pipeline-flow)
4. [Agents](#4-agents)
5. [Tools](#5-tools)
6. [Pydantic Schemas](#6-pydantic-schemas)
7. [Gradio UI](#7-gradio-ui)
8. [Safety & Guardrails](#8-safety--guardrails)
9. [Tech Stack & Dependencies](#9-tech-stack--dependencies)
10. [Project Structure](#10-project-structure)
11. [How to Run](#11-how-to-run)
---
## 1. Project Overview
**AI Research Paper Analyst** is an automated peer-review system powered by a multi-agent AI pipeline. A user uploads a research paper (PDF), and the system produces a comprehensive, publication-ready peer review β€” including methodology critique, novelty assessment, rubric scoring, and a final Accept/Revise/Reject recommendation.
| Property | Value |
|---|---|
| **Framework** | CrewAI (multi-agent orchestration) |
| **LLM Backend** | OpenAI GPT-4o (extraction) + GPT-4o-mini (all other agents) |
| **Frontend** | Gradio 5.x |
| **Safety** | Programmatic (regex/logic-based) β€” no LLM in the safety gate |
| **Output Format** | Structured JSON (Pydantic) rendered as Markdown |
---
## 2. System Architecture Flowchart
```mermaid
flowchart TD
A["User Uploads PDF via Gradio UI"] --> B["File Validation"]
B -->|Invalid| B_ERR["Return Error to UI"]
B -->|Valid .pdf| C["GATE 1: Safety Guardian (Programmatic)"]
subgraph SAFETY_GATE["Safety Gate β€” No LLM"]
C --> C1["PDF Parser Tool β€” Extract raw text"]
C1 --> C2["PII Detector Tool β€” Scan & redact PII"]
C2 --> C3["Injection Scanner Tool β€” Check for prompt injections"]
C3 --> C4["URL Validator Tool β€” Flag malicious URLs"]
C4 --> C5{"is_safe?"}
end
C5 -->|UNSAFE| BLOCK["Block Document β€” Show Safety Report"]
C5 -->|SAFE| D["Sanitized Text passed to Analysis Pipeline"]
subgraph ANALYSIS_PIPELINE["Analysis Pipeline β€” CrewAI Sequential"]
D --> E["STEP 1: Paper Extractor Agent (GPT-4o)"]
E -->|PaperExtraction JSON| F["STEP 2a: Methodology Critic Agent (GPT-4o-mini)"]
E -->|PaperExtraction JSON| G["STEP 2b: Relevance Researcher Agent (GPT-4o-mini)"]
F -->|MethodologyCritique JSON| H["STEP 3: Review Synthesizer Agent (GPT-4o-mini)"]
G -->|RelevanceReport JSON| H
E -->|PaperExtraction JSON| H
H -->|ReviewDraft JSON| I["STEP 4: Rubric Evaluator Agent (GPT-4o-mini)"]
I -->|RubricEvaluation JSON| J["STEP 5: Enhancer Agent (GPT-4o-mini)"]
H -->|ReviewDraft JSON| J
E -->|PaperExtraction JSON| J
end
J -->|FinalReview JSON| K["Output Formatting"]
subgraph OUTPUT["Gradio UI β€” 6 Tabs"]
K --> K1["Executive Summary Tab"]
K --> K2["Full Review Tab"]
K --> K3["Rubric Scorecard Tab"]
K --> K4["Safety Report Tab"]
K --> K5["Agent Outputs Tab"]
K --> K6["Pipeline Logs Tab"]
end
K2 --> DL["Download Full Report (.md)"]
```
### Simplified Agent Pipeline Flow
```mermaid
flowchart LR
PDF["PDF Upload"] --> SG["Safety\nGuardian"]
SG --> PE["Paper\nExtractor"]
PE --> MC["Methodology\nCritic"]
PE --> RR["Relevance\nResearcher"]
MC --> RS["Review\nSynthesizer"]
RR --> RS
RS --> RE["Rubric\nEvaluator"]
RE --> EN["Enhancer"]
EN --> OUT["Final\nReport"]
style SG fill:#ff6b6b,stroke:#c0392b,color:#fff
style PE fill:#74b9ff,stroke:#2980b9,color:#fff
style MC fill:#a29bfe,stroke:#6c5ce7,color:#fff
style RR fill:#a29bfe,stroke:#6c5ce7,color:#fff
style RS fill:#55efc4,stroke:#00b894,color:#fff
style RE fill:#ffeaa7,stroke:#fdcb6e,color:#333
style EN fill:#fd79a8,stroke:#e84393,color:#fff
```
### Data Flow Diagram
```mermaid
flowchart TD
subgraph TOOLS["Tools Layer"]
T1["pdf_parser_tool"]
T2["pii_detector_tool"]
T3["injection_scanner_tool"]
T4["url_validator_tool"]
T5["citation_search_tool"]
end
subgraph AGENTS["Agent Layer"]
A1["Safety Guardian\n(Programmatic)"]
A2["Paper Extractor\n(GPT-4o)"]
A3["Methodology Critic\n(GPT-4o-mini)"]
A4["Relevance Researcher\n(GPT-4o-mini)"]
A5["Review Synthesizer\n(GPT-4o-mini)"]
A6["Rubric Evaluator\n(GPT-4o-mini)"]
A7["Enhancer\n(GPT-4o-mini)"]
end
subgraph SCHEMAS["Schema Layer (Pydantic)"]
S1["SafetyReport"]
S2["PaperExtraction"]
S3["MethodologyCritique"]
S4["RelevanceReport"]
S5["ReviewDraft"]
S6["RubricEvaluation"]
S7["FinalReview"]
end
A1 -.->|uses| T1 & T2 & T3 & T4
A2 -.->|uses| T1
A4 -.->|uses| T5
A1 -->|outputs| S1
A2 -->|outputs| S2
A3 -->|outputs| S3
A4 -->|outputs| S4
A5 -->|outputs| S5
A6 -->|outputs| S6
A7 -->|outputs| S7
```
---
## 3. Pipeline Flow
The system runs as a **sequential pipeline** with one safety gate and six analysis steps:
| Stage | Agent | LLM | Input | Output Schema | Tools Used |
|---|---|---|---|---|---|
| **Gate 1** | Safety Guardian | None (programmatic) | Raw PDF file | `SafetyReport` | pdf_parser, pii_detector, injection_scanner, url_validator |
| **Step 1** | Paper Extractor | GPT-4o | Sanitized text | `PaperExtraction` | pdf_parser |
| **Step 2a** | Methodology Critic | GPT-4o-mini | PaperExtraction JSON | `MethodologyCritique` | None |
| **Step 2b** | Relevance Researcher | GPT-4o-mini | PaperExtraction JSON | `RelevanceReport` | citation_search |
| **Step 3** | Review Synthesizer | GPT-4o-mini | Paper + Critique + Research | `ReviewDraft` | None |
| **Step 4** | Rubric Evaluator | GPT-4o-mini | Draft + Paper + Critique + Research | `RubricEvaluation` | None |
| **Step 5** | Enhancer | GPT-4o-mini | Draft + Rubric + Paper | `FinalReview` | None |
### Pipeline Error Handling
- Each agent step is wrapped in `try/except` β€” a failure in one agent does not crash the pipeline.
- If an agent fails, its output defaults to `{"error": "..."}` and downstream agents work with available data.
- The Safety Gate blocks the entire pipeline if `is_safe=False` (prompt injection or malicious URLs detected).
- PII is always redacted before analysis, even for "safe" documents.
---
## 4. Agents
### Agent 1: Safety Guardian (Programmatic)
| Property | Value |
|---|---|
| **File** | `app.py` β€” `run_safety_check()` |
| **LLM** | None β€” fully programmatic |
| **Purpose** | Gate that blocks unsafe documents before LLM analysis |
| **Tools** | pdf_parser, pii_detector, injection_scanner, url_validator |
| **Output** | `SafetyReport` |
Runs all 4 safety tools as Python functions directly (no CrewAI agent overhead). This is deterministic, fast (<1 second), and avoids LLM hallucinations in safety-critical decisions.
**Decision Logic:**
- `is_safe = (not injection_detected) AND (no malicious URLs)`
- Risk level: `high` if injection or malicious URLs, `medium` if PII found, `low` otherwise
- If `is_safe=False` β†’ pipeline is blocked, user sees the Safety Report
---
### Agent 2: Paper Extractor
| Property | Value |
|---|---|
| **File** | `agents/paper_extractor.py` |
| **LLM** | GPT-4o (temperature=0.1, seed=42) |
| **Role** | Research Paper Data Extractor |
| **Tools** | pdf_parser_tool |
| **Output** | `PaperExtraction` |
| **Max Iterations** | 3 |
Extracts structured metadata from the raw paper text: title, authors, abstract, methodology, key findings, contributions, limitations, references count, paper type, and extraction confidence level.
Uses GPT-4o (not mini) because extraction requires deep comprehension of the full paper.
---
### Agent 3: Methodology Critic
| Property | Value |
|---|---|
| **File** | `agents/methodology_critic.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Research Methodology Evaluator |
| **Tools** | None (pure LLM reasoning) |
| **Output** | `MethodologyCritique` |
| **Max Iterations** | 5 |
Critically evaluates study design, statistical methods, sample sizes, reproducibility, and logical consistency. For theoretical papers, adapts criteria to assess logical rigor and proof completeness. Produces scores for methodology (1-10) and reproducibility (1-10).
---
### Agent 4: Relevance Researcher
| Property | Value |
|---|---|
| **File** | `agents/relevance_researcher.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Related Work Analyst |
| **Tools** | citation_search_tool |
| **Output** | `RelevanceReport` |
| **Max Iterations** | 5 |
Searches for real related papers using Semantic Scholar / OpenAlex APIs. Assesses novelty by comparing against existing work. Produces a novelty score (1-10), field context, gaps addressed, and overlaps.
**Critical Rule:** Must NOT hallucinate citations. Only uses papers found by the search tool.
---
### Agent 5: Review Synthesizer
| Property | Value |
|---|---|
| **File** | `agents/review_synthesizer.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42, max_tokens=4000) |
| **Role** | Peer Review Report Writer |
| **Tools** | None (synthesis only) |
| **Output** | `ReviewDraft` |
| **Max Iterations** | 3 |
Combines insights from Paper Extractor, Methodology Critic, and Relevance Researcher into a coherent peer-review draft with summary, strengths, weaknesses, assessments, recommendation (Accept/Revise/Reject), and questions for authors.
---
### Agent 6: Rubric Evaluator
| Property | Value |
|---|---|
| **File** | `agents/rubric_evaluator.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Objective Quality Scorer |
| **Tools** | None (evaluation logic only) |
| **Output** | `RubricEvaluation` |
| **Max Iterations** | 3 |
Scores the review draft on **15 strict binary criteria** (0 or 1 each). Pass threshold: >= 11/15.
**15 Rubric Criteria:**
| # | Category | Criterion |
|---|---|---|
| 1 | Content Completeness | Title & authors correctly identified |
| 2 | Content Completeness | Abstract accurately summarized |
| 3 | Content Completeness | Methodology clearly described |
| 4 | Content Completeness | At least 3 distinct strengths |
| 5 | Content Completeness | At least 3 distinct weaknesses |
| 6 | Content Completeness | Limitations acknowledged |
| 7 | Content Completeness | Related work present (2+ papers) |
| 8 | Analytical Depth | Novelty assessed with justification |
| 9 | Analytical Depth | Reproducibility discussed |
| 10 | Analytical Depth | Evidence quality evaluated |
| 11 | Analytical Depth | Contribution to field stated |
| 12 | Review Quality | Recommendation justified with evidence |
| 13 | Review Quality | At least 3 actionable questions |
| 14 | Review Quality | No hallucinated citations |
| 15 | Review Quality | Professional tone and coherent structure |
---
### Agent 7: Enhancer
| Property | Value |
|---|---|
| **File** | `agents/enhancer.py` |
| **LLM** | GPT-4o-mini (temperature=0.1, seed=42) |
| **Role** | Review Report Enhancer |
| **Tools** | None (writing/synthesis only) |
| **Output** | `FinalReview` |
| **Max Iterations** | 3 |
Takes the draft review + rubric feedback and produces a **complete, publication-ready** peer review report (800-1500 words). Fixes all rubric criteria that scored 0 while keeping content that passed. Produces the final executive summary, recommendation, confidence score, and improvement log.
---
## 5. Tools
### Tool 1: PDF Parser (`tools/pdf_parser.py`)
| Property | Value |
|---|---|
| **Library** | pdfplumber |
| **Assigned To** | Safety Guardian, Paper Extractor |
| **Input** | File path (string) |
| **Output** | Extracted text (string) or `"ERROR: ..."` |
**Guardrails:**
- File must be `.pdf`
- File must exist on disk
- File size max: 20 MB
- Minimum extractable text: 100 chars
- Never raises exceptions β€” returns error strings
---
### Tool 2: PII Detector (`tools/pii_detector.py`)
| Property | Value |
|---|---|
| **Approach** | Regex pattern matching |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `findings`, `redacted_text`, `pii_count` |
**Patterns Detected:**
- Email addresses
- Phone numbers (US format)
- Social Security Numbers
- Credit card numbers
All matches are replaced with `[REDACTED_TYPE]` tokens.
---
### Tool 3: Prompt Injection Scanner (`tools/injection_scanner.py`)
| Property | Value |
|---|---|
| **Approach** | Regex pattern matching (9 patterns) |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `is_safe`, `suspicious_patterns`, `patterns_checked` |
**Patterns Checked:**
- "ignore previous instructions"
- "disregard above/previous"
- "forget everything/all/your instructions"
- "new instructions:"
- `[INST]` token
- `<|im_start|>` token
- `<|system|>` token
- "override safety"
- "jailbreak"
**Fail-safe:** If scanning itself fails, the document is treated as **unsafe**.
---
### Tool 4: URL Validator (`tools/url_validator.py`)
| Property | Value |
|---|---|
| **Approach** | Regex extraction + blocklist matching |
| **Assigned To** | Safety Guardian |
| **Input** | Text to scan |
| **Output** | JSON with `total_urls`, `malicious_urls`, `is_safe` |
**Suspicious Indicators:**
- URL shorteners: bit.ly, tinyurl, t.co, goo.gl
- Dangerous protocols: data:, javascript:, file://
- Keywords: malware, phishing
Max 50 URLs checked per scan.
---
### Tool 5: Citation Search (`tools/citation_search.py`)
| Property | Value |
|---|---|
| **Primary API** | Semantic Scholar (with retry for HTTP 429) |
| **Fallback API** | OpenAlex (free, no rate limits) |
| **Assigned To** | Relevance Researcher |
| **Input** | Search query (string, max 200 chars) |
| **Output** | Formatted text list of papers with title, authors, year, citations, abstract |
**Rate Limiting:**
- Max 3 API calls per analysis run (tracked globally)
- 10-second timeout per API call
- Exponential backoff for rate limits (1s, 2s, 4s)
**Fallback Chain:** Semantic Scholar -> OpenAlex -> "Search unavailable" message
---
## 6. Pydantic Schemas
All schemas inherit from `BaseAgentOutput` which enforces `extra="ignore"` for Gradio compatibility.
**File:** `schemas/models.py`
### SafetyReport
```
is_safe: bool (default=False, fail-safe)
pii_found: list[str]
injection_detected: bool
malicious_urls: list[str]
sanitized_text: str
risk_level: "low" | "medium" | "high"
```
### PaperExtraction
```
title: str
authors: list[str]
abstract: str
methodology: str
key_findings: list[str]
contributions: list[str]
limitations_stated: list[str]
references_count: int
paper_type: "empirical" | "theoretical" | "survey" | "system" | "mixed"
extraction_confidence: "high" | "medium" | "low"
```
### MethodologyCritique
```
strengths: list[str]
weaknesses: list[str]
limitations: list[str]
methodology_score: int (1-10)
reproducibility_score: int (1-10)
suggestions: list[str]
bias_risks: list[str]
```
### RelevanceReport
```
related_papers: list[RelatedPaper]
novelty_score: int (1-10)
field_context: str
gaps_addressed: list[str]
overlaps_with_existing: list[str]
```
### RelatedPaper
```
title: str
authors: str
year: int
citation_count: int
relevance: str
```
### ReviewDraft
```
summary: str
strengths_section: str
weaknesses_section: str
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence: int (1-5)
detailed_review: str
```
### RubricEvaluation
```
scores: dict[str, int] (15 criteria, each 0 or 1)
total_score: int (0-15)
failed_criteria: list[str]
feedback_per_criterion: dict[str, str]
passed: bool (True if total_score >= 11)
```
### FinalReview
```
executive_summary: str
paper_metadata: dict
strengths: list[str]
weaknesses: list[str]
methodology_assessment: str
novelty_assessment: str
related_work_context: str
questions_for_authors: list[str]
recommendation: "Accept" | "Revise" | "Reject"
confidence_score: int (1-5)
rubric_scores: dict[str, int]
rubric_total: int
improvement_log: list[str]
```
---
## 7. Gradio UI
The UI is a single-page Gradio Blocks application with **6 tabs**:
| Tab | Content | Component |
|---|---|---|
| Executive Summary | Recommendation, confidence, rubric score, paper info | Markdown + Download button |
| Full Review | Strengths, weaknesses, methodology, novelty, questions | Markdown |
| Rubric Scorecard | 15 criteria scores in 3 categories with feedback | Markdown (table) |
| Safety Report | PII findings, injection status, URL analysis | Markdown |
| Agent Outputs | Raw structured output from each of the 7 agents | Markdown |
| Pipeline Logs | Timestamped execution log + JSON summary | Textbox + Code |
### UI Features
- **Progress bar** with real-time status updates (e.g., "Agent 3/6: Searching Related Work...")
- **Download button** to export the full review as a `.md` file
- **File validation** β€” only accepts `.pdf` files
---
## 8. Safety & Guardrails
### Layered Safety Architecture
```mermaid
flowchart TD
subgraph LAYER1["Layer 1: Input Validation"]
IV1["File type check (.pdf only)"]
IV2["File size check (max 20MB)"]
IV3["Minimum text check (100+ chars)"]
end
subgraph LAYER2["Layer 2: Content Safety"]
CS1["PII Detection & Redaction"]
CS2["Prompt Injection Scanning"]
CS3["URL Blocklist Validation"]
end
subgraph LAYER3["Layer 3: LLM Configuration"]
LC1["Low temperature (0.1)"]
LC2["Deterministic seed (42)"]
LC3["Max iterations per agent"]
LC4["Structured output (Pydantic)"]
end
subgraph LAYER4["Layer 4: Pipeline Resilience"]
PR1["Per-agent try/except"]
PR2["Graceful degradation"]
PR3["API rate limiting (3 calls max)"]
PR4["Timeout enforcement (10s)"]
end
subgraph LAYER5["Layer 5: Observability"]
OB1["PipelineLogger β€” every step logged"]
OB2["API key redaction in logs"]
OB3["Execution summary with timing"]
end
LAYER1 --> LAYER2 --> LAYER3 --> LAYER4 --> LAYER5
```
### Key Principles
- **Fail-safe defaults**: `is_safe=False`, risk defaults to unsafe
- **No LLM in the safety gate**: All safety checks are deterministic regex/logic
- **PII always redacted**: Even for safe documents, PII is stripped before LLM analysis
- **Structured outputs**: Every agent uses Pydantic schemas enforced by CrewAI
- **No secrets in logs**: API keys are regex-redacted from all log output
---
## 9. Tech Stack & Dependencies
| Package | Version | Purpose |
|---|---|---|
| `crewai` | >= 0.86.0 | Multi-agent orchestration framework |
| `crewai-tools` | >= 0.17.0 | Tool wrapper utilities |
| `openai` | >= 1.0.0 | LLM API client (GPT-4o, GPT-4o-mini) |
| `pdfplumber` | >= 0.11.0 | PDF text extraction |
| `pydantic` | >= 2.0.0 | Structured output validation |
| `gradio` | >= 5.0.0 | Web UI framework |
| `python-dotenv` | >= 1.0.0 | Environment variable loading |
| `requests` | >= 2.31.0 | HTTP client for citation APIs |
### Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API access (GPT-4o required) |
---
## 10. Project Structure
```
Homework5_agentincAI/
|-- app.py # Main application (pipeline + Gradio UI)
|-- requirements.txt # Python dependencies
|-- README.md # HuggingFace Space metadata
|-- .env # Environment variables (API keys)
|-- .gitignore
|
|-- agents/ # CrewAI agent definitions
| |-- __init__.py
| |-- paper_extractor.py # Agent 2: Structured data extraction
| |-- methodology_critic.py # Agent 3: Methodology evaluation
| |-- relevance_researcher.py # Agent 4: Related work search
| |-- review_synthesizer.py # Agent 5: Draft review writer
| |-- rubric_evaluator.py # Agent 6: 15-criteria quality scorer
| |-- enhancer.py # Agent 7: Final report polisher
|
|-- tools/ # CrewAI tool definitions
| |-- __init__.py
| |-- pdf_parser.py # PDF text extraction
| |-- pii_detector.py # PII detection & redaction
| |-- injection_scanner.py # Prompt injection detection
| |-- url_validator.py # URL blocklist validation
| |-- citation_search.py # Semantic Scholar / OpenAlex search
|
|-- schemas/ # Pydantic output models
| |-- __init__.py
| |-- models.py # All 8 schema definitions
|
|-- test_components.py # Component tests
|-- tests/ # Test directory
```
---
## 11. How to Run
### Prerequisites
- Python 3.10+
- OpenAI API key with GPT-4o access
### Setup
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env
# 3. Run the application
python app.py
```
The Gradio UI launches at `http://0.0.0.0:7860`.
### Usage
1. Open the UI in your browser
2. Upload a research paper PDF (max 20 MB)
3. Click "Analyze Paper"
4. Wait 1-3 minutes for the pipeline to complete
5. Review results across all 6 tabs
6. Download the full report as Markdown
---
*Generated for AI Research Paper Analyst β€” Homework 5, Agentic AI Bootcamp*