AI Research Paper Analyst – Project Walkthrough
Automated Peer-Review System powered by Multi-Agent AI
Upload a research paper (PDF) → receive a publication-ready peer review with methodology critique, novelty assessment, rubric scoring, and an Accept / Revise / Reject recommendation.
1. What Does This System Do?
| Input | Output |
|---|---|
| A single PDF research paper | A structured peer-review report with strengths, weaknesses, rubric scores, and a recommendation |
Key stats:
- 7 specialized AI agents working in a sequential pipeline
- 5 custom tools (PDF parsing, PII redaction, injection scanning, URL validation, citation search)
- 8 Pydantic schemas enforcing structured JSON output from every agent
- 15-point binary rubric for quality assurance
- Gradio web UI with 6 tabs for exploring every aspect of the review
2. System Architecture Flowchart
3. Simplified Pipeline Flow
4. The 7 Agents
| # | Agent | LLM | Role | Key Output |
|---|---|---|---|---|
| 1 | Safety Guardian | None (programmatic) | Gate: blocks unsafe docs before any LLM sees them | SafetyReport |
| 2 | Paper Extractor | GPT-4o | Extract title, authors, abstract, methodology, findings | PaperExtraction |
| 3 | Methodology Critic | GPT-4o-mini | Evaluate study design, stats, reproducibility | MethodologyCritique |
| 4 | Relevance Researcher | GPT-4o-mini | Search Semantic Scholar / OpenAlex for related work | RelevanceReport |
| 5 | Review Synthesizer | GPT-4o-mini | Combine all insights into a peer-review draft | ReviewDraft |
| 6 | Rubric Evaluator | GPT-4o-mini | Score the draft on 15 binary criteria (pass ≥ 11/15) | RubricEvaluation |
| 7 | Enhancer | GPT-4o-mini | Fix rubric failures, produce publication-ready report | FinalReview |
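The sequential hand-off between agents can be sketched in plain Python. This is a conceptual sketch only: the actual project uses CrewAI's orchestration, and the stub agents and shared-context dict below are illustrative assumptions, not the project's code.

```python
# Conceptual sketch of the sequential pipeline (the real project uses
# CrewAI; agent names and payload shapes here are illustrative only).

def run_pipeline(pdf_path, agents):
    """Run each agent in order, accumulating results in a shared context."""
    context = {"pdf_path": pdf_path}
    for name, agent_fn in agents:
        result = agent_fn(context)
        # The Safety Guardian acts as a hard gate: abort before any LLM runs.
        if name == "safety_guardian" and not result.get("is_safe", False):
            raise RuntimeError("Unsafe document - pipeline halted")
        context[name] = result
    return context

# Stub agents standing in for the real LLM-backed ones.
agents = [
    ("safety_guardian", lambda ctx: {"is_safe": True, "risk_level": "low"}),
    ("paper_extractor", lambda ctx: {"title": "Example Paper"}),
]
report = run_pipeline("paper.pdf", agents)
```

Because the gate raises before any downstream agent runs, an unsafe document never reaches an LLM-backed step.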
5. The 5 Tools
| # | Tool | File | Used By | What It Does |
|---|---|---|---|---|
| 1 | PDF Parser | tools/pdf_parser.py | Safety Guardian, Paper Extractor | Extracts text from the PDF using pdfplumber. Validates file type, existence, and size (≤ 20 MB). |
| 2 | PII Detector | tools/pii_detector.py | Safety Guardian | Regex-based scan for emails, phone numbers, SSNs, and credit cards. Replaces matches with [REDACTED_TYPE]. |
| 3 | Injection Scanner | tools/injection_scanner.py | Safety Guardian | Checks text against 9 prompt-injection patterns (e.g. "ignore previous instructions", [INST]). Fail-safe: defaults to unsafe if scanning crashes. |
| 4 | URL Validator | tools/url_validator.py | Safety Guardian | Extracts URLs via regex and checks them against a blocklist (bit.ly, tinyurl, data:, javascript:). Max 50 URLs per scan. |
| 5 | Citation Search | tools/citation_search.py | Relevance Researcher | Searches Semantic Scholar (with retry + backoff for rate limits); falls back to OpenAlex if unavailable. Max 3 API calls per run. |
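The Citation Search tool's retry-with-backoff-then-fallback behaviour can be sketched as follows. `fetch_s2` and `fetch_openalex` are hypothetical injected callables standing in for the real Semantic Scholar and OpenAlex HTTP calls; they are assumptions for illustration, not the tool's actual interface.

```python
import time

def search_with_fallback(query, fetch_s2, fetch_openalex,
                         max_retries=3, base_delay=1.0):
    """Try the primary source with exponential backoff, then fall back.

    Sketch only: fetch_s2 / fetch_openalex are injected stand-ins for the
    real Semantic Scholar and OpenAlex API calls.
    """
    for attempt in range(max_retries):
        try:
            return fetch_s2(query)
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return fetch_openalex(query)  # secondary source after retries exhausted
```

Injecting the fetch functions keeps the retry logic testable without any network access.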
Tool → Agent Assignment Map
6. Pydantic Schemas (Structured Output)
Every agent is forced to output validated JSON through Pydantic schemas. If an agent's output doesn't match the schema, CrewAI automatically retries with a correction prompt.
| Schema | Key Fields |
|---|---|
| SafetyReport | is_safe, pii_found, injection_detected, malicious_urls, risk_level |
| PaperExtraction | title, authors, abstract, methodology, key_findings, paper_type, extraction_confidence |
| MethodologyCritique | strengths, weaknesses, methodology_score (1-10), reproducibility_score (1-10), bias_risks |
| RelevanceReport | related_papers[], novelty_score (1-10), field_context, gaps_addressed |
| ReviewDraft | summary, strengths_section, weaknesses_section, recommendation (Accept/Revise/Reject) |
| RubricEvaluation | scores{} (15 binary criteria), total_score (0–15), passed (≥ 11) |
| FinalReview | executive_summary, strengths, weaknesses, recommendation, confidence_score, improvement_log |
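A minimal sketch of what one of these schemas might look like, assuming Pydantic v2. Field names follow the table above, but the types, defaults, and constraints are assumptions for illustration, not the project's exact models.py.

```python
from typing import List
from pydantic import BaseModel, Field

class SafetyReport(BaseModel):
    """Sketch of the SafetyReport schema (types/defaults are assumed)."""
    is_safe: bool
    pii_found: List[str] = Field(default_factory=list)
    injection_detected: bool = False
    malicious_urls: List[str] = Field(default_factory=list)
    risk_level: str = "low"

# Validation rejects malformed agent output; a validation failure is what
# triggers the retry-with-correction behaviour described above.
report = SafetyReport.model_validate({"is_safe": True, "risk_level": "low"})
```

A `ValidationError` on malformed output gives the framework a concrete error message to feed back into the correction prompt.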
7. Safety & Guardrails – 5 Layers
Key principle: The Safety Guardian uses zero LLM calls; all safety decisions are deterministic regex/logic. This prevents prompt-injection attacks from manipulating the safety gate itself.
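A minimal sketch of such a deterministic, fail-safe scanner. The pattern list is an illustrative subset, not the project's full set of 9 patterns.

```python
import re

# Illustrative subset of prompt-injection patterns (the project lists 9).
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"\[INST\]",
    r"you\s+are\s+now\s+in\s+developer\s+mode",
]

def is_text_safe(text: str) -> bool:
    """Return True if the text looks safe.

    Purely regex-based: no LLM in the loop, so the gate itself cannot be
    prompt-injected. Fail-safe: any error during scanning counts as unsafe.
    """
    try:
        return not any(re.search(p, text, re.IGNORECASE)
                       for p in INJECTION_PATTERNS)
    except Exception:
        return False  # default to unsafe on failure
```

Defaulting to unsafe on error means a crash in the scanner can never silently wave a document through.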
8. Rubric – 15 Binary Criteria
The Rubric Evaluator scores the review on 15 strict pass/fail criteria (0 or 1 each). A review passes with ≥ 11/15.
| # | Category | Criterion |
|---|---|---|
| 1 | Content | Title & authors correctly identified |
| 2 | Content | Abstract accurately summarized |
| 3 | Content | Methodology clearly described |
| 4 | Content | At least 3 distinct strengths |
| 5 | Content | At least 3 distinct weaknesses |
| 6 | Content | Limitations acknowledged |
| 7 | Content | Related work present (2+ papers) |
| 8 | Depth | Novelty assessed with justification |
| 9 | Depth | Reproducibility discussed |
| 10 | Depth | Evidence quality evaluated |
| 11 | Depth | Contribution to field stated |
| 12 | Quality | Recommendation justified with evidence |
| 13 | Quality | At least 3 actionable questions |
| 14 | Quality | No hallucinated citations |
| 15 | Quality | Professional tone and coherent structure |
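The pass/fail aggregation over the 15 criteria is simple to sketch; the criterion keys below are illustrative placeholders, not the project's actual names.

```python
def evaluate_rubric(scores: dict) -> dict:
    """Aggregate 15 binary criteria into a total and a pass/fail verdict.

    Sketch only: criterion keys are illustrative.
    """
    assert all(v in (0, 1) for v in scores.values()), "criteria are binary"
    total = sum(scores.values())
    return {"total_score": total, "passed": total >= 11}

# Example: 12 criteria pass, 3 fail -> 12/15, which clears the >= 11 bar.
scores = {f"criterion_{i}": (1 if i <= 12 else 0) for i in range(1, 16)}
result = evaluate_rubric(scores)
# result == {"total_score": 12, "passed": True}
```

Any review scoring 10/15 or below is routed to the Enhancer agent for another pass at the failed criteria.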
9. Gradio UI β 6 Tabs
| Tab | What It Shows |
|---|---|
| Executive Summary | Recommendation (Accept/Revise/Reject), confidence, rubric score, paper info + download button |
| Full Review | Strengths, weaknesses, methodology & novelty assessments, author questions |
| Rubric Scorecard | All 15 criteria with pass/fail marks and per-criterion feedback |
| Safety Report | PII findings, injection scan result, URL analysis |
| Agent Outputs | Raw structured JSON output from each of the 7 agents |
| Pipeline Logs | Timestamped execution log + JSON run summary |
10. Tech Stack
| Package | Purpose |
|---|---|
| CrewAI ≥ 0.86.0 | Multi-agent orchestration framework |
| OpenAI ≥ 1.0.0 | LLM API (GPT-4o + GPT-4o-mini) |
| Gradio ≥ 5.0.0 | Web UI |
| pdfplumber ≥ 0.11.0 | PDF text extraction |
| Pydantic ≥ 2.0.0 | Structured output validation |
| python-dotenv ≥ 1.0.0 | .env file loading |
| requests ≥ 2.31.0 | HTTP calls to Semantic Scholar / OpenAlex |
11. Project Structure
Homework5_agentincAI/
├── app.py                       # Main pipeline + Gradio UI (1045 lines)
├── requirements.txt             # Dependencies
├── .env                         # OPENAI_API_KEY
│
├── agents/                      # CrewAI agent definitions
│   ├── paper_extractor.py       # Step 1 – GPT-4o
│   ├── methodology_critic.py    # Step 2a – GPT-4o-mini
│   ├── relevance_researcher.py  # Step 2b – GPT-4o-mini
│   ├── review_synthesizer.py    # Step 3 – GPT-4o-mini
│   ├── rubric_evaluator.py      # Step 4 – GPT-4o-mini
│   └── enhancer.py              # Step 5 – GPT-4o-mini
│
├── tools/                       # Custom tools
│   ├── pdf_parser.py            # PDF → text
│   ├── pii_detector.py          # PII scan & redact
│   ├── injection_scanner.py     # Prompt-injection detection
│   ├── url_validator.py         # URL blocklist check
│   └── citation_search.py       # Semantic Scholar / OpenAlex
│
└── schemas/
    └── models.py                # All 8 Pydantic schemas
12. How to Run
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set your OpenAI API key in .env
echo "OPENAI_API_KEY=your-key-here" > .env
# 3. Launch the app
python app.py
Open http://localhost:7860 → upload a PDF → click "Analyze Paper" → wait 1–3 minutes → review the results across all 6 tabs.
AI Research Paper Analyst – Homework 5, Agentic AI Bootcamp



