# πŸ”¬ AI Research Paper Analyst β€” Project Walkthrough
> **Automated Peer-Review System powered by Multi-Agent AI**
>
> Upload a research paper (PDF) β†’ receive a publication-ready peer review with methodology critique, novelty assessment, rubric scoring, and an Accept / Revise / Reject recommendation.
---
## 1. What Does This System Do?
| Input | Output |
|---|---|
| A single **PDF** research paper | A structured **peer-review report** with strengths, weaknesses, rubric scores, and a recommendation |
**Key stats:**
- **7 specialized AI agents** working in a sequential pipeline
- **5 custom tools** (PDF parsing, PII redaction, injection scanning, URL validation, citation search)
- **8 Pydantic schemas** enforcing structured JSON output from every agent
- **15-point binary rubric** for quality assurance
- **Gradio web UI** with 6 tabs for exploring every aspect of the review
---
## 2. System Architecture Flowchart
![System Architecture](docs/images/system_architecture.png)
---
## 3. Simplified Pipeline Flow
![Pipeline Flow](docs/images/pipeline_flow.png)
---
## 4. The 7 Agents
| # | Agent | LLM | Role | Key Output |
|---|---|---|---|---|
| 1 | πŸ›‘οΈ **Safety Guardian** | None (programmatic) | Gate β€” blocks unsafe docs before any LLM sees them | `SafetyReport` |
| 2 | πŸ“„ **Paper Extractor** | GPT-4o | Extract title, authors, abstract, methodology, findings | `PaperExtraction` |
| 3 | πŸ”¬ **Methodology Critic** | GPT-4o-mini | Evaluate study design, stats, reproducibility | `MethodologyCritique` |
| 4 | πŸ” **Relevance Researcher** | GPT-4o-mini | Search Semantic Scholar / OpenAlex for related work | `RelevanceReport` |
| 5 | ✍️ **Review Synthesizer** | GPT-4o-mini | Combine all insights into a peer-review draft | `ReviewDraft` |
| 6 | πŸ“ **Rubric Evaluator** | GPT-4o-mini | Score the draft on 15 binary criteria (pass β‰₯ 11/15) | `RubricEvaluation` |
| 7 | ✨ **Enhancer** | GPT-4o-mini | Fix rubric failures, produce publication-ready report | `FinalReview` |
---
## 5. The 5 Tools
| # | Tool | File | Used By | What It Does |
|---|---|---|---|---|
| 1 | πŸ“‘ **PDF Parser** | `tools/pdf_parser.py` | Safety Guardian, Paper Extractor | Extracts text from PDF using `pdfplumber`. Validates file type, existence, and size (≀ 20 MB). |
| 2 | πŸ”’ **PII Detector** | `tools/pii_detector.py` | Safety Guardian | Regex-based scan for emails, phone numbers, SSNs, credit cards. Replaces matches with `[REDACTED_TYPE]`. |
| 3 | 🚫 **Injection Scanner** | `tools/injection_scanner.py` | Safety Guardian | Checks text against 9 prompt-injection patterns (e.g. "ignore previous instructions", `[INST]`). Fail-safe: defaults to **unsafe** if scanning crashes. |
| 4 | 🌐 **URL Validator** | `tools/url_validator.py` | Safety Guardian | Extracts URLs via regex, checks against blocklist (bit.ly, tinyurl, `data:`, `javascript:`). Max 50 URLs per scan. |
| 5 | πŸ”Ž **Citation Search** | `tools/citation_search.py` | Relevance Researcher | Searches **Semantic Scholar** (with retry + backoff for rate limits). Falls back to **OpenAlex** if unavailable. Max 3 API calls per run. |
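The PII Detector's redact-and-report behavior can be sketched as below. The patterns are illustrative stand-ins for the categories listed in the table (emails, phone numbers, SSNs, credit cards); the actual regexes in `tools/pii_detector.py` may differ.

```python
import re

# Hypothetical patterns mirroring the PII categories described above;
# the real tools/pii_detector.py may use stricter or additional regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with [REDACTED_<TYPE>] and report which types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED_{label}]", text)
    return text, found
```

Because the scan is pure regex, it runs before any LLM sees the text, so no raw PII ever reaches the API.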
### Tool–Agent Assignment Map
![Tool-Agent Assignment](docs/images/tool_agent_map.png)
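The "retry + backoff" behavior of the Citation Search tool can be sketched with a generic helper like the one below. The name `with_backoff` and its parameters are hypothetical; in `tools/citation_search.py` this wraps the Semantic Scholar call, with OpenAlex as the fallback when retries are exhausted.

```python
import time
from typing import Any, Callable

def with_backoff(fn: Callable[..., Any], max_attempts: int = 3,
                 base_delay: float = 1.0) -> Callable[..., Any]:
    """Retry fn on failure (e.g. an HTTP 429 rate limit) with exponentially
    growing delays: base_delay, 2*base_delay, 4*base_delay, ..."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries: let the caller fall back to OpenAlex
                time.sleep(base_delay * (2 ** attempt))
    return wrapper
```

Capping the pipeline at 3 API calls per run (per the table above) keeps a single review within free-tier rate limits even when retries fire.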
---
## 6. Pydantic Schemas (Structured Output)
Every agent is forced to output **validated JSON** through Pydantic schemas. If an agent's output doesn't match the schema, CrewAI automatically retries with a correction prompt.
| Schema | Key Fields |
|---|---|
| `SafetyReport` | `is_safe`, `pii_found`, `injection_detected`, `malicious_urls`, `risk_level` |
| `PaperExtraction` | `title`, `authors`, `abstract`, `methodology`, `key_findings`, `paper_type`, `extraction_confidence` |
| `MethodologyCritique` | `strengths`, `weaknesses`, `methodology_score` (1-10), `reproducibility_score` (1-10), `bias_risks` |
| `RelevanceReport` | `related_papers[]`, `novelty_score` (1-10), `field_context`, `gaps_addressed` |
| `ReviewDraft` | `summary`, `strengths_section`, `weaknesses_section`, `recommendation` (Accept/Revise/Reject) |
| `RubricEvaluation` | `scores{}` (15 binary criteria), `total_score` (0–15), `passed` (β‰₯ 11) |
| `FinalReview` | `executive_summary`, `strengths`, `weaknesses`, `recommendation`, `confidence_score`, `improvement_log` |
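As a concrete example, one of these schemas might look like the sketch below. The field types are inferred from the table above (the exact definitions live in `schemas/models.py`); invalid agent output fails Pydantic validation, which is what triggers CrewAI's correction-prompt retry.

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class SafetyReport(BaseModel):
    """Sketch of the Safety Guardian's output schema; types are assumptions."""
    is_safe: bool
    pii_found: list[str] = Field(default_factory=list)
    injection_detected: bool = False
    malicious_urls: list[str] = Field(default_factory=list)
    risk_level: Literal["low", "medium", "high"] = "low"

# A value outside the allowed literals is rejected at parse time:
try:
    SafetyReport(is_safe=True, risk_level="catastrophic")
except ValidationError:
    pass  # this is the failure CrewAI turns into a correction prompt
```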
---
## 7. Safety & Guardrails β€” 5 Layers
![5-Layer Safety Architecture](docs/images/safety_layers.png)
**Key principle:** The Safety Guardian uses **zero LLM calls** β€” all safety decisions are deterministic regex/logic. This prevents prompt injection attacks from manipulating the safety gate itself.
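A minimal sketch of that deterministic gate is shown below, using a few of the 9 injection patterns named in section 5 (the full list lives in `tools/injection_scanner.py`). Note the fail-safe: any error during scanning is treated as unsafe.

```python
import re

# Illustrative subset of the 9 prompt-injection patterns described above
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"\[INST\]", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+", re.IGNORECASE),
]

def is_text_safe(text: str) -> bool:
    """Deterministic gate: pure regex, zero LLM calls."""
    try:
        return not any(p.search(text) for p in INJECTION_PATTERNS)
    except Exception:
        return False  # fail closed: a crashed scan blocks the document
```

Since there is no LLM in the loop, a paper containing "ignore previous instructions" cannot talk its way past the gate.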
---
## 8. Rubric β€” 15 Binary Criteria
The Rubric Evaluator scores the review on **15 strict pass/fail criteria** (0 or 1 each). A review **passes** with β‰₯ 11/15.
| # | Category | Criterion |
|---|---|---|
| 1 | πŸ“‹ Content | Title & authors correctly identified |
| 2 | πŸ“‹ Content | Abstract accurately summarized |
| 3 | πŸ“‹ Content | Methodology clearly described |
| 4 | πŸ“‹ Content | At least 3 distinct strengths |
| 5 | πŸ“‹ Content | At least 3 distinct weaknesses |
| 6 | πŸ“‹ Content | Limitations acknowledged |
| 7 | πŸ“‹ Content | Related work present (2+ papers) |
| 8 | πŸ”¬ Depth | Novelty assessed with justification |
| 9 | πŸ”¬ Depth | Reproducibility discussed |
| 10 | πŸ”¬ Depth | Evidence quality evaluated |
| 11 | πŸ”¬ Depth | Contribution to field stated |
| 12 | πŸ“ Quality | Recommendation justified with evidence |
| 13 | πŸ“ Quality | At least 3 actionable questions |
| 14 | πŸ“ Quality | No hallucinated citations |
| 15 | πŸ“ Quality | Professional tone and coherent structure |
---
## 9. Gradio UI β€” 6 Tabs
| Tab | What It Shows |
|---|---|
| πŸ“‹ **Executive Summary** | Recommendation (Accept/Revise/Reject), confidence, rubric score, paper info + download button |
| πŸ“ **Full Review** | Strengths, weaknesses, methodology & novelty assessments, author questions |
| πŸ“Š **Rubric Scorecard** | All 15 criteria with βœ…/❌ scores and per-criterion feedback |
| πŸ›‘οΈ **Safety Report** | PII findings, injection scan result, URL analysis |
| πŸ’Ž **Agent Outputs** | Raw structured JSON output from each of the 7 agents |
| βš™οΈ **Pipeline Logs** | Timestamped execution log + JSON run summary |
---
## 10. Tech Stack
| Package | Purpose |
|---|---|
| **CrewAI** β‰₯ 0.86.0 | Multi-agent orchestration framework |
| **OpenAI** β‰₯ 1.0.0 | LLM API β€” GPT-4o + GPT-4o-mini |
| **Gradio** β‰₯ 5.0.0 | Web UI |
| **pdfplumber** β‰₯ 0.11.0 | PDF text extraction |
| **Pydantic** β‰₯ 2.0.0 | Structured output validation |
| **python-dotenv** β‰₯ 1.0.0 | `.env` file loading |
| **requests** β‰₯ 2.31.0 | HTTP calls to Semantic Scholar / OpenAlex |
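The version floors in this table correspond to a `requirements.txt` along these lines (a sketch; the file in the repo is authoritative):

```text
crewai>=0.86.0
openai>=1.0.0
gradio>=5.0.0
pdfplumber>=0.11.0
pydantic>=2.0.0
python-dotenv>=1.0.0
requests>=2.31.0
```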
---
## 11. Project Structure
```
Homework5_agentincAI/
β”œβ”€β”€ app.py # Main pipeline + Gradio UI (1045 lines)
β”œβ”€β”€ requirements.txt # Dependencies
β”œβ”€β”€ .env # OPENAI_API_KEY
β”‚
β”œβ”€β”€ agents/ # CrewAI agent definitions
β”‚ β”œβ”€β”€ paper_extractor.py # Step 1 β€” GPT-4o
β”‚ β”œβ”€β”€ methodology_critic.py # Step 2a β€” GPT-4o-mini
β”‚ β”œβ”€β”€ relevance_researcher.py # Step 2b β€” GPT-4o-mini
β”‚ β”œβ”€β”€ review_synthesizer.py # Step 3 β€” GPT-4o-mini
β”‚ β”œβ”€β”€ rubric_evaluator.py # Step 4 β€” GPT-4o-mini
β”‚ └── enhancer.py # Step 5 β€” GPT-4o-mini
β”‚
β”œβ”€β”€ tools/ # Custom tools
β”‚ β”œβ”€β”€ pdf_parser.py # PDF β†’ text
β”‚ β”œβ”€β”€ pii_detector.py # PII scan & redact
β”‚ β”œβ”€β”€ injection_scanner.py # Prompt injection detection
β”‚ β”œβ”€β”€ url_validator.py # URL blocklist check
β”‚ └── citation_search.py # Semantic Scholar / OpenAlex
β”‚
└── schemas/
└── models.py # All 8 Pydantic schemas
```
---
## 12. How to Run
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set your OpenAI API key in .env
echo "OPENAI_API_KEY=your-key-here" > .env
# 3. Launch the app
python app.py
```
Open **http://localhost:7860** β†’ Upload a PDF β†’ Click **"Analyze Paper"** β†’ Wait 1–3 minutes β†’ Review across all 6 tabs.
---
*AI Research Paper Analyst β€” Homework 5, Agentic AI Bootcamp*