# πŸ”¬ AI Research Paper Analyst β€” Project Walkthrough
> **Automated Peer-Review System powered by Multi-Agent AI**
>
> Upload a research paper (PDF) β†’ receive a publication-ready peer review with methodology critique, novelty assessment, rubric scoring, and an Accept / Revise / Reject recommendation.
---
## 1. What Does This System Do?
| Input | Output |
|---|---|
| A single **PDF** research paper | A structured **peer-review report** with strengths, weaknesses, rubric scores, and a recommendation |
**Key stats:**
- **7 specialized AI agents** working in a sequential pipeline
- **5 custom tools** (PDF parsing, PII redaction, injection scanning, URL validation, citation search)
- **8 Pydantic schemas** enforcing structured JSON output from every agent
- **15-point binary rubric** for quality assurance
- **Gradio web UI** with 6 tabs for exploring every aspect of the review
---
## 2. System Architecture Flowchart
![System Architecture](docs/images/system_architecture.png)
---
## 3. Simplified Pipeline Flow
![Pipeline Flow](docs/images/pipeline_flow.png)
---
## 4. The 7 Agents
| # | Agent | LLM | Role | Key Output |
|---|---|---|---|---|
| 1 | πŸ›‘οΈ **Safety Guardian** | None (programmatic) | Gate β€” blocks unsafe docs before any LLM sees them | `SafetyReport` |
| 2 | πŸ“„ **Paper Extractor** | GPT-4o | Extract title, authors, abstract, methodology, findings | `PaperExtraction` |
| 3 | πŸ”¬ **Methodology Critic** | GPT-4o-mini | Evaluate study design, stats, reproducibility | `MethodologyCritique` |
| 4 | πŸ” **Relevance Researcher** | GPT-4o-mini | Search Semantic Scholar / OpenAlex for related work | `RelevanceReport` |
| 5 | ✍️ **Review Synthesizer** | GPT-4o-mini | Combine all insights into a peer-review draft | `ReviewDraft` |
| 6 | πŸ“ **Rubric Evaluator** | GPT-4o-mini | Score the draft on 15 binary criteria (pass β‰₯ 11/15) | `RubricEvaluation` |
| 7 | ✨ **Enhancer** | GPT-4o-mini | Fix rubric failures, produce publication-ready report | `FinalReview` |
---
## 5. The 5 Tools
| # | Tool | File | Used By | What It Does |
|---|---|---|---|---|
| 1 | πŸ“‘ **PDF Parser** | `tools/pdf_parser.py` | Safety Guardian, Paper Extractor | Extracts text from PDF using `pdfplumber`. Validates file type, existence, and size (≀ 20 MB). |
| 2 | πŸ”’ **PII Detector** | `tools/pii_detector.py` | Safety Guardian | Regex-based scan for emails, phone numbers, SSNs, credit cards. Replaces matches with `[REDACTED_TYPE]`. |
| 3 | 🚫 **Injection Scanner** | `tools/injection_scanner.py` | Safety Guardian | Checks text against 9 prompt-injection patterns (e.g. "ignore previous instructions", `[INST]`). Fail-safe: defaults to **unsafe** if scanning crashes. |
| 4 | 🌐 **URL Validator** | `tools/url_validator.py` | Safety Guardian | Extracts URLs via regex, checks against blocklist (bit.ly, tinyurl, `data:`, `javascript:`). Max 50 URLs per scan. |
| 5 | πŸ”Ž **Citation Search** | `tools/citation_search.py` | Relevance Researcher | Searches **Semantic Scholar** (with retry + backoff for rate limits). Falls back to **OpenAlex** if unavailable. Max 3 API calls per run. |
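The PII Detector's redact-and-report behavior can be sketched as below. The patterns are illustrative stand-ins for the categories listed in the table (emails, phone numbers, SSNs, credit cards); the actual regexes in `tools/pii_detector.py` may differ.

```python
import re

# Hypothetical patterns mirroring the PII categories described above;
# the real tools/pii_detector.py may use stricter or additional regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with [REDACTED_<TYPE>] and report which types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED_{label}]", text)
    return text, found
```

Because the scan is pure regex, it runs before any LLM sees the text, so no raw PII ever reaches the API.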
### Tool–Agent Assignment Map
![Tool-Agent Assignment](docs/images/tool_agent_map.png)
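The "retry + backoff" behavior of the Citation Search tool can be sketched with a generic helper like the one below. The name `with_backoff` and its parameters are hypothetical; in `tools/citation_search.py` this wraps the Semantic Scholar call, with OpenAlex as the fallback when retries are exhausted.

```python
import time
from typing import Any, Callable

def with_backoff(fn: Callable[..., Any], max_attempts: int = 3,
                 base_delay: float = 1.0) -> Callable[..., Any]:
    """Retry fn on failure (e.g. an HTTP 429 rate limit) with exponentially
    growing delays: base_delay, 2*base_delay, 4*base_delay, ..."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of retries: let the caller fall back to OpenAlex
                time.sleep(base_delay * (2 ** attempt))
    return wrapper
```

Capping the pipeline at 3 API calls per run (per the table above) keeps a single review within free-tier rate limits even when retries fire.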
---
## 6. Pydantic Schemas (Structured Output)
Every agent is forced to output **validated JSON** through Pydantic schemas. If an agent's output doesn't match the schema, CrewAI automatically retries with a correction prompt.
| Schema | Key Fields |
|---|---|
| `SafetyReport` | `is_safe`, `pii_found`, `injection_detected`, `malicious_urls`, `risk_level` |
| `PaperExtraction` | `title`, `authors`, `abstract`, `methodology`, `key_findings`, `paper_type`, `extraction_confidence` |
| `MethodologyCritique` | `strengths`, `weaknesses`, `methodology_score` (1-10), `reproducibility_score` (1-10), `bias_risks` |
| `RelevanceReport` | `related_papers[]`, `novelty_score` (1-10), `field_context`, `gaps_addressed` |
| `ReviewDraft` | `summary`, `strengths_section`, `weaknesses_section`, `recommendation` (Accept/Revise/Reject) |
| `RubricEvaluation` | `scores{}` (15 binary criteria), `total_score` (0–15), `passed` (β‰₯ 11) |
| `FinalReview` | `executive_summary`, `strengths`, `weaknesses`, `recommendation`, `confidence_score`, `improvement_log` |
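As a concrete example, one of these schemas might look like the sketch below. The field types are inferred from the table above (the exact definitions live in `schemas/models.py`); invalid agent output fails Pydantic validation, which is what triggers CrewAI's correction-prompt retry.

```python
from typing import Literal
from pydantic import BaseModel, Field, ValidationError

class SafetyReport(BaseModel):
    """Sketch of the Safety Guardian's output schema; types are assumptions."""
    is_safe: bool
    pii_found: list[str] = Field(default_factory=list)
    injection_detected: bool = False
    malicious_urls: list[str] = Field(default_factory=list)
    risk_level: Literal["low", "medium", "high"] = "low"

# A value outside the allowed literals is rejected at parse time:
try:
    SafetyReport(is_safe=True, risk_level="catastrophic")
except ValidationError:
    pass  # this is the failure CrewAI turns into a correction prompt
```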
---
## 7. Safety & Guardrails β€” 5 Layers
![5-Layer Safety Architecture](docs/images/safety_layers.png)
**Key principle:** The Safety Guardian uses **zero LLM calls** β€” all safety decisions are deterministic regex/logic. This prevents prompt injection attacks from manipulating the safety gate itself.
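A minimal sketch of that deterministic gate is shown below, using a few of the 9 injection patterns named in section 5 (the full list lives in `tools/injection_scanner.py`). Note the fail-safe: any error during scanning is treated as unsafe.

```python
import re

# Illustrative subset of the 9 prompt-injection patterns described above
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"\[INST\]", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+", re.IGNORECASE),
]

def is_text_safe(text: str) -> bool:
    """Deterministic gate: pure regex, zero LLM calls."""
    try:
        return not any(p.search(text) for p in INJECTION_PATTERNS)
    except Exception:
        return False  # fail closed: a crashed scan blocks the document
```

Since there is no LLM in the loop, a paper containing "ignore previous instructions" cannot talk its way past the gate.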
---
## 8. Rubric β€” 15 Binary Criteria
The Rubric Evaluator scores the review on **15 strict pass/fail criteria** (0 or 1 each). A review **passes** with β‰₯ 11/15.
| # | Category | Criterion |
|---|---|---|
| 1 | πŸ“‹ Content | Title & authors correctly identified |
| 2 | πŸ“‹ Content | Abstract accurately summarized |
| 3 | πŸ“‹ Content | Methodology clearly described |
| 4 | πŸ“‹ Content | At least 3 distinct strengths |
| 5 | πŸ“‹ Content | At least 3 distinct weaknesses |
| 6 | πŸ“‹ Content | Limitations acknowledged |
| 7 | πŸ“‹ Content | Related work present (2+ papers) |
| 8 | πŸ”¬ Depth | Novelty assessed with justification |
| 9 | πŸ”¬ Depth | Reproducibility discussed |
| 10 | πŸ”¬ Depth | Evidence quality evaluated |
| 11 | πŸ”¬ Depth | Contribution to field stated |
| 12 | πŸ“ Quality | Recommendation justified with evidence |
| 13 | πŸ“ Quality | At least 3 actionable questions |
| 14 | πŸ“ Quality | No hallucinated citations |
| 15 | πŸ“ Quality | Professional tone and coherent structure |
---
## 9. Gradio UI β€” 6 Tabs
| Tab | What It Shows |
|---|---|
| πŸ“‹ **Executive Summary** | Recommendation (Accept/Revise/Reject), confidence, rubric score, paper info + download button |
| πŸ“ **Full Review** | Strengths, weaknesses, methodology & novelty assessments, author questions |
| πŸ“Š **Rubric Scorecard** | All 15 criteria with βœ…/❌ scores and per-criterion feedback |
| πŸ›‘οΈ **Safety Report** | PII findings, injection scan result, URL analysis |
| πŸ’Ž **Agent Outputs** | Raw structured JSON output from each of the 7 agents |
| βš™οΈ **Pipeline Logs** | Timestamped execution log + JSON run summary |
---
## 10. Tech Stack
| Package | Purpose |
|---|---|
| **CrewAI** β‰₯ 0.86.0 | Multi-agent orchestration framework |
| **OpenAI** β‰₯ 1.0.0 | LLM API β€” GPT-4o + GPT-4o-mini |
| **Gradio** β‰₯ 5.0.0 | Web UI |
| **pdfplumber** β‰₯ 0.11.0 | PDF text extraction |
| **Pydantic** β‰₯ 2.0.0 | Structured output validation |
| **python-dotenv** β‰₯ 1.0.0 | `.env` file loading |
| **requests** β‰₯ 2.31.0 | HTTP calls to Semantic Scholar / OpenAlex |
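The version floors in this table correspond to a `requirements.txt` along these lines (a sketch; the file in the repo is authoritative):

```text
crewai>=0.86.0
openai>=1.0.0
gradio>=5.0.0
pdfplumber>=0.11.0
pydantic>=2.0.0
python-dotenv>=1.0.0
requests>=2.31.0
```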
---
## 11. Project Structure
```
Homework5_agentincAI/
β”œβ”€β”€ app.py # Main pipeline + Gradio UI (1045 lines)
β”œβ”€β”€ requirements.txt # Dependencies
β”œβ”€β”€ .env # OPENAI_API_KEY
β”‚
β”œβ”€β”€ agents/ # CrewAI agent definitions
β”‚ β”œβ”€β”€ paper_extractor.py # Step 1 β€” GPT-4o
β”‚ β”œβ”€β”€ methodology_critic.py # Step 2a β€” GPT-4o-mini
β”‚ β”œβ”€β”€ relevance_researcher.py # Step 2b β€” GPT-4o-mini
β”‚ β”œβ”€β”€ review_synthesizer.py # Step 3 β€” GPT-4o-mini
β”‚ β”œβ”€β”€ rubric_evaluator.py # Step 4 β€” GPT-4o-mini
β”‚ └── enhancer.py # Step 5 β€” GPT-4o-mini
β”‚
β”œβ”€β”€ tools/ # Custom tools
β”‚ β”œβ”€β”€ pdf_parser.py # PDF β†’ text
β”‚ β”œβ”€β”€ pii_detector.py # PII scan & redact
β”‚ β”œβ”€β”€ injection_scanner.py # Prompt injection detection
β”‚ β”œβ”€β”€ url_validator.py # URL blocklist check
β”‚ └── citation_search.py # Semantic Scholar / OpenAlex
β”‚
└── schemas/
└── models.py # All 8 Pydantic schemas
```
---
## 12. How to Run
```bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set your OpenAI API key in .env
echo "OPENAI_API_KEY=your-key-here" > .env
# 3. Launch the app
python app.py
```
Open **http://localhost:7860** β†’ Upload a PDF β†’ Click **"Analyze Paper"** β†’ Wait 1–3 minutes β†’ Review across all 6 tabs.
---
*AI Research Paper Analyst β€” Homework 5, Agentic AI Bootcamp*