Review Knowledge Base

A RAG (Retrieval-Augmented Generation) system that scrapes academic paper reviews from OpenReview and builds a searchable vector database. Designed for AI agents to perform Precedent Checks — verifying whether a criticism of a paper aligns with historical review norms.

What's Inside

| File | Purpose |
| --- | --- |
| `01_scrape_openreview.py` | Scrape submissions, decisions, meta-reviews, and reviews from OpenReview |
| `02_build_chromadb.py` | Build a ChromaDB vector database (embed Title + Abstract) |
| `03_push_to_hf.py` | Push/pull data to HuggingFace Datasets for versioning and sharing |
| `04_query_rag.py` | Query the database by semantic similarity (interactive or CLI) |
| `05_agent_integration.py` | `ReviewerRAG` class for programmatic Agent integration |
| `config.py` | Central configuration (venues, credentials, paths) |
| `run_pipeline.sh` | One-click pipeline: scrape → build → push |
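
The build step embeds each paper's title and abstract. As a rough sketch of how one record might be turned into the text and metadata handed to a vector store (the helper name `to_embedding_doc`, the blank-line separator, and the metadata layout are illustrative assumptions, not the actual code in `02_build_chromadb.py`):

```python
from __future__ import annotations

from typing import Any


def to_embedding_doc(record: dict[str, Any]) -> tuple[str, dict[str, str]]:
    """Return (text_to_embed, metadata) for one paper record.

    Hypothetical helper: only title and abstract go into the embedded text;
    the other fields travel as metadata so they can be shown with a match.
    """
    title = (record.get("title") or "").strip()
    abstract = (record.get("abstract") or "").strip()
    # Join only the non-empty parts so a missing abstract leaves no dangling separator.
    text = "\n\n".join(part for part in (title, abstract) if part)
    metadata = {
        "decision": str(record.get("decision") or ""),
        "venue": str(record.get("venue") or ""),
        "year": str(record.get("year") or ""),
    }
    return text, metadata
```

Keeping the decision and venue out of the embedded text means similarity depends only on what the paper is about, while the outcome stays available for display and filtering.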

Supported Conferences

ICLR 2023–2026
NeurIPS 2023–2026
ICML 2023–2025
ACL ARR 2025 (monthly cycles: Feb, Apr, Jun, Aug, Oct, Dec)
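
The list above could be expanded into OpenReview venue IDs along these lines. This is a sketch only: the ID patterns follow the naming commonly seen on OpenReview, but both the patterns and the function name `venue_ids` are assumptions and should be checked against the entries in `config.py`.

```python
from __future__ import annotations

# ACL ARR 2025 runs in monthly cycles; month spelling in the ID is an assumption.
ARR_CYCLES = ["February", "April", "June", "August", "October", "December"]


def venue_ids() -> list[str]:
    """Build one venue ID per supported conference edition (hypothetical helper)."""
    ids: list[str] = []
    for year in range(2023, 2027):  # ICLR and NeurIPS: 2023 through 2026
        ids.append(f"ICLR.cc/{year}/Conference")
        ids.append(f"NeurIPS.cc/{year}/Conference")
    for year in range(2023, 2026):  # ICML: 2023 through 2025
        ids.append(f"ICML.cc/{year}/Conference")
    ids.extend(f"aclweb.org/ACL/ARR/2025/{month}" for month in ARR_CYCLES)
    return ids
```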

Quick Start

1. Install Dependencies

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Set Credentials (Optional)

export OPENREVIEW_USERNAME="your-email@example.com"
export OPENREVIEW_PASSWORD="your-password"
export HF_TOKEN="hf_xxxxxxx"
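
Because the credentials are optional, code that reads them has to tolerate unset variables. A minimal sketch of one way to do that in Python; the function names and the returned dictionary keys are hypothetical, and how the real scripts behave without credentials is not specified here.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict[str, Optional[str]]:
    """Read the three optional variables; unset or empty values become None."""
    env = os.environ if env is None else env
    return {
        "openreview_username": env.get("OPENREVIEW_USERNAME") or None,
        "openreview_password": env.get("OPENREVIEW_PASSWORD") or None,
        "hf_token": env.get("HF_TOKEN") or None,
    }


def can_log_in(creds: Mapping[str, Optional[str]]) -> bool:
    """A login attempt only makes sense when both username and password are set."""
    return bool(creds["openreview_username"] and creds["openreview_password"])
```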

3. Run the Pipeline

# Full pipeline
./run_pipeline.sh

# Or step by step
python 01_scrape_openreview.py
python 02_build_chromadb.py --require-decision --require-meta-review --rebuild
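
The flag names suggest that `--require-decision` and `--require-meta-review` drop records missing those fields before indexing, and that `--rebuild` recreates the database. That reading is an assumption based on the names alone. Under that assumption, the filtering could look like this (`parse_args` and `keep_record` are hypothetical names):

```python
from __future__ import annotations

import argparse
from typing import Any, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the three boolean flags shown in the step-by-step command."""
    parser = argparse.ArgumentParser(description="Build the vector database (sketch)")
    parser.add_argument("--require-decision", action="store_true")
    parser.add_argument("--require-meta-review", action="store_true")
    parser.add_argument("--rebuild", action="store_true")
    return parser.parse_args(argv)


def keep_record(record: dict[str, Any], args: argparse.Namespace) -> bool:
    """Return False for records that lack a field one of the flags requires."""
    if args.require_decision and not record.get("decision"):
        return False
    if args.require_meta_review and not record.get("meta_review"):
        return False
    return True
```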

4. Query

# Interactive
python 04_query_rag.py

# Single query
python 04_query_rag.py --query "using TTS synthetic data for training"

# Precedent check
python 04_query_rag.py --query "weak evaluation methodology" --check

# JSON output (for agents)
python 04_query_rag.py --query "..." --json
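
Queries are ranked by semantic similarity between the query embedding and the stored paper embeddings. The project delegates this to ChromaDB; the pure-Python sketch below only illustrates the idea with cosine similarity over toy vectors. The function names and the two-dimensional vectors are made up for the example.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k document IDs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]
```

In practice the vectors come from an embedding model and have hundreds of dimensions, but the ranking step works the same way.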

5. Agent Integration (Python)

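import importlib

# The module file is named 05_agent_integration.py. A name starting with a digit
# cannot be used in a plain `import` statement, so load it by string instead.
ReviewerRAG = importlib.import_module("05_agent_integration").ReviewerRAG

rag = ReviewerRAG()
result = rag.check_criticism(
    paper_abstract="We propose a novel TTS model using synthetic data...",
    criticism="The paper relies on synthetic data which is unreliable."
)
print(result["is_valid_criticism"])  # True/False
print(result["suggestion"])          # How to revise the criticism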

Data Schema

Each record contains:

| Field | Description |
| --- | --- |
| `title` | Paper title |
| `abstract` | Paper abstract |
| `decision` | Accept (Poster/Spotlight/Oral) or Reject |
| `meta_review` | Area Chair's summary and final judgment |
| `reviews` | List of reviewer ratings + comments |
| `keywords` | Paper keywords |
| `venue` / `year` | Conference name and year |
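
For typed access in Python, the fields above could be mirrored in a dataclass. The field names come from the table; the types, the defaults, the shape of a single review, and the `accepted` helper are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Review:
    # Rating scales differ between venues, so the rating is kept as text here.
    rating: str
    comment: str


@dataclass
class PaperRecord:
    title: str
    abstract: str
    venue: str
    year: int
    decision: str = ""      # e.g. "Accept (Poster)" or "Reject"; empty if unknown
    meta_review: str = ""   # Area Chair's summary; empty if not available
    reviews: list[Review] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        """True when the decision string starts with "Accept" (case-insensitive)."""
        return self.decision.strip().lower().startswith("accept")
```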

Using with HuggingFace

# Push data
python 03_push_to_hf.py --require-decision

# Pull data on another machine
python 03_push_to_hf.py --pull

License

MIT
