Review Knowledge Base

A RAG (Retrieval-Augmented Generation) system that scrapes academic paper reviews from OpenReview and builds a searchable vector database. It is designed for AI agents to perform Precedent Checks: verifying whether a criticism of a paper aligns with historical review norms.

What's Inside

| File | Purpose |
| --- | --- |
| 01_scrape_openreview.py | Scrape submissions, decisions, meta-reviews, and reviews from OpenReview |
| 02_build_chromadb.py | Build a ChromaDB vector database (embed Title + Abstract) |
| 03_push_to_hf.py | Push/pull data to HuggingFace Datasets for versioning and sharing |
| 04_query_rag.py | Query the database by semantic similarity (interactive or CLI) |
| 05_agent_integration.py | ReviewerRAG class for programmatic Agent integration |
| config.py | Central configuration (venues, credentials, paths) |
| run_pipeline.sh | One-click pipeline: scrape → build → push |
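
The build step is described as embedding Title + Abstract. A minimal sketch of how one record could be turned into the text that gets embedded, assuming the two fields are simply joined with a blank line. The helper name `make_embedding_text` and the joining format are hypothetical and are not taken from 02_build_chromadb.py.

```python
def make_embedding_text(record: dict) -> str:
    """Join title and abstract into one string for embedding (hypothetical helper)."""
    title = (record.get("title") or "").strip()
    abstract = (record.get("abstract") or "").strip()
    # Skip empty parts so a missing abstract does not leave a dangling separator.
    return "\n\n".join(part for part in (title, abstract) if part)
```

Embedding only these two fields keeps retrieval focused on what a paper is about, while decisions and reviews can travel along as stored metadata.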

Supported Conferences

  • ICLR 2023–2026
  • NeurIPS 2023–2026
  • ICML 2023–2025
  • ACL ARR 2025 (monthly cycles: Feb, Apr, Jun, Aug, Oct, Dec)
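
As an illustration, the list above could be expanded into OpenReview-style venue IDs roughly as follows. The year ranges and ARR cycles come from the list. The ID patterns (`<Name>.cc/<year>/Conference` and `aclweb.org/ACL/ARR/2025/<Month>`) are an assumption about OpenReview naming, and `list_venue_ids` is a hypothetical helper, so check config.py for the values the scraper actually uses.

```python
# Year ranges and ARR cycles are copied from the supported-conference list.
# The ID patterns are an assumption about OpenReview naming, not read from config.py.
CONFERENCE_YEARS = {
    "ICLR": range(2023, 2027),     # 2023 through 2026
    "NeurIPS": range(2023, 2027),  # 2023 through 2026
    "ICML": range(2023, 2026),     # 2023 through 2025
}
ARR_2025_CYCLES = ["February", "April", "June", "August", "October", "December"]


def list_venue_ids() -> list[str]:
    """Expand the supported conferences into venue ID strings (assumed format)."""
    ids = [
        f"{name}.cc/{year}/Conference"
        for name, years in CONFERENCE_YEARS.items()
        for year in years
    ]
    ids += [f"aclweb.org/ACL/ARR/2025/{month}" for month in ARR_2025_CYCLES]
    return ids
```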

Quick Start

1. Install Dependencies

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Set Credentials (Optional)

export OPENREVIEW_USERNAME="your-email@example.com"
export OPENREVIEW_PASSWORD="your-password"
export HF_TOKEN="hf_xxxxxxx"
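
Because the credentials are optional, a script has to cope with any of these variables being unset. A minimal sketch of reading them, assuming that missing or empty values are treated as "not provided". The helper names are hypothetical, and the fallback to unauthenticated access is an assumption rather than documented behavior.

```python
import os


def load_credentials(env=os.environ) -> dict:
    """Read the optional credentials; missing or empty variables become None."""
    names = ("OPENREVIEW_USERNAME", "OPENREVIEW_PASSWORD", "HF_TOKEN")
    return {name: env.get(name) or None for name in names}


def can_log_in_to_openreview(creds: dict) -> bool:
    # Both values are needed for a login; otherwise the caller would have to
    # fall back to unauthenticated access (assumed behavior).
    return bool(creds["OPENREVIEW_USERNAME"] and creds["OPENREVIEW_PASSWORD"])
```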

3. Run the Pipeline

# Full pipeline
./run_pipeline.sh

# Or step by step
python 01_scrape_openreview.py
python 02_build_chromadb.py --require-decision --require-meta-review --rebuild
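
The `--require-decision` and `--require-meta-review` flags suggest that incomplete records are filtered out before indexing. A sketch of what such a filter might look like, using the field names from the Data Schema section. The function `keep_record` is hypothetical and the real script may apply different rules.

```python
def keep_record(record: dict, require_decision: bool = False,
                require_meta_review: bool = False) -> bool:
    """Return True if the record has the fields the caller insists on (hypothetical filter)."""
    if require_decision and not (record.get("decision") or "").strip():
        return False
    if require_meta_review and not (record.get("meta_review") or "").strip():
        return False
    return True
```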

4. Query

# Interactive
python 04_query_rag.py

# Single query
python 04_query_rag.py --query "using TTS synthetic data for training"

# Precedent check
python 04_query_rag.py --query "weak evaluation methodology" --check

# JSON output (for agents)
python 04_query_rag.py --query "..." --json
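
An agent that cannot import the Python class could call the CLI and parse its JSON output. A rough sketch, assuming the script prints a single JSON document to stdout when `--json` is given and that `--check` can be combined with `--json`. Neither detail is confirmed here, and the helper names are hypothetical.

```python
import json
import subprocess
import sys


def build_query_command(query: str, check: bool = False) -> list[str]:
    """Assemble the command line using only the flags shown above."""
    cmd = [sys.executable, "04_query_rag.py", "--query", query, "--json"]
    if check:
        cmd.append("--check")  # combining with --json is an assumption
    return cmd


def run_query(query: str, check: bool = False):
    """Run the CLI and parse stdout as JSON (assumes stdout holds only JSON)."""
    done = subprocess.run(build_query_command(query, check),
                          capture_output=True, text=True, check=True)
    return json.loads(done.stdout)
```

Passing the query as a list element rather than through a shell string avoids quoting problems when the criticism text contains quotes or other special characters.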

5. Agent Integration (Python)

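import importlib

# The class lives in 05_agent_integration.py. A file name that starts with a digit
# cannot appear in a plain import statement, so load the module by name instead.
ReviewerRAG = importlib.import_module("05_agent_integration").ReviewerRAG

rag = ReviewerRAG()
result = rag.check_criticism(
    paper_abstract="We propose a novel TTS model using synthetic data...",
    criticism="The paper relies on synthetic data which is unreliable."
)
print(result["is_valid_criticism"])  # True/False
print(result["suggestion"])          # How to revise the criticism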
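
One way an agent might act on the returned dictionary is sketched below, relying only on the two keys shown above (`is_valid_criticism` and `suggestion`). The function `summarize_check` and its output format are hypothetical.

```python
def summarize_check(criticism: str, result: dict) -> str:
    """Turn a check_criticism result into a one-line verdict (hypothetical helper)."""
    if result.get("is_valid_criticism"):
        return f"KEEP: {criticism}"
    # Fall back to a placeholder if no suggestion came back.
    suggestion = result.get("suggestion") or "no suggestion returned"
    return f"REVISE: {criticism}\n  hint: {suggestion}"
```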

Data Schema

Each record contains:

| Field | Description |
| --- | --- |
| title | Paper title |
| abstract | Paper abstract |
| decision | Accept (Poster/Spotlight/Oral) or Reject |
| meta_review | Area Chair's summary and final judgment |
| reviews | List of reviewer ratings + comments |
| keywords | Paper keywords |
| venue / year | Conference name and year |
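
For typed access in agent code, the schema could be mirrored with dataclasses along these lines. The field names follow the table. The concrete types, the `Review` structure, and the `is_accepted` helper are assumptions, since the stored format of ratings and decisions is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    rating: str   # kept as text; the actual rating format is not specified
    comment: str


@dataclass
class PaperRecord:
    title: str
    abstract: str
    decision: str
    meta_review: str
    venue: str
    year: int
    reviews: list[Review] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)

    def is_accepted(self) -> bool:
        # Treats any decision beginning with "Accept" (Poster/Spotlight/Oral) as accepted.
        return self.decision.strip().lower().startswith("accept")
```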

Using with HuggingFace

# Push data
python 03_push_to_hf.py --require-decision

# Pull data on another machine
python 03_push_to_hf.py --pull

License

MIT
