Review Knowledge Base
A RAG (Retrieval-Augmented Generation) system that scrapes academic paper reviews from OpenReview and builds a searchable vector database. Designed for AI agents to perform Precedent Checks: verifying whether a criticism of a paper aligns with historical review norms.
What's Inside
| File | Purpose |
|---|---|
| 01_scrape_openreview.py | Scrape submissions, decisions, meta-reviews, and reviews from OpenReview |
| 02_build_chromadb.py | Build a ChromaDB vector database (embed Title + Abstract) |
| 03_push_to_hf.py | Push/pull data to HuggingFace Datasets for versioning and sharing |
| 04_query_rag.py | Query the database by semantic similarity (interactive or CLI) |
| 05_agent_integration.py | ReviewerRAG class for programmatic Agent integration |
| config.py | Central configuration (venues, credentials, paths) |
| run_pipeline.sh | One-click pipeline: scrape → build → push |
Supported Conferences
- ICLR 2023–2026
- NeurIPS 2023–2026
- ICML 2023–2025
- ACL ARR 2025 (monthly cycles: Feb, Apr, Jun, Aug, Oct, Dec)
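The venue list above could be expanded into concrete OpenReview venue identifiers roughly as follows. This is only a sketch: the function name `build_venue_ids` is hypothetical, and the ID patterns (for example `ICLR.cc/2024/Conference`) are the ones OpenReview commonly uses for these conferences, assumed here and not taken from this repository's config.py.

```python
# Hypothetical helper: enumerate the supported conferences as OpenReview-style
# venue IDs. The ID patterns are assumptions; verify them against config.py.

ARR_MONTHS = ["February", "April", "June", "August", "October", "December"]


def build_venue_ids():
    venues = []
    # ICLR and NeurIPS: 2023 through 2026 inclusive
    for year in range(2023, 2027):
        venues.append(f"ICLR.cc/{year}/Conference")
        venues.append(f"NeurIPS.cc/{year}/Conference")
    # ICML: 2023 through 2025 inclusive
    for year in range(2023, 2026):
        venues.append(f"ICML.cc/{year}/Conference")
    # ACL ARR 2025: one cycle every second month
    for month in ARR_MONTHS:
        venues.append(f"aclweb.org/ACL/ARR/2025/{month}")
    return venues


if __name__ == "__main__":
    for venue_id in build_venue_ids():
        print(venue_id)
```

Keeping the list generated instead of hand-written makes it easy to add a year or a new ARR cycle in one place.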
Quick Start
1. Install Dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
2. Set Credentials (Optional)
export OPENREVIEW_USERNAME="your-email@example.com"
export OPENREVIEW_PASSWORD="your-password"
export HF_TOKEN="hf_xxxxxxx"
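Because the credentials are optional, a script has to tolerate missing variables. A minimal sketch of reading them, assuming nothing about how config.py actually does it (the function name `load_credentials` and the returned key names are hypothetical):

```python
import os


def load_credentials(env=None):
    """Read the three optional variables; unset or empty values become None."""
    env = os.environ if env is None else env
    return {
        "openreview_username": env.get("OPENREVIEW_USERNAME") or None,
        "openreview_password": env.get("OPENREVIEW_PASSWORD") or None,
        "hf_token": env.get("HF_TOKEN") or None,
    }


if __name__ == "__main__":
    creds = load_credentials()
    # Report which values are set without printing the secrets themselves.
    for name, value in creds.items():
        print(f"{name}: {'set' if value else 'not set'}")
```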
3. Run the Pipeline
# Full pipeline
./run_pipeline.sh
# Or step by step
python 01_scrape_openreview.py
python 02_build_chromadb.py --require-decision --require-meta-review --rebuild
4. Query
# Interactive
python 04_query_rag.py
# Single query
python 04_query_rag.py --query "using TTS synthetic data for training"
# Precedent check
python 04_query_rag.py --query "weak evaluation methodology" --check
# JSON output (for agents)
python 04_query_rag.py --query "..." --json
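An agent that shells out to the CLI can wrap the JSON mode in a small helper. The sketch below assumes that `--json` can be combined with `--check` and that the script prints a single JSON document to stdout; neither is confirmed here, and the function names are hypothetical.

```python
import json
import subprocess
import sys


def build_query_command(query, check=False):
    """Assemble the argument list for 04_query_rag.py in JSON mode."""
    cmd = [sys.executable, "04_query_rag.py", "--query", query, "--json"]
    if check:
        cmd.append("--check")  # assumption: --check and --json can be combined
    return cmd


def run_query(query, check=False):
    """Run the CLI from the repository root and parse its stdout as JSON."""
    proc = subprocess.run(
        build_query_command(query, check),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

Passing the query as a list element avoids shell quoting problems when the criticism text contains quotes or other special characters.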
5. Agent Integration (Python)
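import importlib

# The class lives in 05_agent_integration.py. A module name that starts with a
# digit cannot be used in a plain import statement, so load it by name instead.
ReviewerRAG = importlib.import_module("05_agent_integration").ReviewerRAG

rag = ReviewerRAG()
result = rag.check_criticism(
    paper_abstract="We propose a novel TTS model using synthetic data...",
    criticism="The paper relies on synthetic data which is unreliable.",
)
print(result["is_valid_criticism"])  # True/False
print(result["suggestion"])  # How to revise the criticism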
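One way an agent might act on the returned dictionary is sketched below. Only the keys `is_valid_criticism` and `suggestion` come from the example above; the helper name `triage_criticism` and the shape of its return value are hypothetical.

```python
def triage_criticism(criticism, result):
    """Decide whether to keep a criticism or send it back for revision.

    `result` is the dictionary returned by check_criticism(); only the two keys
    shown in the example above are read.
    """
    valid = bool(result.get("is_valid_criticism"))
    return {
        "criticism": criticism,
        "action": "keep" if valid else "revise",
        # Revision guidance is only relevant when the criticism lacks precedent.
        "guidance": None if valid else result.get("suggestion"),
    }


if __name__ == "__main__":
    # Stand-in result for illustration; a real one comes from ReviewerRAG.
    fake_result = {"is_valid_criticism": False, "suggestion": "Cite specific failure cases."}
    print(triage_criticism("The paper relies on synthetic data which is unreliable.", fake_result))
```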
Data Schema
Each record contains:
| Field | Description |
|---|---|
| title | Paper title |
| abstract | Paper abstract |
| decision | Accept (Poster/Spotlight/Oral) or Reject |
| meta_review | Area Chair's summary and final judgment |
| reviews | List of reviewer ratings + comments |
| keywords | Paper keywords |
| venue / year | Conference name and year |
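To make the schema concrete, the sketch below treats a record as a plain dictionary, builds the Title + Abstract text that gets embedded, and mimics the `--require-decision` and `--require-meta-review` filters. The way the two fields are joined, the filter logic, and the function names are all assumptions; the example values are placeholders, not real data.

```python
def embedding_text(record):
    """Join title and abstract into one string (the separator is an assumption)."""
    title = (record.get("title") or "").strip()
    abstract = (record.get("abstract") or "").strip()
    return f"{title}\n\n{abstract}".strip()


def keep_record(record, require_decision=False, require_meta_review=False):
    """Return False for records missing a field that the caller requires."""
    if require_decision and not record.get("decision"):
        return False
    if require_meta_review and not record.get("meta_review"):
        return False
    return True


# Placeholder record for illustration only.
example = {
    "title": "Example Paper",
    "abstract": "We propose a novel TTS model using synthetic data...",
    "decision": "Reject",
    "meta_review": "",
    "reviews": [],
    "keywords": [],
    "venue": "ICLR",
    "year": 2024,
}

if __name__ == "__main__":
    print(embedding_text(example))
    print(keep_record(example, require_decision=True, require_meta_review=True))
```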
Using with HuggingFace
# Push data
python 03_push_to_hf.py --require-decision
# Pull data on another machine
python 03_push_to_hf.py --pull
License
MIT