title: ChatPaper
emoji: 🔬
colorFrom: indigo
colorTo: purple
sdk: streamlit
sdk_version: 1.43.0
app_file: src/ui/app.py
pinned: false

ChatPaper: Research Assistant Powered by RAG and AI Agents

Find the Project: HuggingFace Space GitHub

ChatPaper is a local AI research assistant that lets you have a conversation with your academic papers. Upload PDFs, ask questions in plain English, get cited, grounded answers, and discover related papers. It is built with a production-grade RAG architecture, a LangGraph agent, and automated quality evaluation.


Tech Stack

Python · Streamlit · LangGraph · LlamaIndex · ChromaDB · HuggingFace · OpenRouter · RAGAS


What It Does

Most AI tools answer questions from their training data. ChatPaper answers from your documents. Every answer is grounded in specific pages of your uploaded papers, with citations included.

The system uses two retrieval strategies depending on the complexity of your question. Simple factual questions (authors, datasets, numbers) run a fast semantic search over the most relevant chunks. Complex questions (methodology, contributions, comparisons) send the entire paper content to the model for a thorough, structured answer.


Features

Core Capabilities

  • Upload one or many PDF papers and index them locally in 1–3 minutes
  • Ask questions in plain English and receive cited answers
  • Automatic mode switching: quick semantic search for factual questions, full-paper mode for complex ones
  • Select which papers to chat with: query one paper, a subset, or all at once
  • Multi-turn conversation with memory, so follow-up questions work naturally

Paper Discovery

  • Import papers directly from any arXiv URL
  • Auto-suggest related papers from arXiv after indexing, based on extracted keywords
  • Search arXiv's full database of 2M+ papers from within the app
  • One-click download and immediate indexing of any arXiv paper
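As a rough illustration of the arXiv URL import, the sketch below turns an abstract or PDF link into a direct PDF link. It is not taken from paper_fetcher.py: the `arxiv_pdf_url` helper and the regular expression are assumptions for illustration, and only new-style identifiers (such as 1706.03762) are handled.

```python
import re

# Matches new-style arXiv identifiers such as 1706.03762 or 1706.03762v5.
# Old-style identifiers (for example hep-th/9901001) are deliberately not handled.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5}(?:v\d+)?)")


def arxiv_pdf_url(url: str) -> str:
    """Hypothetical helper: map an arXiv abs/pdf URL to its PDF URL."""
    match = ARXIV_ID.search(url)
    if match is None:
        raise ValueError(f"not a recognized arXiv URL: {url}")
    return f"https://arxiv.org/pdf/{match.group(1)}"


pdf_url = arxiv_pdf_url("https://arxiv.org/abs/1706.03762")
```

Rejecting unrecognized URLs up front keeps a bad link from reaching the download and indexing steps.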

Quality Evaluation

  • Optional RAGAS evaluation after each answer
  • Three metrics scored automatically: Faithfulness, Answer Relevancy, Context Precision
  • Scores saved alongside chat history and reloaded when you revisit past conversations

Chat Management

  • Conversations auto-saved after every message
  • Full chat history with load and delete; loading a chat restores the exact paper selection used
  • Duplicate paper detection prevents re-indexing existing papers
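The README does not say how duplicate papers are detected. One common approach, shown here purely as an assumption, is to fingerprint each uploaded file with a content hash and skip files whose fingerprint has already been seen. The `file_fingerprint` and `is_duplicate` names are hypothetical.

```python
import hashlib


def file_fingerprint(pdf_bytes: bytes) -> str:
    """SHA-256 of the raw file bytes; identical files give identical digests."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def is_duplicate(pdf_bytes: bytes, indexed: set) -> bool:
    """Return True if this file was seen before; otherwise record it and return False."""
    digest = file_fingerprint(pdf_bytes)
    if digest in indexed:
        return True
    indexed.add(digest)
    return False


seen = set()
first_upload = is_duplicate(b"%PDF-1.7 example bytes", seen)
second_upload = is_duplicate(b"%PDF-1.7 example bytes", seen)
```

A byte-level hash only catches exact copies. The same paper exported twice with different metadata would need a different check, such as comparing titles or arXiv identifiers.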

Architecture

┌──────────────────────────────────────────────┐
│                  Streamlit UI                │
│         Upload · Chat · Find Papers          │
└──────────────────────┬───────────────────────┘
                       │
          ┌────────────▼────────────┐
          │    Query Router         │
          │  Simple? → RAG Search   │
          │  Complex? → Full Paper  │
          └────────────┬────────────┘
                       │
        ┌──────────────▼──────────────┐
        │       LlamaIndex RAG        │
        │  Chunk · Embed · Retrieve   │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │          ChromaDB           │
        │   Persistent Vector Store   │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │     Claude via OpenRouter   │
        │       Answer Generation     │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │       RAGAS Evaluation      │
        │  Faithfulness · Relevancy   │
        │     Context Precision       │
        └─────────────────────────────┘

Project Structure

chatpaper/
├── src/
│   ├── ingestion/
│   │   ├── pdf_loader.py        # PDF text extraction via PyMuPDF (page-by-page)
│   │   └── paper_fetcher.py     # arXiv API integration, keyword extraction, download
│   ├── rag/
│   │   └── pipeline.py          # LlamaIndex RAG pipeline + ChromaDB + full-paper mode
│   ├── agent/
│   │   ├── tools.py             # LangChain tools: search, compare, literature review
│   │   └── agent.py             # LangGraph ReAct agent with conversation memory
│   ├── evaluation/
│   │   └── ragas_eval.py        # RAGAS metrics: faithfulness, relevancy, precision
│   └── ui/
│       └── app.py               # Streamlit interface: chat, sidebar, history, find papers
├── chroma_db/                   # Auto-created: vector embeddings + metadata (persisted)
├── chats/                       # Auto-created: JSON conversation history
├── data/                        # Auto-created: downloaded arXiv PDFs
├── requirements.txt
├── .env.example
└── .gitignore

How It Works

Indexing (runs once per paper)

When you upload a PDF, PyMuPDF extracts the text page by page. LlamaIndex splits the text into 256-token chunks with 50-token overlaps, keeping context intact across chunk boundaries. Each chunk is converted into a 384-dimensional embedding vector using the BAAI/bge-small-en-v1.5 model running locally. The vectors and their metadata (filename, page number) are stored in ChromaDB on disk and persist across restarts.
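The overlapping-window idea behind the chunking step can be sketched in a few lines. This is not the LlamaIndex splitter used by the pipeline: `chunk_tokens` is a hypothetical helper, and the "tokens" here are stand-in strings, whereas the real pipeline counts tokens with a model tokenizer.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=50):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Stand-in tokens (an assumption of this sketch, not real tokenizer output).
tokens = [f"w{i}" for i in range(600)]
chunks = chunk_tokens(tokens)
```

With 600 stand-in tokens, each window starts 206 tokens after the previous one, so the last 50 tokens of one chunk reappear as the first 50 of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from either side.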

Querying (every question)

The system first classifies your question. If it detects complexity keywords (explain, methodology, summarize, compare, etc.), it fetches all chunks from the selected paper and sends the complete content to the model. For simpler questions, it embeds the question into a vector, searches ChromaDB for the top-k most semantically similar chunks, and sends only those chunks as context. Either way, the model generates an answer grounded exclusively in the retrieved text.
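A keyword-based router of the kind described above might look like the following sketch. The keyword tuple contains only the four examples named in the text (the real list is longer), and the `route_query` function and its `"full_paper"` / `"rag_search"` labels are hypothetical names, not the ones used in pipeline.py.

```python
import re

# Only the keywords named in the text; the project's actual list is not reproduced here.
COMPLEX_KEYWORDS = frozenset({"explain", "methodology", "summarize", "compare"})


def route_query(question: str) -> str:
    """Return "full_paper" for complex questions and "rag_search" otherwise."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    if words & COMPLEX_KEYWORDS:
        return "full_paper"
    return "rag_search"


simple_route = route_query("Who are the authors?")
complex_route = route_query("Summarize the methodology of this paper")
```

This sketch matches whole words only, so "explained" would not trigger full-paper mode. Whether the real classifier uses substring matching, stemming, or something else is not stated in the source.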

Evaluation (optional, per answer)

When RAGAS evaluation is enabled, three additional LLM calls assess the answer quality. Faithfulness checks whether every claim in the answer is supported by the retrieved context. Answer Relevancy generates reverse questions from the answer and measures their cosine similarity to the original question. Context Precision judges whether the retrieved chunks were actually useful for answering the question.
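The Answer Relevancy calculation described above reduces to an average of cosine similarities. The sketch below shows only that arithmetic, using made-up two-dimensional vectors in place of real embeddings. It does not call RAGAS, and `answer_relevancy` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_relevancy(question_vec, generated_question_vecs):
    """Mean cosine similarity between the original question and the reverse questions."""
    if not generated_question_vecs:
        return 0.0
    sims = [cosine_similarity(question_vec, v) for v in generated_question_vecs]
    return sum(sims) / len(sims)


# Toy embeddings, invented for illustration only.
original = [1.0, 0.0]
reverse_questions = [[1.0, 0.0], [0.0, 1.0]]
score = answer_relevancy(original, reverse_questions)
```

Here one reverse question points the same way as the original (similarity 1) and one is orthogonal (similarity 0), giving a score of 0.5. An answer that drifts into side topics produces reverse questions that resemble the original less, which lowers the score.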


Getting Started

Requirements: Python 3.10+, an OpenRouter API key (models from ~$0.25/1M tokens)

# Clone and enter the project
git clone https://github.com/ShafaghRastegari/chatpaper.git
cd chatpaper

# Create a virtual environment
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Configure your API key
cp .env.example .env
# Open .env and set OPENROUTER_API_KEY and OPENROUTER_MODEL

# Run the app
streamlit run src/ui/app.py

Open http://localhost:8501 and upload your first paper.


Configuration

# .env
# Fixes a protobuf conflict on Windows
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
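A small startup check can catch a missing key before the first request fails. The sketch below assumes the values from .env have already been loaded into the process environment (how the app does this is not shown in the source), and `load_settings` is a hypothetical helper.

```python
import os


def load_settings(env=os.environ):
    """Collect the OpenRouter settings, failing early if one is missing or empty."""
    required = ("OPENROUTER_API_KEY", "OPENROUTER_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings in .env: " + ", ".join(missing))
    return {name: env[name] for name in required}


# Example with placeholder values passed explicitly instead of the real environment.
settings = load_settings({"OPENROUTER_API_KEY": "sk-placeholder", "OPENROUTER_MODEL": "placeholder-model"})
```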

Evaluation Results

Tested on a 17-page NLP research paper with a complex question set:

| Metric | Score | Interpretation |
| --- | --- | --- |
| Faithfulness | 1.0 | Zero hallucinations: all claims supported by source text |
| Answer Relevancy | 0.59 | Answers address the question with some additional context |
| Context Precision | 0.0 | Known RAGAS limitation without ground-truth labels |

Faithfulness is the critical metric for a research assistant. A score of 1.0 means that, on this test set, the system did not state anything that is not supported by the paper.


Design Decisions

Why local embeddings? Running BAAI/bge-small-en-v1.5 locally means no second API key, no cost per embedding, and complete privacy during indexing: your paper content never leaves your machine at that stage.

Why two retrieval modes? Top-k semantic search is fast and cheap for factual lookups. Questions about methodology or contributions, however, require understanding the paper as a whole. For those, sending all chunks to a 200k-context model is more accurate than hoping the right 5 chunks were retrieved.

Why RAGAS with no ground truth? Most RAG systems ship with no evaluation at all. Even without human-labeled answers, faithfulness evaluation provides a meaningful signal about hallucination risk, which is the most important property for a research tool where accuracy is non-negotiable.


Built With

| Layer | Technology | Role |
| --- | --- | --- |
| Interface | Streamlit | Web UI with chat, sidebar, and tabs |
| Agent | LangGraph + LangChain | ReAct agent with tool calling |
| RAG | LlamaIndex | Document chunking, embedding, retrieval |
| Vector store | ChromaDB | Persistent local vector database |
| Embeddings | BAAI/bge-small-en-v1.5 | Free local text-to-vector model |
| LLM | Claude Haiku via OpenRouter | Answer generation and reasoning |
| PDF parsing | PyMuPDF | Fast, accurate PDF text extraction |
| Paper search | arXiv API | Free academic paper search and download |
| Evaluation | RAGAS | Automated RAG quality measurement |


Deployment

ChatPaper is designed to run both locally and in production with no code changes. For deployment, the app connects to Chroma Cloud for persistent vector storage and HuggingFace Hub as a private dataset repository for chat history, replacing the local chroma_db/ and chats/ folders that are used during development.