metadata
title: Scientific RAG System
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: demo/main.py
pinned: false
license: mit
tags:
  - rag
  - scientific-papers
  - arxiv
  - pubmed
  - retrieval
  - question-answering
python_version: 3.11

🔬 Scientific RAG System

Retrieval-Augmented Generation for Scientific Papers

This system lets you ask questions about scientific papers from the arXiv and PubMed datasets and returns answers with citations to the source documents.

🌟 Features

  • 🔍 Hybrid Retrieval: Combines BM25 (keyword-based) and dense (semantic) search
  • 🎯 Query Processing: Self-query metadata extraction and query expansion
  • 📊 Reranking: Cross-encoder model for improved relevance
  • 🏷️ Metadata Filtering: Filter by source and section type
  • 📝 Citations: Answers include source citations
  • 🎨 Interactive UI: Configurable pipeline components

🚀 Quick Start

  1. Enter your API key (OpenRouter or Groq) in the settings
  2. Type your question about scientific papers
  3. Adjust retrieval settings if needed
  4. Get answers with citations!

🔑 API Keys

Get free API keys from:

  • OpenRouter - Multiple free models available
  • Groq - Fast inference with free tier

🏗️ Architecture

Query → Processing → Hybrid Retrieval → Reranking → LLM Generation → Answer + Citations
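
The flow above can be pictured as a chain of plain functions. The sketch below is a toy stand-in, not the project's implementation: every name (`Chunk`, `process_query`, `retrieve`, `rerank`, `generate`, `answer_question`) is hypothetical, keyword overlap replaces both the hybrid retriever and the cross-encoder, and a string template replaces the LLM call.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    source: str   # e.g. "arxiv" or "pubmed"
    section: str  # e.g. "abstract", "results"


@dataclass
class ProcessedQuery:
    text: str
    filters: dict = field(default_factory=dict)


def tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def process_query(query):
    """Toy self-query step: extract a source filter from the question."""
    filters = {}
    for source in ("arxiv", "pubmed"):
        if source in query.lower():
            filters["source"] = source
    return ProcessedQuery(text=query, filters=filters)


def retrieve(pq, corpus):
    """Stand-in for hybrid retrieval: metadata filter, then keyword match."""
    candidates = [
        c for c in corpus
        if all(getattr(c, key) == value for key, value in pq.filters.items())
    ]
    return [c for c in candidates if tokens(pq.text) & tokens(c.text)]


def rerank(pq, chunks, top_k=2):
    """Stand-in for a cross-encoder: order by number of shared tokens."""
    overlap = lambda c: len(tokens(pq.text) & tokens(c.text))
    return sorted(chunks, key=overlap, reverse=True)[:top_k]


def generate(pq, chunks):
    """Stand-in for the LLM: quote each chunk with a numbered citation."""
    answer = " ".join(f"{c.text} [{i}]" for i, c in enumerate(chunks, 1))
    citations = [f"[{i}] {c.source}/{c.section}" for i, c in enumerate(chunks, 1)]
    return {"answer": answer, "citations": citations}


def answer_question(query, corpus):
    pq = process_query(query)
    return generate(pq, rerank(pq, retrieve(pq, corpus)))


# Made-up corpus for illustration only.
corpus = [
    Chunk("Tokamaks confine plasma with magnetic fields.", "arxiv", "introduction"),
    Chunk("DNA denaturation occurs at high temperature.", "pubmed", "abstract"),
    Chunk("Plasma confinement time scales with device size.", "arxiv", "results"),
]
result = answer_question("What is plasma confinement in tokamaks on arxiv?", corpus)
print(result["answer"])
```

Keeping each stage behind its own function mirrors the "configurable pipeline components" idea: a stage can be swapped or switched off without touching the others.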

📚 Dataset

Uses the arXiv subset of the armanc/scientific_papers dataset.

💡 Example Questions

  • "What are quantum error correction approaches?"
  • "What is plasma confinement in tokamaks?"
  • "When does DNA denaturation occur?"

📄 License

MIT License

🔗 Links