Spaces:
Sleeping
Sleeping
A newer version of the Gradio SDK is available: 6.13.0
metadata
title: Scientific RAG System
emoji: π¬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: demo/main.py
pinned: false
license: mit
tags:
- rag
- scientific-papers
- arxiv
- pubmed
- retrieval
- question-answering
python_version: 3.11
π¬ Scientific RAG System
Retrieval-Augmented Generation for Scientific Papers
This system allows you to ask questions about scientific papers from ArXiv and PubMed datasets, receiving answers with citations from source documents.
π Features
- π Hybrid Retrieval: Combines BM25 (keyword-based) and Dense (semantic) search
- π― Query Processing: Self-query metadata extraction and query expansion
- π Reranking: Cross-encoder model for improved relevance
- π·οΈ Metadata Filtering: Filter by source and section type
- π Citations: Answers include source citations
- π¨ Interactive UI: Configurable pipeline components
π Quick Start
- Enter your API key (OpenRouter or Groq) in the settings
- Type your question about scientific papers
- Adjust retrieval settings if needed
- Get answers with citations!
π API Keys
Get free API keys from:
- OpenRouter - Multiple free models available
- Groq - Fast inference with free tier
ποΈ Architecture
Query β Processing β Hybrid Retrieval β Reranking β LLM Generation β Answer + Citations
π Dataset
Uses the armanc/scientific_papers dataset (ArXiv subset).
π‘ Example Questions
- "What are quantum error correction approaches?"
- "What is plasma confinement in tokamaks?"
- "When is DNA denaturation?"
π License
MIT License