---
title: ArXiv Paper Recommender
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 4.37.2
app_file: app.py
pinned: false
---
🧠 ArXiv Paper Recommender System
A hybrid, AI-powered system that recommends relevant ArXiv research papers for any user query.
It combines semantic search (transformer embeddings), keyword-based retrieval (TF-IDF), and cross-encoder reranking to return accurate, relevant results.
Overview
This system helps researchers and students quickly find academic papers related to any topic or question.
By combining dense transformer embeddings with sparse TF-IDF features, it produces context-aware recommendations from the ArXiv collection.
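To make the "sparse features" half concrete, here is a minimal pure-Python sketch of TF-IDF keyword scoring. It assumes a simple lowercase tokenizer and the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default; the app's actual vectorizer settings are not documented here, and `tokenize`, `fit_idf`, `tfidf_vector`, and `cosine` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (no stemming or stop-word removal here)
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    # Smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1 (assumed, mirrors scikit-learn's default)
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    # Sparse vector: raw term count times IDF, then L2-normalized; out-of-vocabulary terms are dropped
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both inputs are unit length, so the sparse dot product equals cosine similarity
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Toy corpus for illustration only
docs = [
    "vision transformers for image classification",
    "recurrent networks for speech",
    "transformers for language modeling",
]
idf = fit_idf(docs)
query_vec = tfidf_vector("Transformers for Computer Vision", idf)
scores = [cosine(query_vec, tfidf_vector(doc, idf)) for doc in docs]
print(scores)
```

Note that "computer" never appears in the toy corpus, so it contributes nothing: sparse matching only rewards exact term overlap, which is the gap the dense embeddings are meant to cover.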
⚙️ Features
- Semantic + keyword hybrid retrieval
- FAISS-based fast vector search
- Pseudo-relevance feedback for query expansion
- Cross-encoder reranking for higher precision
- Clean Gradio-based web interface
- Direct links to paper abstracts and PDFs
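The README does not say how the pseudo-relevance feedback keywords are chosen. One simple variant, sketched below under that caveat, treats the top `top_k` retrieved papers as relevant and appends their most frequent terms that are not already in the query. `expand_query`, `terms`, `STOP_WORDS`, and the default `top_k`/`n_terms` values are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def terms(text: str) -> list[str]:
    # Lowercased alphanumeric tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query: str, ranked_docs: list[str], top_k: int = 3, n_terms: int = 2) -> str:
    """Pseudo-relevance feedback: assume the top_k documents are relevant and
    append their n_terms most frequent terms that are not already in the query."""
    query_terms = set(terms(query))
    counts = Counter(
        term
        for doc in ranked_docs[:top_k]
        for term in terms(doc)
        if term not in STOP_WORDS and term not in query_terms
    )
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return " ".join([query, *expansion])


ranked = [
    "vision transformers use image patches",
    "image patches are embedded with attention",
    "attention over image tokens",
    "speech recognition with recurrent networks",  # ranked 4th, so ignored when top_k=3
]
print(expand_query("transformers for vision", ranked))
```

Raw counts keep the sketch short; weighting candidate terms by TF-IDF instead would stop frequent but uninformative words from dominating the expansion.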
🧩 Tech Stack
- FAISS: vector similarity search
- SentenceTransformer: query embeddings (`multi-qa-mpnet-base-dot-v1`)
- TF-IDF Vectorizer: keyword-based matching
- CrossEncoder: neural reranking (`cross-encoder/ms-marco-MiniLM-L-6-v2`)
- Hugging Face Hub: dataset and model hosting
- Gradio: interactive web UI
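The semantic-retrieval step relies on FAISS. Assuming a flat (exhaustive) inner-product index such as `faiss.IndexFlatIP`, which matches a dot-product model like `multi-qa-mpnet-base-dot-v1`, the search is equivalent to the pure-Python sketch below. The README does not state which index type the app actually builds, and `inner_product` and `search_top_k` are hypothetical names.

```python
import heapq


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_top_k(query: list[float], corpus: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Exact maximum-inner-product search: score every stored vector and
    return the k best (index, score) pairs, highest score first."""
    scored = ((inner_product(query, vec), idx) for idx, vec in enumerate(corpus))
    return [(idx, score) for score, idx in heapq.nlargest(k, scored)]


# Toy 2-d "embeddings"; real multi-qa-mpnet-base-dot-v1 vectors have 768 dimensions
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
hits = search_top_k([1.0, 0.2], corpus, k=2)
print(hits)
```

FAISS performs the same computation in optimized C++ over a float32 matrix, returning `(distances, indices)` arrays from `index.search(queries, k)`; approximate index types trade some of this exactness for speed on large collections.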
🧠 How It Works
- Query Embedding: Your query is converted into a dense vector by a SentenceTransformer model.
- Semantic Retrieval: FAISS retrieves the papers whose embeddings are most similar to the query vector.
- Keyword Matching: TF-IDF adds a text-based similarity score.
- Hybrid Fusion: The dense and TF-IDF scores are merged into a single weighted score.
- Pseudo-Relevance Feedback: Top keywords from the highest-ranked papers are added to the query to refine the results.
- Reranking: A cross-encoder scores each query-paper pair jointly and reorders the candidates by relevance.
- Display: Gradio shows the final ranked papers with abstracts and links.
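The Hybrid Fusion step is described only as a weighted combination of the two scores. A common way to do it, assumed here rather than taken from `app.py`, is to min-max normalize each score list to [0, 1] and compute `alpha * dense + (1 - alpha) * sparse`. The helper names, the paper IDs, and `alpha=0.7` are illustrative.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    # Rescale to [0, 1]; if every score is equal, map them all to 0.0
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: ((s - lo) / span if span else 0.0) for doc, s in scores.items()}


def hybrid_fuse(
    dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.7
) -> list[tuple[str, float]]:
    """Fused score = alpha * dense + (1 - alpha) * sparse, after normalization.
    A paper missing from one retriever gets 0.0 for that component."""
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in d.keys() | s.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


dense_scores = {"p1": 12.0, "p2": 10.0, "p3": 8.0}  # e.g. FAISS inner products
sparse_scores = {"p2": 0.9, "p3": 0.1, "p4": 0.5}   # e.g. TF-IDF cosine similarities
fused_ranking = hybrid_fuse(dense_scores, sparse_scores)
print(fused_ranking)
```

Normalizing first matters because raw dot-product scores and TF-IDF cosine scores live on different scales; without it, one retriever would dominate regardless of `alpha`.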
📦 Project Structure
├── app.py             # Main Gradio app
├── requirements.txt   # Dependencies
└── README.md          # Documentation
🧾 Example Query
Input:
"Transformers for Computer Vision"
Output:
The top 5 most relevant ArXiv papers, each with its title, authors, abstract, and PDF link.
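Each result links to the paper's abstract page and PDF. The sketch below assumes every paper record carries its arXiv identifier and builds both links from the standard `https://arxiv.org/abs/<id>` and `https://arxiv.org/pdf/<id>` URL patterns. The `Paper` fields and `format_result` are hypothetical, since the dataset schema is not documented here, and the example record is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record layout; the real dataset's column names may differ
    arxiv_id: str
    title: str
    authors: list[str]
    abstract: str


def format_result(rank: int, paper: Paper) -> str:
    # Build abstract-page and PDF links from the arXiv identifier
    abs_url = f"https://arxiv.org/abs/{paper.arxiv_id}"
    pdf_url = f"https://arxiv.org/pdf/{paper.arxiv_id}"
    return "\n".join([
        f"{rank}. {paper.title}",
        f"   Authors: {', '.join(paper.authors)}",
        f"   Abstract: {paper.abstract}",
        f"   Abstract page: {abs_url}",
        f"   PDF: {pdf_url}",
    ])


# Placeholder record for illustration only
example = Paper("0000.00000", "Example Paper Title", ["A. Author", "B. Author"], "Example abstract text.")
print(format_result(1, example))
```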
Dataset & Models
All required files and models are hosted on Hugging Face:
Adarsh921/paper-recommender-data
👨‍💻 Author
Adarsh Bhardwaj
Student Research Project, AI & NLP Enthusiast
Built using open-source tools and Hugging Face APIs.
Deployment
Deployed on Hugging Face Spaces using Gradio.
Simply enter your research topic to start discovering relevant papers instantly.