title: ArXiv Paper Recommender
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 4.37.2
app_file: app.py
pinned: false

🧠 ArXiv Paper Recommender System

A hybrid AI-powered system that recommends relevant arXiv research papers for a user query.
It combines transformer-based semantic search, TF-IDF keyword retrieval, and cross-encoder reranking.


🔍 Overview

This system helps researchers and students quickly find academic papers related to a topic or question.
By combining dense embeddings with sparse TF-IDF features, it returns recommendations from the arXiv collection that match both the meaning and the wording of the query.


⚙️ Features

  • Semantic + keyword hybrid retrieval
  • FAISS-based fast vector search
  • Pseudo-relevance feedback for query expansion
  • Cross-encoder reranking for more precise results
  • Clean Gradio-based web interface
  • Direct links to paper abstracts and PDFs
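As a rough illustration of the pseudo-relevance feedback feature, the sketch below expands a query with the most frequent new terms found in the top-ranked documents. It is a minimal sketch, not the app's actual code: the function names, the tiny stopword list, and the `num_terms` parameter are all assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the app's actual list).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(query, top_docs, num_terms=3):
    """Append the most frequent new terms from the top-ranked documents to the query."""
    query_terms = set(tokenize(query))
    counts = Counter(
        tok
        for doc in top_docs
        for tok in tokenize(doc)
        if tok not in STOPWORDS and tok not in query_terms
    )
    new_terms = [term for term, _ in counts.most_common(num_terms)]
    return " ".join([query] + new_terms)


# Hypothetical first-pass results used only to demonstrate the call.
top_docs = [
    "Vision transformers for image classification",
    "Image classification with attention",
    "Attention in image models",
]
expanded = expand_query("transformers for vision", top_docs, num_terms=3)
print(expanded)
```

The expanded query would then be sent through retrieval a second time. Counting raw term frequency is the simplest choice here; weighting candidate terms by TF-IDF is a common alternative.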

🧩 Tech Stack

  • FAISS: Vector similarity search
  • SentenceTransformer: Query embeddings (multi-qa-mpnet-base-dot-v1)
  • TF-IDF Vectorizer: Keyword-based matching
  • CrossEncoder: Neural reranking (cross-encoder/ms-marco-MiniLM-L-6-v2)
  • Hugging Face Hub: Dataset & model hosting
  • Gradio: Interactive web UI
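To make the vector search step concrete, here is a dependency-free sketch of exact top-k retrieval by dot product, which is the computation a flat inner-product FAISS index (`faiss.IndexFlatIP`) performs. Which index type the app actually uses is not stated here, so treat this as an assumption; the function names and the toy two-dimensional vectors are hypothetical, and real embeddings from the model would have many more dimensions.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_by_dot_product(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k highest dot-product scores."""
    scored = [(i, dot(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for paper vectors (illustrative values only).
doc_vecs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
hits = top_k_by_dot_product([1.0, 0.0], doc_vecs, k=2)
print(hits)
```

A brute-force scan like this is fine for a few thousand vectors. A library such as FAISS becomes worthwhile when the collection is large enough that scanning every vector in Python is too slow.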

🧠 How It Works

  1. Query Embedding: Your query is converted into a vector using a transformer model.
  2. Semantic Retrieval: FAISS retrieves the most similar papers by dense similarity.
  3. Keyword Matching: TF-IDF scores papers by keyword overlap with the query.
  4. Hybrid Fusion: The dense and TF-IDF scores are merged by a weighted combination.
  5. Pseudo-Relevance Feedback: Top keywords from the first-pass results expand the query to refine the search.
  6. Reranking: A cross-encoder rescores each query-paper pair and reorders the candidates.
  7. Display: Gradio shows the final ranked papers with abstracts and links.
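The hybrid fusion step can be sketched as follows. This is one plausible way to combine the two signals, assuming min-max normalization and a single weight `alpha`; the normalization scheme, the function names, and the default weight are assumptions for illustration, not details taken from the app.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(dense, sparse, alpha=0.7):
    """Weighted sum of normalized dense and sparse scores for the same candidates."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    return [alpha * d + (1 - alpha) * s for d, s in zip(dense_n, sparse_n)]


def rank(scores):
    """Candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Illustrative scores for three candidate papers (made-up values).
dense = [10.0, 20.0, 30.0]   # e.g. dot-product scores from vector search
sparse = [2.0, 0.0, 1.0]     # e.g. TF-IDF similarity scores
fused = hybrid_scores(dense, sparse, alpha=0.5)
print(rank(fused))  # [2, 0, 1]
```

Normalizing first matters because dense and TF-IDF scores live on different scales; without it, whichever score has the larger range would dominate the weighted sum.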

📦 Project Structure

├── app.py               # Main Gradio app
├── requirements.txt     # Dependencies
└── README.md            # Documentation

🧾 Example Query

Input:

“Transformers for Computer Vision”

Output:
The five most relevant arXiv papers, each with title, authors, abstract, and PDF link.
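For illustration, one result entry with its links could be assembled as in the sketch below. The record fields (`id`, `title`, `authors`) and the placeholder values are hypothetical; only the `https://arxiv.org/abs/<id>` and `https://arxiv.org/pdf/<id>` URL patterns are standard arXiv conventions.

```python
def format_result(paper):
    """Build a Markdown snippet with abstract and PDF links for one paper."""
    arxiv_id = paper["id"]
    abs_url = f"https://arxiv.org/abs/{arxiv_id}"
    pdf_url = f"https://arxiv.org/pdf/{arxiv_id}"
    authors = ", ".join(paper["authors"])
    return f"**{paper['title']}**\n{authors}\n[Abstract]({abs_url}) | [PDF]({pdf_url})"


# Placeholder record, not a real paper.
example = {
    "id": "0000.00000",
    "title": "Example Paper Title",
    "authors": ["A. Author", "B. Author"],
}
print(format_result(example))
```

Markdown output is convenient here because Gradio can render Markdown strings directly, so titles and links display without extra HTML.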


📚 Dataset & Models

All required files and models are hosted on Hugging Face:

Adarsh921/paper-recommender-data


👨‍💻 Author

Adarsh Bhardwaj
Student Research Project - AI & NLP Enthusiast
Built using open-source tools and Hugging Face APIs.


🏁 Deployment

Deployed on Hugging Face Spaces using Gradio.
Simply enter your research topic to start discovering relevant papers instantly.