---
title: ArXiv New ML Datasets
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
license: mit
---
# ArXiv New ML Datasets
Browse **1.1M+ CS papers** from arXiv, with **50,000+ classified** as introducing new machine learning datasets.
## Features
- **Keyword search** - Search titles and abstracts
- **Semantic search** - Find conceptually similar papers using vector embeddings
- **Filter** by arXiv category (cs.AI, cs.CV, cs.LG, etc.)
- **Infinite scroll** for smooth browsing
- Links to arXiv, PDF, and HF Papers
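As a rough illustration of how keyword search and the category filter combine, here is a minimal in-memory sketch using only the Python standard library. The `Paper` record, its field names, and the `keyword_search` helper are hypothetical and exist only for this example; the actual app queries its data through the stack listed under Tech Stack, and its matching rules may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Paper:
    # Hypothetical record shape; the real dataset's column names may differ.
    title: str
    abstract: str
    categories: List[str]


def keyword_search(papers: List[Paper], query: str,
                   category: Optional[str] = None) -> List[Paper]:
    """Return papers whose title or abstract contains every query term."""
    terms = query.lower().split()
    results = []
    for paper in papers:
        # Apply the optional arXiv category filter first.
        if category is not None and category not in paper.categories:
            continue
        text = f"{paper.title} {paper.abstract}".lower()
        # Case-insensitive substring match on all terms (an assumption).
        if all(term in text for term in terms):
            results.append(paper)
    return results


papers = [
    Paper("A New Benchmark for OCR", "We introduce a dataset of scanned pages.", ["cs.CV"]),
    Paper("Faster Transformers", "We propose an efficient attention variant.", ["cs.LG"]),
    Paper("A Dataset for Dialogue", "We release a dataset of conversations.", ["cs.CL", "cs.AI"]),
]

hits = keyword_search(papers, "dataset", category="cs.AI")
```

Here `hits` contains only the dialogue paper, since it is the only record that both mentions "dataset" and carries the `cs.AI` category.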
## Data Source
Papers are classified with a [ModernBERT](https://huggingface.co/davanstrien/ModernBERT-base-is-new-arxiv-dataset) classifier. Embeddings are generated with [BGE-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5).
The data comes from [librarian-bots/arxiv-cs-papers-lance](https://huggingface.co/datasets/librarian-bots/arxiv-cs-papers-lance) and is updated weekly.
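To show what the embeddings are used for, the sketch below ranks papers by cosine similarity to a query vector. It assumes cosine similarity as the metric, which this README does not confirm, and it uses hand-written 3-dimensional toy vectors in place of real BGE embeddings (BGE-base-en-v1.5 produces 768-dimensional vectors). The function names and paper IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, paper_vecs, k=2):
    """Return the IDs of the k papers most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), paper_id)
        for paper_id, vec in paper_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [paper_id for _, paper_id in scored[:k]]


# Toy vectors standing in for real embeddings (assumption for illustration).
paper_vecs = {
    "paper-a": [0.9, 0.1, 0.0],
    "paper-b": [0.0, 1.0, 0.0],
    "paper-c": [0.7, 0.7, 0.0],
}

top = rank_by_similarity([1.0, 0.0, 0.0], paper_vecs, k=2)
```

In a real deployment the query string would first be embedded with the same model as the papers, so that both live in the same vector space.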
## Tech Stack
- **Backend**: FastAPI + Polars + Lance
- **Frontend**: HTMX + Tailwind CSS
- **Vector Search**: Lance with IVF_PQ index
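For readers unfamiliar with IVF_PQ: it combines an inverted file (IVF), which partitions vectors around centroids so that a query scans only the nearest partitions, with product quantization (PQ), which compresses the stored vectors. The toy sketch below covers only the IVF step, with PQ omitted, and is not Lance's implementation. The centroids are hand-picked for the example (real indexes typically learn them with k-means), and every name, including `nprobes`, is hypothetical here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(vec, centroids, n):
    """Indices of the n centroids closest to vec."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
    return order[:n]


def build_ivf(vectors, centroids):
    """Assign each vector ID to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        partitions[nearest_centroids(vec, centroids, 1)[0]].append(vec_id)
    return partitions


def ivf_search(query, vectors, centroids, partitions, nprobes=1, k=1):
    """Scan only the nprobes nearest partitions, then rank the candidates exactly."""
    candidates = []
    for part in nearest_centroids(query, centroids, nprobes):
        candidates.extend(partitions[part])
    candidates.sort(key=lambda vec_id: sq_dist(query, vectors[vec_id]))
    return candidates[:k]


# Hand-picked centroids and toy 2-d vectors (assumptions for illustration).
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {"a": [0.5, 0.2], "b": [1.0, 1.5], "c": [9.0, 9.5], "d": [11.0, 10.0]}

partitions = build_ivf(vectors, centroids)
result = ivf_search([9.2, 9.4], vectors, centroids, partitions, nprobes=1, k=1)
```

Raising `nprobes` scans more partitions, which trades speed for recall; with `nprobes` equal to the number of partitions the search degenerates to an exact scan.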