metadata
title: arXiv Topic Classifier
emoji: π
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
arXiv Topic Classifier
A small web app that takes a paper's title and (optionally) abstract and predicts the most likely arXiv top-level categories (cs, math, physics, q-bio, stat, ...).
The model is a fine-tuned distilbert-base-uncased. Predictions are displayed as a top-95% list: the smallest set of categories whose total probability is at least 95%, sorted by descending confidence.
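As a rough illustration of the top-95% rule, here is a minimal sketch. It assumes the model's softmax output has already been turned into a label-to-probability mapping. The function name top_p_categories and the example probabilities are hypothetical, not taken from app.py.

```python
from typing import Dict, List, Tuple


def top_p_categories(
    probs: Dict[str, float], threshold: float = 0.95
) -> List[Tuple[str, float]]:
    """Return the smallest prefix of categories, sorted by descending
    probability, whose cumulative probability reaches `threshold`."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for label, p in ranked:
        selected.append((label, p))
        cumulative += p
        if cumulative >= threshold:
            break
    return selected


# Hypothetical probabilities, for illustration only.
example = {"cs": 0.70, "stat": 0.20, "math": 0.06, "physics": 0.04}
labels = [label for label, _ in top_p_categories(example)]
print(labels)  # -> ['cs', 'stat', 'math']
```

With these numbers, cs and stat together reach only 90%, so math is added to cross the 95% threshold and physics is left out.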
Files
- app.py: Streamlit UI and inference code.
- train.ipynb: End-to-end training notebook (data loading, fine-tuning, evaluation, model saving).
- requirements.txt: Python dependencies for HuggingFace Spaces.
- PROJECT.md: Detailed project write-up (data, model choices, experiments, results).
Run locally
pip install -r requirements.txt
# Either: train your own model with train.ipynb (produces ./model/)
# Or: set ARXIV_MODEL_REPO=your-username/arxiv-topic-classifier
streamlit run app.py
The app and the training notebook auto-detect the best available device, in this order: MPS (Apple Silicon), then CUDA, then CPU. On an M1 Max, one inference call takes ~30-80 ms; on the HF Spaces free tier (CPU), ~150-300 ms.
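The device preference order can be sketched as follows. This is an assumption about how such detection is commonly written with PyTorch, not a copy of the code in app.py. The helper names pick_device and detect_device are hypothetical, and the sketch falls back to CPU if torch is not installed.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Apply the preference order: MPS first, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(mps_ok, torch.cuda.is_available())


print(detect_device())
```

Keeping the ordering in a pure function makes the priority easy to check without any particular hardware.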
Deploy to HuggingFace Spaces
- Create a new Space at https://huggingface.co/new-space and choose Streamlit as the SDK.
- Push this directory (app.py, requirements.txt, README.md) to the Space's git repo.
- Either push the trained ./model/ directory alongside the code, or publish your model to HF Hub and add a Space secret named ARXIV_MODEL_REPO with the repo id (e.g. your-username/arxiv-topic-classifier).
- The Space will rebuild automatically (about 2-4 minutes). Once it's green, your app is live.
Model loading priority
- If the env var ARXIV_MODEL_REPO is set, the app loads weights from that HF Hub repo.
- Otherwise it looks for a local ./model/ directory (the one produced by train.ipynb).
- If neither is available, the app shows a friendly error explaining what to do.
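The priority above could be expressed roughly as below. This is a sketch under assumptions: the function name resolve_model_source and its return shape are hypothetical, and treating an empty ARXIV_MODEL_REPO value as unset is an assumption rather than documented behaviour of app.py.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_model_source(
    env: Mapping[str, str] = os.environ, local_dir: str = "./model"
) -> Optional[Tuple[str, str]]:
    """Return ("hub", repo_id), ("local", path), or None if nothing is available."""
    # Assumption: a blank value is treated the same as an unset variable.
    repo = env.get("ARXIV_MODEL_REPO", "").strip()
    if repo:
        return ("hub", repo)
    if Path(local_dir).is_dir():
        return ("local", local_dir)
    return None


source = resolve_model_source()
if source is None:
    print("No model found: set ARXIV_MODEL_REPO or run train.ipynb to create ./model/")
else:
    kind, location = source
    print(f"Loading model from {kind}: {location}")
```

The returned location can then be passed to whatever loader the app uses. The None case corresponds to the friendly error message.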