title: arXiv Topic Classifier
emoji: 📑
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit

arXiv Topic Classifier

A small web app that takes a paper's title and (optionally) abstract and predicts the most likely arXiv top-level categories (cs, math, physics, q-bio, stat, ...).

The model is a fine-tuned distilbert-base-uncased. Predictions are displayed as a top-95% list: the smallest set of categories whose total probability is at least 95%, sorted by descending confidence.
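The top-95% selection can be sketched as follows. This is a minimal illustration, not the code in app.py: the function name `top_p_set` and the plain-dict input are assumptions, and it presumes the per-category probabilities already sum to roughly 1 (for example, after a softmax).

```python
def top_p_set(probs, threshold=0.95):
    """Return the smallest list of (label, prob) pairs whose summed
    probability reaches `threshold`, sorted by descending probability.

    `probs` is a hypothetical mapping of category name -> probability.
    """
    # Rank categories from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for label, p in ranked:
        selected.append((label, p))
        cumulative += p
        # Stop as soon as the kept categories cover the threshold.
        if cumulative >= threshold:
            break
    return selected


if __name__ == "__main__":
    example = {"cs": 0.70, "stat": 0.20, "math": 0.06, "physics": 0.04}
    for label, p in top_p_set(example):
        print(f"{label}: {p:.0%}")
```

Because the categories are added greedily in descending order, the returned set is the smallest one that reaches the threshold.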

Files

  • app.py: Streamlit UI and inference code.
  • train.ipynb: End-to-end training notebook (data loading, fine-tuning, evaluation, model saving).
  • requirements.txt: Python dependencies for HuggingFace Spaces.
  • PROJECT.md: Detailed project write-up (data, model choices, experiments, results).

Run locally

pip install -r requirements.txt
# Either: train your own model with train.ipynb (produces ./model/)
# Or:    set ARXIV_MODEL_REPO=your-username/arxiv-topic-classifier
streamlit run app.py

The app and the training notebook auto-detect the best available device: MPS (Apple Silicon) → CUDA → CPU. On an M1 Max one inference call takes ~30–80 ms; on the HF Spaces free tier (CPU) ~150–300 ms.
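The device fallback order described above might look roughly like this. The helper names `pick_device` and `detect_device` are hypothetical and are not taken from app.py; the sketch assumes PyTorch is the backend and falls back to CPU if it is not installed.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Apply the MPS -> CUDA -> CPU priority to two availability flags."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick one by priority."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report CPU.
        return "cpu"

    # torch.backends.mps is absent in older PyTorch releases, so guard for it.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(mps_ok, torch.cuda.is_available())


if __name__ == "__main__":
    print(f"Using device: {detect_device()}")
```

Keeping the priority rule in a pure function makes it easy to check without any particular hardware.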

Deploy to HuggingFace Spaces

  1. Create a new Space at https://huggingface.co/new-space and choose Streamlit as the SDK.
  2. Push this directory (app.py, requirements.txt, README.md) to the Space's git repo.
  3. Either push the trained ./model/ directory alongside the code, or publish your model to HF Hub and add a Space secret named ARXIV_MODEL_REPO with the repo id (e.g. your-username/arxiv-topic-classifier).
  4. The Space will rebuild automatically (≈ 2–4 minutes). Once it's green, your app is live.

Model loading priority

  1. If the env var ARXIV_MODEL_REPO is set, the app loads weights from that HF Hub repo.
  2. Otherwise it looks for a local ./model/ directory (the one produced by train.ipynb).
  3. If neither is available, the app shows a friendly error explaining what to do.
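The priority order above can be expressed as a small resolver. This is a sketch under assumptions: the function name `resolve_model_source`, its parameters, and the error message are illustrative and may differ from what app.py actually does.

```python
import os
from pathlib import Path


def resolve_model_source(env=None, local_dir="./model"):
    """Return the HF Hub repo id or local path to load the model from.

    Priority: ARXIV_MODEL_REPO env var, then a local model directory.
    Raises FileNotFoundError with a hint if neither is available.
    """
    env = os.environ if env is None else env

    # 1. An explicitly configured Hub repo wins.
    repo_id = env.get("ARXIV_MODEL_REPO", "").strip()
    if repo_id:
        return repo_id

    # 2. Otherwise fall back to the directory produced by train.ipynb.
    if Path(local_dir).is_dir():
        return str(local_dir)

    # 3. Nothing to load: explain what the user can do.
    raise FileNotFoundError(
        "No model found. Set ARXIV_MODEL_REPO to a Hub repo id, "
        f"or run train.ipynb to create {local_dir}."
    )


if __name__ == "__main__":
    try:
        print("Loading model from:", resolve_model_source())
    except FileNotFoundError as err:
        print(err)
```

The returned string could then be passed to a loader such as `AutoModelForSequenceClassification.from_pretrained` in transformers, which accepts either a Hub repo id or a local directory; in the app, the error case would be surfaced in the UI rather than printed.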