arxiv-topic-classifier / README.md

Dmitry057

Add README.md

c95658d verified about 2 months ago

metadata

title: arXiv Topic Classifier
emoji: 📑
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit

arXiv Topic Classifier

A small web app that takes a paper's title and (optionally) abstract and predicts the most likely arXiv top-level categories (cs, math, physics, q-bio, stat, ...).

The model is a fine-tuned distilbert-base-uncased. Predictions are displayed as a top-95% list: the smallest set of categories whose cumulative probability is at least 95%, sorted by descending confidence.
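The top-95% selection can be sketched as follows. This is a minimal illustration, not the code from app.py: the function name `top_p_categories` and the example probabilities are made up for the sketch, and it assumes the model's outputs have already been converted to a probability per category.

```python
from typing import Dict, List, Tuple


def top_p_categories(
    probs: Dict[str, float], threshold: float = 0.95
) -> List[Tuple[str, float]]:
    """Return the shortest descending-probability prefix whose sum reaches `threshold`.

    `probs` maps category name -> probability (assumed to sum to roughly 1).
    """
    # Rank categories from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for label, p in ranked:
        selected.append((label, p))
        cumulative += p
        # Stop as soon as the kept categories cover the threshold.
        if cumulative >= threshold:
            break
    return selected


# Hypothetical probabilities, for illustration only.
example = {"cs": 0.70, "stat": 0.20, "math": 0.06, "physics": 0.03, "q-bio": 0.01}
for label, p in top_p_categories(example):
    print(f"{label}: {p:.2f}")
```

With these example numbers the list stops after the third category, because cs and stat together cover only about 90% and math pushes the total past 95%.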

Files

app.py — Streamlit UI and inference code.
train.ipynb — End-to-end training notebook (data loading, fine-tuning, evaluation, model saving).
requirements.txt — Python dependencies for HuggingFace Spaces.
PROJECT.md — Detailed project write-up (data, model choices, experiments, results).

Run locally

pip install -r requirements.txt
# Either: train your own model with train.ipynb (produces ./model/)
# Or:    set ARXIV_MODEL_REPO=your-username/arxiv-topic-classifier
streamlit run app.py

The app and the training notebook auto-detect the best available device: MPS (Apple Silicon) → CUDA → CPU. On an M1 Max one inference call takes ~30–80 ms; on the HF Spaces free tier (CPU) ~150–300 ms.
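The device fallback described above might look like the sketch below. The helper names `pick_device` and `detect_device` are hypothetical and not taken from app.py; the sketch assumes PyTorch is the backend and falls back to CPU if it is not installed.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Apply the stated priority: MPS first, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick one by priority."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report CPU.
        return "cpu"

    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(mps_ok, torch.cuda.is_available())


print(detect_device())
```

Keeping the priority rule in a pure function makes it easy to check without any particular hardware; the returned string can be passed to `model.to(...)`.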

Deploy to Hugging Face Spaces

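Create a new Space at https://huggingface.co/new-space and choose Streamlit as the SDK.
Push this directory (app.py, requirements.txt, README.md) to the Space's git repo. Note that the metadata at the top of this README declares sdk: docker with app_port: 7860; pushed unchanged, it configures the Space as a Docker Space, which builds from a Dockerfile, so adjust the metadata to match the SDK you chose.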
Either push the trained ./model/ directory alongside the code, or publish your model to HF Hub and add a Space secret named ARXIV_MODEL_REPO with the repo id (e.g. your-username/arxiv-topic-classifier).
The Space will rebuild automatically (≈ 2–4 minutes). Once it's green, your app is live.

Model loading priority

If the env var ARXIV_MODEL_REPO is set, the app loads weights from that HF Hub repo.
Otherwise it looks for a local ./model/ directory (the one produced by train.ipynb).
If neither is available, the app shows a friendly error explaining what to do.
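The loading priority above can be expressed as a small resolver. This is a sketch under assumptions: the function name `resolve_model_source`, its return shape, and the error type are invented for illustration, and the real app.py may structure this differently (for example, by showing the error in the Streamlit UI rather than raising).

```python
import os
from pathlib import Path
from typing import Mapping, Tuple


def resolve_model_source(
    env: Mapping[str, str] = os.environ, local_dir: str = "./model"
) -> Tuple[str, str]:
    """Return ("hub", repo_id) or ("local", path), following the stated priority."""
    # 1. An explicitly configured Hub repo wins.
    repo_id = env.get("ARXIV_MODEL_REPO", "").strip()
    if repo_id:
        return ("hub", repo_id)

    # 2. Otherwise fall back to the directory produced by train.ipynb.
    if Path(local_dir).is_dir():
        return ("local", local_dir)

    # 3. Nothing to load: explain how to fix it.
    raise FileNotFoundError(
        "No model found. Set ARXIV_MODEL_REPO to a Hub repo id, "
        f"or run train.ipynb to create {local_dir}."
    )


try:
    kind, location = resolve_model_source()
    print(f"Loading model from {kind}: {location}")
except FileNotFoundError as err:
    print(err)
```

Either result can then be handed to a loader such as `AutoModelForSequenceClassification.from_pretrained(location)`, which accepts both a Hub repo id and a local directory path.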