---
title: arXiv Topic Classifier
emoji: 📑
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# arXiv Topic Classifier

A small web app that takes a paper's **title** and (optionally) **abstract** and predicts the most likely arXiv top-level categories (cs, math, physics, q-bio, stat, ...). The model is a fine-tuned `distilbert-base-uncased`.

Predictions are displayed as a top-95% list: the smallest set of categories whose total probability is at least 95%, sorted by descending confidence.

## Files

- [app.py](app.py): Streamlit UI and inference code.
- [train.ipynb](train.ipynb): end-to-end training notebook (data loading, fine-tuning, evaluation, model saving).
- [requirements.txt](requirements.txt): Python dependencies for HuggingFace Spaces.
- [PROJECT.md](PROJECT.md): detailed project write-up (data, model choices, experiments, results).

## Run locally

```bash
pip install -r requirements.txt

# Either: train your own model with train.ipynb (produces ./model/)
# Or: point the app at a model published on the HF Hub:
# export ARXIV_MODEL_REPO=your-username/arxiv-topic-classifier

streamlit run app.py
```

The app and the training notebook auto-detect the best available device, in this order: **MPS** (Apple Silicon) → **CUDA** → **CPU**. On an M1 Max, one inference call takes ~30–80 ms; on the HF Spaces free tier (CPU), it takes ~150–300 ms.

## Deploy to HuggingFace Spaces

1. Create a new Space at https://huggingface.co/new-space and choose **Streamlit** as the SDK.
2. Push this directory (`app.py`, `requirements.txt`, `README.md`) to the Space's git repo.
3. **Either** push the trained `./model/` directory alongside the code, **or** publish your model to the HF Hub and add a Space secret named `ARXIV_MODEL_REPO` containing the repo id (e.g. `your-username/arxiv-topic-classifier`).
4. The Space rebuilds automatically (≈ 2–4 minutes). Once its status turns green, the app is live.

## Model loading priority

1. If the environment variable `ARXIV_MODEL_REPO` is set, the app loads weights from that HF Hub repo.
2. Otherwise, it looks for a local `./model/` directory (the one produced by `train.ipynb`).
3. If neither is available, the app shows a friendly error explaining what to do.
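The three-step loading priority above can be sketched as a small resolver. This is a minimal illustration, not the code in `app.py`: the helper name `resolve_model_source` and the choice to treat a blank `ARXIV_MODEL_REPO` as unset are assumptions.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "ARXIV_MODEL_REPO"       # env var / Space secret named in this README
LOCAL_MODEL_DIR = Path("./model")  # directory produced by train.ipynb


def resolve_model_source(local_dir: Path = LOCAL_MODEL_DIR) -> Optional[str]:
    """Hypothetical helper: return a Hub repo id or a local path, or None.

    Priority (as described above):
    1. ARXIV_MODEL_REPO, if set to a non-blank value (treated as an HF Hub repo id)
    2. the local model directory, if it exists
    3. None, so the caller can show an error
    """
    repo_id = os.environ.get(ENV_VAR, "").strip()
    if repo_id:
        return repo_id
    if local_dir.is_dir():
        return str(local_dir)
    return None
```

The returned string could be passed straight to `AutoTokenizer.from_pretrained(...)` and `AutoModelForSequenceClassification.from_pretrained(...)` from `transformers`, since both accept either a Hub repo id or a local directory. In a Streamlit app, the `None` case could be handled with `st.error(...)` followed by `st.stop()`.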
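The top-95% display rule from the introduction can be computed greedily. Sorting by probability and taking categories until the running total reaches the threshold yields the smallest such set, because each step adds the largest remaining probability. This is a sketch, assuming the model produces a single softmax distribution over categories (so the probabilities sum to 1); the function name `top_p_categories` is hypothetical.

```python
from typing import Dict, List, Tuple


def top_p_categories(
    probs: Dict[str, float], threshold: float = 0.95
) -> List[Tuple[str, float]]:
    """Smallest set of categories whose total probability reaches `threshold`.

    Returns (category, probability) pairs sorted by descending probability.
    If all probabilities together fall short of the threshold, every category is returned.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for label, p in ranked:
        selected.append((label, p))
        cumulative += p
        if cumulative >= threshold:
            break  # running total reached the threshold; stop adding categories
    return selected
```

For example, with `{"cs": 0.70, "stat": 0.20, "math": 0.07, "physics": 0.03}` the running totals are 0.70, 0.90 and 0.97, so the list shows `cs`, `stat` and `math`.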
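The MPS → CUDA → CPU device order described under "Run locally" can be sketched with PyTorch's availability checks. The helper names `choose_device_name` and `detect_device` are hypothetical; the priority rule is kept as a pure function so it can be checked without a GPU, and `torch` is imported only inside the probing wrapper.

```python
import importlib


def choose_device_name(mps_available: bool, cuda_available: bool) -> str:
    """Pure priority rule: prefer MPS, then CUDA, otherwise fall back to CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device():
    """Probe the installed PyTorch build and return a torch.device (requires torch)."""
    torch = importlib.import_module("torch")  # lazy import: the rule above works without torch
    return torch.device(
        choose_device_name(
            mps_available=torch.backends.mps.is_available(),
            cuda_available=torch.cuda.is_available(),
        )
    )
```

With a device in hand, the model would be moved once with `model.to(device)`, and each tokenized batch moved to the same device before the forward pass.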