---
title: arXiv Topic Classifier
emoji: π
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# arXiv Topic Classifier

A small web app that takes a paper's **title** and (optionally) **abstract** and predicts the most likely arXiv top-level categories (cs, math, physics, q-bio, stat, ...).

The model is a fine-tuned `distilbert-base-uncased`. Predictions are displayed as a top-95% list: the smallest set of categories whose cumulative probability is at least 95%, sorted by descending confidence.
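The top-95% rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code in `app.py`: the function name `top_p_labels`, its parameters, and the example probabilities are assumptions made for the example.

```python
def top_p_labels(probs, labels, threshold=0.95):
    """Return the smallest prefix of (label, prob) pairs, sorted by
    descending probability, whose cumulative probability >= threshold."""
    # Rank categories from most to least likely.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

    selected, cumulative = [], 0.0
    for label, p in ranked:
        selected.append((label, p))
        cumulative += p
        if cumulative >= threshold:
            break  # smallest set reached; stop adding categories
    return selected


# Hypothetical model output for one paper (values made up for illustration).
labels = ["cs", "math", "stat", "physics", "q-bio"]
probs = [0.70, 0.20, 0.06, 0.03, 0.01]

for label, p in top_p_labels(probs, labels):
    print(f"{label}: {p:.0%}")
# prints cs: 70%, math: 20%, stat: 6% (cumulative 96% >= 95%)
```

If the probabilities never reach the threshold (for example because of rounding), the loop simply returns every category.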
## Files

- [app.py](app.py): Streamlit UI and inference code.
- [train.ipynb](train.ipynb): End-to-end training notebook (data loading, fine-tuning, evaluation, model saving).
- [requirements.txt](requirements.txt): Python dependencies for HuggingFace Spaces.
- [PROJECT.md](PROJECT.md): Detailed project write-up (data, model choices, experiments, results).
## Run locally

```bash
pip install -r requirements.txt
# Either: train your own model with train.ipynb (produces ./model/)
# Or: set ARXIV_MODEL_REPO=your-username/arxiv-topic-classifier
streamlit run app.py
```

The app and the training notebook auto-detect the best available device, in this order: **MPS** (Apple Silicon) → **CUDA** → **CPU**. On an M1 Max, one inference call takes ~30–80 ms; on the HF Spaces free tier (CPU), ~150–300 ms.
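A minimal sketch of that fallback order is shown below. The helper names `pick_device` and `detect_device` are hypothetical, and the sketch assumes PyTorch 1.12 or later (where `torch.backends.mps` exists); the actual logic in `app.py` may differ.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Apply the MPS -> CUDA -> CPU preference order."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(mps_ok, torch.cuda.is_available())


if __name__ == "__main__":
    print(detect_device())
```

Keeping the ordering in a pure function makes it testable on machines that have neither accelerator.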

## Deploy to HuggingFace Spaces

1. Create a new Space at https://huggingface.co/new-space and choose **Streamlit** as the SDK.
2. Push this directory (`app.py`, `requirements.txt`, `README.md`) to the Space's git repo. The Space reads its configuration from the YAML front matter of `README.md`, so make sure the `sdk` field there matches the SDK you selected.
3. **Either** push the trained `./model/` directory alongside the code, **or** publish your model to HF Hub and add a Space secret named `ARXIV_MODEL_REPO` with the repo id (e.g. `your-username/arxiv-topic-classifier`).
4. The Space will rebuild automatically (≈ 2–4 minutes). Once its status turns green, your app is live.

## Model loading priority

1. If the env var `ARXIV_MODEL_REPO` is set, the app loads weights from that HF Hub repo.
2. Otherwise it looks for a local `./model/` directory (the one produced by `train.ipynb`).
3. If neither is available, the app shows a friendly error explaining what to do.
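
The priority order above can be sketched as a small resolver. This is an illustration under stated assumptions, not the code in `app.py`: the function name `resolve_model_source`, its parameters, and the error message are hypothetical.

```python
import os
from pathlib import Path


def resolve_model_source(env=None, local_dir="./model"):
    """Return the HF Hub repo id or local path to load the model from.

    Priority: ARXIV_MODEL_REPO env var, then a local model directory.
    Raises FileNotFoundError when neither is available.
    """
    env = os.environ if env is None else env

    # 1. An explicitly configured Hub repo wins.
    repo_id = env.get("ARXIV_MODEL_REPO", "").strip()
    if repo_id:
        return repo_id

    # 2. Otherwise use the directory produced by train.ipynb, if present.
    if Path(local_dir).is_dir():
        return str(local_dir)

    # 3. Nothing to load: let the caller turn this into a UI error.
    raise FileNotFoundError(
        "No model found. Set ARXIV_MODEL_REPO to a Hub repo id, "
        "or run train.ipynb to create ./model/."
    )
```

Because `from_pretrained` in the `transformers` library accepts either a Hub repo id or a local directory, the returned string could be passed to it directly; a Streamlit app would typically catch the `FileNotFoundError` and display its message with `st.error`.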