---
title: Binary Doc Classifier (Chunked)
emoji: 📄
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# Binary Document Classifier: Gradio Space

This Space hosts a Gradio app for **binary text classification** on uploaded documents.
It supports long documents by **chunking** (512-token windows with overlap) and aggregates chunk probabilities into a **document-level** prediction.

## Configuration

Set the following **Space variables** in the UI (Settings → Variables):

- `MODEL_ID`: your trained model repo (e.g., `your-username/bert-binclass`)
- `MAX_LENGTH`: tokens per chunk (default: `512`)
- `STRIDE`: overlap tokens between chunks (default: `128`)

## Local run

```bash
pip install -r requirements.txt
python app.py
```

## Notes

- PDF extraction uses `pypdf` for simplicity.
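
The chunking and aggregation described above can be illustrated with a minimal sketch. It is not the code in `app.py`: the function names `chunk_token_ids` and `aggregate_probabilities` are hypothetical, averaging chunk probabilities and thresholding at 0.5 is an assumed aggregation rule (this README does not say which rule the app uses), and the sketch ignores the special tokens a real tokenizer would add to each window. It reads `MAX_LENGTH` and `STRIDE` from the environment with the defaults listed under Configuration.

```python
import os
from statistics import fmean

# Defaults mirror the Space variables listed under Configuration.
MAX_LENGTH = int(os.environ.get("MAX_LENGTH", "512"))
STRIDE = int(os.environ.get("STRIDE", "128"))


def chunk_token_ids(token_ids, max_length=MAX_LENGTH, stride=STRIDE):
    """Split token ids into windows of at most `max_length` tokens.

    Consecutive windows share `stride` tokens, so each new window
    starts `max_length - stride` tokens after the previous one.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the document.
        if start + max_length >= len(token_ids):
            break
        start += step
    return chunks


def aggregate_probabilities(chunk_probs, threshold=0.5):
    """Combine per-chunk positive-class probabilities into one prediction.

    Assumption: a simple mean followed by a threshold. Other rules
    (max, length-weighted mean) are equally plausible.
    """
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    doc_prob = fmean(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)


if __name__ == "__main__":
    ids = list(range(1000))  # stand-in for real token ids
    windows = chunk_token_ids(ids, max_length=512, stride=128)
    print([len(w) for w in windows])
    print(aggregate_probabilities([0.2, 0.9, 0.7]))
```

With the defaults, each window advances by 384 tokens (512 minus 128), so a 1000-token document yields three windows, the last one shorter than the rest. When using a Hugging Face fast tokenizer, the same windows can usually be produced directly by passing `max_length`, `stride`, `truncation=True`, and `return_overflowing_tokens=True` to the tokenizer call, which also accounts for special tokens.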