---
title: Binary Doc Classifier (Chunked)
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# Binary Document Classifier: Gradio Space

This Space hosts a Gradio app for **binary text classification** on uploaded documents.
It handles long documents by **chunking** them into 512-token windows with overlap, then
aggregating the chunk probabilities into a **document-level** prediction.

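As a rough illustration of the chunk-and-aggregate idea, the sketch below splits a list of token IDs into overlapping windows and averages per-chunk probabilities. It is not the code in `app.py`: the names `chunk_token_ids` and `aggregate_mean` are hypothetical, special tokens are ignored, and mean pooling with a 0.5 threshold is an assumption, since this README does not say which aggregation the app uses.

```python
from typing import List, Sequence, Tuple


def chunk_token_ids(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most `max_length` tokens.

    Consecutive windows share `stride` tokens, so each new window
    starts `max_length - stride` tokens after the previous one.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if not token_ids:
        return []

    step = max_length - stride
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        # Stop once the current window reaches the end of the document.
        if start + max_length >= len(token_ids):
            break
        start += step
    return chunks


def aggregate_mean(
    chunk_probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, int]:
    """Average per-chunk positive-class probabilities (assumed strategy)."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    doc_prob = sum(chunk_probs) / len(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)
```

With the defaults, a 1000-token document yields three windows starting at tokens 0, 384, and 768. Other aggregation choices (for example, taking the maximum chunk probability) would slot into the same place as `aggregate_mean`.
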
## Configuration

Set the following **Space variables** in the UI (Settings → Variables):

- `MODEL_ID`: your trained model repo (e.g., `your-username/bert-binclass`)
- `MAX_LENGTH`: tokens per chunk (default: `512`)
- `STRIDE`: overlap tokens between chunks (default: `128`)

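Space variables are exposed to the app as environment variables. The sketch below shows one way they could be read and validated at startup. `ChunkConfig` and `load_config` are hypothetical names, and treating `MODEL_ID` as required (no default) is an assumption; only the `512` and `128` defaults come from the list above.

```python
import os
from typing import Mapping, NamedTuple


class ChunkConfig(NamedTuple):
    model_id: str
    max_length: int
    stride: int


def load_config(env: Mapping[str, str] = os.environ) -> ChunkConfig:
    """Read MODEL_ID, MAX_LENGTH, and STRIDE from an environment mapping."""
    model_id = env.get("MODEL_ID", "").strip()
    if not model_id:
        # Assumption: the app cannot start without a model repo.
        raise ValueError("MODEL_ID is not set")

    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    # The overlap must be smaller than the window, or chunking never advances.
    if max_length <= 0 or not 0 <= stride < max_length:
        raise ValueError("need max_length > 0 and 0 <= stride < max_length")

    return ChunkConfig(model_id, max_length, stride)
```

For a local run, the same variables can be exported in the shell before starting `app.py`.
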
## Local run

```bash
pip install -r requirements.txt
python app.py
```

## Notes

- PDF extraction uses `pypdf` for simplicity.

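For reference, a minimal sketch of page-by-page text extraction with `pypdf` might look like the following. The helper names `join_page_texts` and `extract_pdf_text` are hypothetical and may not match what `app.py` does; the sketch assumes pages with no extractable text (scanned images, for instance) should simply be skipped.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with blank lines, skipping empty pages."""
    cleaned = [text.strip() for text in page_texts if text and text.strip()]
    return "\n\n".join(cleaned)


def extract_pdf_text(path: str) -> str:
    """Extract plain text from every page of the PDF at `path`."""
    # Imported here so join_page_texts stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```
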