---
title: Binary Doc Classifier (Chunked)
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# Binary Document Classifier: Gradio Space
This Space hosts a Gradio app for **binary text classification** on uploaded documents.
It supports long documents by **chunking** (512-token windows with overlap) and aggregates
chunk probabilities into a **document-level** prediction.
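
The chunk-and-aggregate flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names are hypothetical, the mean-then-threshold aggregation is an assumption (the README does not say which rule is used), and room for special tokens such as `[CLS]`/`[SEP]` is ignored.

```python
from typing import List, Sequence, Tuple


def chunk_token_ids(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most `max_length` tokens.

    Consecutive windows share `stride` tokens, so each new window
    starts `max_length - stride` tokens after the previous one.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        # Stop once the current window reaches the end of the document.
        if start + max_length >= len(token_ids):
            break
        start += step
    return chunks


def aggregate_probabilities(
    chunk_probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, int]:
    """Combine per-chunk positive-class probabilities into one prediction.

    Assumption: the document score is the plain mean of the chunk scores.
    """
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    doc_prob = sum(chunk_probs) / len(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)
```

With the defaults, a 1000-token document yields three windows starting at tokens 0, 384, and 768. Other aggregation rules (for example, taking the maximum chunk probability) may suit documents where only a short passage carries the signal.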
## Configuration
Set the following **Space variables** in the UI (Settings → Variables):
- `MODEL_ID`: your trained model repo (e.g., `your-username/bert-binclass`)
- `MAX_LENGTH`: tokens per chunk (default: `512`)
- `STRIDE`: overlap tokens between chunks (default: `128`)
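
Space variables are exposed to the app as environment variables. A minimal sketch of how they might be read, assuming the defaults listed above; `load_config` is a hypothetical helper and may not match what `app.py` actually does:

```python
import os
from typing import Dict, Mapping, Union


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Read the classifier settings from environment variables."""
    model_id = env.get("MODEL_ID")
    if not model_id:
        # MODEL_ID has no documented default, so treat it as required.
        raise RuntimeError("MODEL_ID is not set")
    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    if not 0 <= stride < max_length:
        raise ValueError("STRIDE must satisfy 0 <= STRIDE < MAX_LENGTH")
    return {"model_id": model_id, "max_length": max_length, "stride": stride}
```

For a local run, the same variables can be exported in the shell before starting the app.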
## Local run
```bash
pip install -r requirements.txt
python app.py
```
## Notes
- PDF extraction uses `pypdf` for simplicity.
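
For reference, page-by-page extraction with `pypdf` might look like the sketch below. The helper names are hypothetical and the blank-page handling is an assumption, not necessarily what this Space does; scanned PDFs with no text layer will yield an empty string.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, dropping pages that produced no text."""
    cleaned = [t.strip() for t in page_texts if t and t.strip()]
    return "\n\n".join(cleaned)


def extract_pdf_text(path: str) -> str:
    """Extract the text of every page in a PDF using pypdf."""
    # Imported here so join_page_texts stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```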