---
title: Binary Doc Classifier (Chunked)
emoji: 📄
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# Binary Document Classifier: Gradio Space
This Space hosts a Gradio app for **binary text classification** on uploaded documents.
It supports long documents by **chunking** (512-token windows with overlap) and aggregates
chunk probabilities into a **document-level** prediction.
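
The chunk-and-aggregate idea can be sketched on plain token ids, so the example runs without a model. The names `chunk_token_ids` and `aggregate_mean` are hypothetical. Mean pooling with a 0.5 threshold is an assumption, since this README does not say which aggregation `app.py` uses. The sketch also ignores the special tokens a real tokenizer adds to each window.

```python
from typing import List, Sequence, Tuple


def chunk_token_ids(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token ids into windows of at most `max_length` tokens.

    Consecutive windows share `stride` tokens (the overlap).
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if len(token_ids) == 0:
        return []

    step = max_length - stride  # how far each window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        if start + max_length >= len(token_ids):
            break  # this window reached the end of the document
        start += step
    return chunks


def aggregate_mean(
    chunk_probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, int]:
    """Average per-chunk positive-class probabilities (assumed strategy)."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    doc_prob = sum(chunk_probs) / len(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)


# A 1000-token document with the default settings yields three windows.
chunks = chunk_token_ids(list(range(1000)), max_length=512, stride=128)
doc_prob, label = aggregate_mean([0.9, 0.8, 0.1])
```

Other aggregation choices, such as taking the maximum chunk probability, would make the document-level prediction more sensitive to a single strongly positive chunk.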
## Configuration
Set the following **Space variables** in the UI (Settings → Variables):
- `MODEL_ID`: your trained model repo (e.g., `your-username/bert-binclass`)
- `MAX_LENGTH`: tokens per chunk (default: `512`)
- `STRIDE`: overlap tokens between chunks (default: `128`)
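
Space variables reach the app as environment variables, so the settings above can be read with `os.environ`. The following is a minimal sketch, not the actual code in `app.py`. The helper name `load_config` and its validation rules are assumptions, as is treating a missing `MODEL_ID` as an error.

```python
import os
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read MODEL_ID, MAX_LENGTH and STRIDE, applying the documented defaults."""
    env = os.environ if env is None else env

    model_id = env.get("MODEL_ID", "").strip()
    if not model_id:
        # Assumption: there is no default model, so fail early.
        raise ValueError("MODEL_ID is not set")

    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    if max_length <= 0:
        raise ValueError("MAX_LENGTH must be positive")
    if not 0 <= stride < max_length:
        # The overlap must be smaller than the window, or chunking never advances.
        raise ValueError("STRIDE must satisfy 0 <= STRIDE < MAX_LENGTH")

    return {"model_id": model_id, "max_length": max_length, "stride": stride}


# Passing a plain dict makes the function easy to try without touching the real environment.
config = load_config({"MODEL_ID": "your-username/bert-binclass"})
```

For a local run, the same variables can be exported in the shell before starting `app.py`.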
## Local run
```bash
pip install -r requirements.txt
python app.py
```
## Notes
- PDF extraction uses `pypdf` for simplicity.
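
As an illustration of the `pypdf` note, text extraction might look like the sketch below. `PdfReader` and `page.extract_text()` are real `pypdf` calls. The helper names `join_page_texts` and `extract_pdf_text`, and the choice to join pages with blank lines, are assumptions and not necessarily what `app.py` does.

```python
from typing import Iterable


def join_page_texts(pages: Iterable) -> str:
    """Concatenate the text of page-like objects exposing `extract_text()`."""
    texts = []
    for page in pages:
        # Guard against pages that yield no text (e.g. image-only pages).
        text = (page.extract_text() or "").strip()
        if text:
            texts.append(text)
    return "\n\n".join(texts)


def extract_pdf_text(path: str) -> str:
    """Extract text from a PDF file with pypdf."""
    # Imported here so `join_page_texts` can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return join_page_texts(reader.pages)
```

`pypdf` reads only the embedded text layer, so scanned PDFs without one come back empty and would need OCR before classification.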