---
title: Binary Doc Classifier (Chunked)
emoji: 📄
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# Binary Document Classifier (Gradio Space)

This Space hosts a Gradio app for **binary text classification** of uploaded documents.
It handles long documents by **chunking** them into overlapping token windows (by default,
512 tokens with a 128-token overlap) and aggregating the chunk probabilities into a
single **document-level** prediction.
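The chunk-and-aggregate step can be sketched in plain Python over a list of token IDs. This is a minimal sketch, not the code in `app.py`: the helper names `chunk_token_ids` and `aggregate_probabilities` are hypothetical, a real tokenizer also reserves room for special tokens (which this sketch ignores), and the README does not say how chunk probabilities are combined, so mean and max are shown as assumed options.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most `max_length` tokens,
    where consecutive windows share `stride` tokens (hypothetical helper)."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if not token_ids:
        return []
    step = max_length - stride  # how far each new window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks


def aggregate_probabilities(chunk_probs: Sequence[float], method: str = "mean") -> float:
    """Combine per-chunk positive-class probabilities into one document score.
    The choice of mean or max is an assumption, not documented app behavior."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    if method == "mean":
        return sum(chunk_probs) / len(chunk_probs)
    if method == "max":
        return max(chunk_probs)
    raise ValueError(f"unknown method: {method}")
```

With the defaults, a 1,000-token document yields three windows of 512, 512 and 232 tokens, each sharing 128 tokens with the previous one. Averaging smooths out a single noisy chunk, while taking the maximum flags a document as positive if any one chunk looks strongly positive.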

## Configuration

Set the following **Space variables** in the UI (Settings → Variables):

- `MODEL_ID`: the Hub repo of your trained model (e.g., `your-username/bert-binclass`)
- `MAX_LENGTH`: maximum tokens per chunk (default: `512`)
- `STRIDE`: number of tokens shared by consecutive chunks (default: `128`)
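Hugging Face Spaces exposes Space variables to the running app as environment variables. The sketch below shows one way they could be read and validated; the `Settings` class and `load_settings` function are hypothetical, and both the error on a missing `MODEL_ID` and the requirement that `STRIDE` be smaller than `MAX_LENGTH` are assumptions rather than documented behavior of `app.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_id: str
    max_length: int
    stride: int


def load_settings() -> Settings:
    """Read the Space variables from the environment (hypothetical helper)."""
    model_id = os.environ.get("MODEL_ID", "").strip()
    if not model_id:
        # Assumption: no default model, so fail early with a clear message.
        raise RuntimeError("MODEL_ID is not set; add it under Settings → Variables")
    max_length = int(os.environ.get("MAX_LENGTH", "512"))
    stride = int(os.environ.get("STRIDE", "128"))
    # Windows must advance by at least one token, so the overlap must be shorter.
    if not 0 <= stride < max_length:
        raise ValueError("STRIDE must be at least 0 and smaller than MAX_LENGTH")
    return Settings(model_id=model_id, max_length=max_length, stride=stride)
```

Validating at startup surfaces a misconfigured variable immediately in the Space logs instead of at the first upload.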

## Local run

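```bash
pip install -r requirements.txt
# Space variables are not available locally; set them as environment variables
# (assumes app.py reads them from the environment, as on the Space)
export MODEL_ID=your-username/bert-binclass
python app.py
```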

## Notes

- PDF extraction uses `pypdf` for simplicity.
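A minimal sketch of upload-to-text extraction with `pypdf`, using its `PdfReader` class and `page.extract_text()`. The function name `extract_text` is hypothetical, and accepting `.txt` files alongside PDFs is an assumption for illustration; `pypdf` is imported only in the PDF branch so plain-text files work without it.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of an uploaded .pdf or .txt file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # Imported lazily so plain-text uploads work without pypdf installed.
        from pypdf import PdfReader

        reader = PdfReader(path)
        # Pages without a text layer (e.g. scanned images) yield empty strings.
        pages = [page.extract_text() or "" for page in reader.pages]
        return "\n".join(pages).strip()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="replace").strip()
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Because `pypdf` reads only the embedded text layer, scanned PDFs come back empty and would need OCR before classification.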