tomerz14 committed 4c97017 (verified), 1 parent: 988dff5

Create README.md

Files changed (1): `README.md` added (+36 -0)

---
title: Binary Doc Classifier (Chunked)
emoji: 📄
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# Binary Document Classifier: Gradio Space

This Space hosts a Gradio app for **binary text classification** of uploaded documents.
It handles long documents by **chunking** them into overlapping windows (512 tokens by default),
classifying each chunk, and aggregating the chunk probabilities into a single **document-level** prediction.
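
The chunking and aggregation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `app.py`: it assumes the document is already tokenized into a list of token IDs, it ignores the special tokens (such as `[CLS]`/`[SEP]`) that a real model adds to each window, and it assumes mean aggregation, since the README does not say how chunk probabilities are combined. `chunk_token_ids` and `aggregate_mean` are hypothetical names.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most `max_length` tokens, where
    consecutive windows share `stride` tokens (the overlap)."""
    if stride >= max_length:
        raise ValueError("stride (overlap) must be smaller than max_length")
    step = max_length - stride  # how far each window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks


def aggregate_mean(chunk_probs: Sequence[float]) -> float:
    """Assumed aggregation: average the positive-class probability over chunks."""
    return sum(chunk_probs) / len(chunk_probs)


if __name__ == "__main__":
    ids = list(range(1000))  # stand-in for real token IDs
    chunks = chunk_token_ids(ids, max_length=512, stride=128)
    print([c[0] for c in chunks])  # [0, 384, 768]
    print(aggregate_mean([0.2, 0.8, 0.5]))
```

With the defaults, each window advances 512 - 128 = 384 tokens, so a 1,000-token document yields three chunks starting at tokens 0, 384, and 768. In a real app, Hugging Face fast tokenizers can produce equivalent windows directly via `truncation=True`, `max_length=...`, `stride=...`, and `return_overflowing_tokens=True`, where `stride` is likewise the number of overlapping tokens. Taking the maximum instead of the mean is a common alternative when a single strongly positive chunk should decide the document.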

## Configuration

Set the following **Space variables** in the UI (Settings → Variables):

- `MODEL_ID`: your trained model repo (e.g., `your-username/bert-binclass`)
- `MAX_LENGTH`: tokens per chunk (default: `512`)
- `STRIDE`: overlap tokens between chunks (default: `128`)
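
Spaces exposes variables to the running app as environment variables, so `app.py` can presumably read them through `os.environ`. The sketch below is an assumption about how that might look, not the Space's actual code: `SpaceConfig` and `load_config` are hypothetical names, and the defaults mirror the values listed above. It also rejects a `STRIDE` that is not smaller than `MAX_LENGTH`, because the chunk window could then never advance.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    model_id: str
    max_length: int
    stride: int


def load_config(env: Optional[Mapping[str, str]] = None) -> SpaceConfig:
    """Read the Space variables (hypothetical helper), using the README defaults."""
    env = os.environ if env is None else env
    model_id = env.get("MODEL_ID")
    if not model_id:
        raise RuntimeError("Set the MODEL_ID Space variable to your trained model repo")
    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    # The overlap must be smaller than the window, or chunking cannot make progress
    if not 0 <= stride < max_length:
        raise ValueError("STRIDE must be non-negative and smaller than MAX_LENGTH")
    return SpaceConfig(model_id=model_id, max_length=max_length, stride=stride)
```

Calling `load_config()` with no argument reads the real environment; passing a plain dict, e.g. `load_config({"MODEL_ID": "your-username/bert-binclass"})`, makes the function easy to test locally.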

## Local run

```bash
pip install -r requirements.txt
python app.py
```

## Notes

- PDF extraction uses `pypdf` for simplicity.
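
A minimal sketch of the text-extraction step, assuming uploads arrive as raw bytes plus a filename. `extract_text` is a hypothetical helper: only the PDF branch reflects the README (it uses pypdf's `PdfReader` and `page.extract_text()`), while treating every other upload as UTF-8 plain text is an assumption. pypdf is imported lazily so plain-text uploads work without it.

```python
import io
from pathlib import PurePath


def extract_text(data: bytes, filename: str) -> str:
    """Return plain text from an uploaded document (hypothetical helper)."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".pdf":
        from pypdf import PdfReader  # imported lazily; only needed for PDFs

        reader = PdfReader(io.BytesIO(data))
        # extract_text() yields little or nothing for image-only pages
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    # Assumption: any other upload is treated as UTF-8 plain text
    return data.decode("utf-8", errors="replace")
```

Because pypdf reads only a PDF's embedded text layer, scanned, image-only PDFs produce little or no text; handling those would require OCR, which this simple approach leaves out.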