Delete README.md
README.md
DELETED
@@ -1,27 +0,0 @@
# Binary Document Classifier — Gradio Space

This Space hosts a Gradio app for **binary text classification** on uploaded documents.
It supports long documents by **chunking** (512-token windows with overlap) and aggregates
chunk probabilities into a **document-level** prediction.
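
The chunk-and-aggregate idea can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the function names `chunk_token_ids` and `aggregate_mean` are hypothetical, the 128-token overlap is taken from the `STRIDE` default listed under Configure, and mean pooling with a 0.5 threshold is an assumption, since the README does not say how chunk probabilities are combined. Special tokens such as `[CLS]` and `[SEP]` are ignored here.

```python
def chunk_token_ids(token_ids, max_length=512, stride=128):
    """Split token ids into windows of max_length that overlap by stride tokens."""
    if max_length <= 0 or not 0 <= stride < max_length:
        raise ValueError("require max_length > 0 and 0 <= stride < max_length")
    if not token_ids:
        return []
    step = max_length - stride  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks


def aggregate_mean(chunk_probs, threshold=0.5):
    """Average per-chunk positive-class probabilities (assumed aggregation rule)."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    doc_prob = sum(chunk_probs) / len(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)


# A 1000-token document yields windows starting at 0, 384, and 768.
chunks = chunk_token_ids(list(range(1000)))
doc_prob, label = aggregate_mean([0.2, 0.9, 0.7])
```

Other aggregation rules (max over chunks, length-weighted mean) plug into the same place; which one suits a given model is an empirical question.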

## Configure

Set the environment variable `MODEL_ID` in your Space to point to your trained model,
e.g. `your-username/bert-binclass`. You can also set:

- `MAX_LENGTH` — tokens per chunk (default: 512)
- `STRIDE` — overlap tokens between chunks (default: 128)
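
One way an app might read these variables is sketched below. The `Settings` class and `load_settings` function are hypothetical names, and falling back to the placeholder `your-username/bert-binclass` when `MODEL_ID` is unset is an assumption made for illustration; the real `app.py` may behave differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_id: str
    max_length: int = 512
    stride: int = 128


def load_settings(env=None):
    """Build Settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: fall back to the README's placeholder model id when unset.
    model_id = env.get("MODEL_ID", "your-username/bert-binclass")
    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    if max_length <= 0 or not 0 <= stride < max_length:
        raise ValueError("require MAX_LENGTH > 0 and 0 <= STRIDE < MAX_LENGTH")
    return Settings(model_id=model_id, max_length=max_length, stride=stride)


# Passing a dict instead of os.environ makes the function easy to test.
settings = load_settings({"MODEL_ID": "your-username/bert-binclass", "STRIDE": "64"})
```

Validating that the overlap is smaller than the chunk length guards against windows that never advance through the document.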

## Run locally

```bash
pip install -r requirements.txt
python app.py
```

Then open the printed Gradio URL.

## Notes

- PDF extraction uses `pypdf` for simplicity. For higher-quality results or OCR,
  consider `pymupdf` (fitz) or `unstructured`.
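
For reference, basic text extraction with `pypdf` might look like the sketch below. The helper names `join_page_texts` and `extract_pdf_text` are hypothetical and not taken from the app. `pypdf` reads only the text layer of a PDF, so scanned pages without one come back empty, which is where an OCR-capable tool becomes necessary.

```python
def join_page_texts(page_texts):
    """Join per-page strings, skipping pages with no extractable text."""
    cleaned = [t.strip() for t in page_texts if t and t.strip()]
    return "\n\n".join(cleaned)


def extract_pdf_text(path):
    """Extract the text layer of a PDF with pypdf, one block per page."""
    # Imported lazily so join_page_texts stays usable without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return join_page_texts(page.extract_text() for page in reader.pages)
```

An empty return value from `extract_pdf_text` is a reasonable signal that the file is image-only and needs a different extraction path.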