title: Binary Doc Classifier (Chunked)
emoji: 📄
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

Binary Document Classifier: Gradio Space

This Space hosts a Gradio app for binary text classification on uploaded documents. It supports long documents by chunking (512-token windows with overlap) and aggregates chunk probabilities into a document-level prediction.
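The chunk-and-aggregate idea can be sketched in plain Python. This is a minimal illustration, not the code in app.py: the helper names `chunk_windows` and `aggregate_mean` are hypothetical, the token IDs are assumed to be already produced by a tokenizer, special tokens are ignored, and averaging is only one possible aggregation rule (the app may use a different one).

```python
from typing import List, Sequence, Tuple


def chunk_windows(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of up to max_length tokens.

    Consecutive windows share `stride` tokens, so each new window
    starts (max_length - stride) tokens after the previous one.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if len(token_ids) == 0:
        return []
    step = max_length - stride
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        # Stop once the current window reaches the end of the document.
        if start + max_length >= len(token_ids):
            break
        start += step
    return chunks


def aggregate_mean(
    chunk_probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, int]:
    """Average per-chunk positive-class probabilities (assumed rule)
    and threshold the mean to get a document-level label."""
    if len(chunk_probs) == 0:
        raise ValueError("need at least one chunk probability")
    doc_prob = sum(chunk_probs) / len(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)
```

With the defaults, a 1000-token document yields three windows starting at tokens 0, 384, and 768; the last window is shorter than 512 tokens.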

Configuration

Set the following Space variables in the UI (Settings → Variables):

  • MODEL_ID: your trained model repo (e.g., your-username/bert-binclass)
  • MAX_LENGTH: tokens per chunk (default: 512)
  • STRIDE: overlap tokens between chunks (default: 128)
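Space variables reach the app as environment variables. One way app.py might read them is sketched below, assuming the defaults listed above; `load_config` is a hypothetical helper, and the actual app may parse or validate these values differently.

```python
import os
from typing import Dict, Mapping, Optional, Union


def load_config(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[str, int]]:
    """Read MODEL_ID, MAX_LENGTH, and STRIDE, applying the documented defaults."""
    env = os.environ if env is None else env
    model_id = env.get("MODEL_ID", "")
    if not model_id:
        # No default is documented for MODEL_ID, so treat it as required here.
        raise ValueError("MODEL_ID is not set")
    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    if not 0 <= stride < max_length:
        raise ValueError("STRIDE must be non-negative and smaller than MAX_LENGTH")
    return {"model_id": model_id, "max_length": max_length, "stride": stride}
```

Passing a plain dict instead of `os.environ` makes the helper easy to check locally, for example `load_config({"MODEL_ID": "your-username/bert-binclass"})`.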

Local run

pip install -r requirements.txt
python app.py

Notes

  • PDF extraction uses pypdf for simplicity.