	---
	title: Binary Doc Classifier (Chunked)
	emoji: 📄
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# Binary Document Classifier — Gradio Space

	This Space hosts a Gradio app for binary text classification on uploaded documents.
	It supports long documents by chunking (512-token windows with overlap) and aggregates
	chunk probabilities into a document-level prediction.
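As a rough illustration of the chunk-then-aggregate idea described above, the sketch below splits a list of token IDs into overlapping windows and averages per-chunk probabilities into one document score. The function names `chunk_token_ids` and `aggregate_probabilities`, the use of a plain mean, and the 0.5 decision threshold are all assumptions made for illustration; the app itself may aggregate differently or rely on the tokenizer's own overflow handling (for example `return_overflowing_tokens=True` with `stride` in Hugging Face tokenizers). Special tokens such as `[CLS]` and `[SEP]` are ignored here.

```python
from typing import List, Sequence, Tuple


def chunk_token_ids(
    token_ids: Sequence[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most `max_length` tokens.

    Consecutive windows share `stride` tokens of overlap (hypothetical helper).
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if not token_ids:
        return []

    step = max_length - stride  # how far each new window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_length]))
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks


def aggregate_probabilities(
    chunk_probs: Sequence[float], threshold: float = 0.5
) -> Tuple[float, int]:
    """Average positive-class probabilities (assumed rule) and threshold them."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    doc_prob = sum(chunk_probs) / len(chunk_probs)
    return doc_prob, int(doc_prob >= threshold)


if __name__ == "__main__":
    windows = chunk_token_ids(list(range(10)), max_length=4, stride=2)
    print(windows)
    print(aggregate_probabilities([0.2, 0.8, 0.8]))
```

A mean treats every chunk equally; taking the maximum instead would flag a document as soon as any single chunk looks positive, which may or may not suit a given dataset.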

	## Configuration

	Set the following Space variables in the Space settings (Settings → Variables and secrets):

	- `MODEL_ID` — your trained model repo (e.g., `your-username/bert-binclass`)
	- `MAX_LENGTH` — tokens per chunk (default: `512`)
	- `STRIDE` — overlap tokens between chunks (default: `128`)
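Space variables are exposed to the app as environment variables, so the settings above could be read along the following lines. This is a minimal sketch, not the code in `app.py`: the `SpaceConfig` dataclass and the `load_config` helper are hypothetical names, and the validation rules (a required `MODEL_ID`, `STRIDE` smaller than `MAX_LENGTH`) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpaceConfig:
    """Hypothetical container for the three Space variables."""
    model_id: str
    max_length: int
    stride: int


def load_config(env: Optional[Mapping[str, str]] = None) -> SpaceConfig:
    """Read MODEL_ID, MAX_LENGTH and STRIDE, applying the documented defaults."""
    env = os.environ if env is None else env

    model_id = env.get("MODEL_ID", "").strip()
    if not model_id:
        # No default is documented for MODEL_ID, so fail early (assumption).
        raise ValueError("MODEL_ID is not set")

    max_length = int(env.get("MAX_LENGTH", "512"))
    stride = int(env.get("STRIDE", "128"))
    if not 0 <= stride < max_length:
        raise ValueError("STRIDE must be >= 0 and smaller than MAX_LENGTH")

    return SpaceConfig(model_id=model_id, max_length=max_length, stride=stride)


if __name__ == "__main__":
    # Passing a mapping makes the helper easy to try without touching os.environ.
    print(load_config({"MODEL_ID": "your-username/bert-binclass"}))
```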

	## Local run

	```bash
	pip install -r requirements.txt
	python app.py
	```

	## Notes

	- PDF extraction uses `pypdf` for simplicity.
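For reference, text extraction with `pypdf` typically looks like the sketch below. The helper names `pages_to_text` and `extract_pdf_text` are hypothetical, and joining pages with blank lines while skipping empty pages is an assumption rather than the app's documented behavior. `pypdf` is imported inside the function so the page-joining logic can be exercised without the library installed.

```python
from typing import Any, Iterable


def pages_to_text(pages: Iterable[Any]) -> str:
    """Join the text of page-like objects that expose `extract_text()`."""
    parts = []
    for page in pages:
        text = (page.extract_text() or "").strip()
        if text:  # skip pages with no extractable text (e.g. scanned images)
            parts.append(text)
    return "\n\n".join(parts)


def extract_pdf_text(path: str) -> str:
    """Extract plain text from a PDF file using pypdf."""
    from pypdf import PdfReader  # lazy import: only needed for real PDFs

    reader = PdfReader(path)
    return pages_to_text(reader.pages)
```

Scanned PDFs without a text layer yield little or no text with this approach, since `pypdf` does not perform OCR.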