---
title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---
|
|
|
|
|
# 🔎 AI vs Human — Document Classifier
|
|
|
|
|
This **Gradio Space** lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was **AI-generated** or **Human-written**.

The app supports **long documents** by splitting them into overlapping 512‑token chunks and aggregating predictions to provide an overall document‑level probability.
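
Conceptually this is sliding-window tokenization followed by a reduction over the per‑chunk probabilities. A minimal sketch of the idea, assuming a 🤗 Transformers sequence‑classification checkpoint (names and label order below are illustrative, not the exact code in `app.py`):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "bert-base-uncased"  # placeholder; point this at your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

def classify_long_text(text, max_length=512, stride=128, agg="mean"):
    # Split the document into overlapping chunks of at most `max_length` tokens.
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping key, not a model input
    with torch.no_grad():
        logits = model(**enc).logits   # one row of logits per chunk (all chunks in one batch)
    probs = logits.softmax(dim=-1)     # per-chunk class probabilities
    # Aggregate chunk-level probabilities into a document-level score.
    doc_probs = probs.mean(dim=0) if agg == "mean" else probs.max(dim=0).values
    return doc_probs  # e.g. tensor([p_human, p_ai]), depending on your label order
```

With `agg="max"` a single highly AI‑like chunk dominates the document score, while `"mean"` smooths over the whole document.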
|
|
|
|
|
---

## ✨ Features
|
|
|
|
|
✅ **Interactive Interface**

- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for *AI‑generated* vs *Human‑written*
- Shows a **confidence badge** (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate **Basic** and **Advanced** tabs for simplicity
- A **Chunk Details** accordion with per‑chunk probabilities for deeper inspection

✅ **Configurable Parameters**

- Adjust `MAX_LENGTH` and `STRIDE` for token chunking
- Choose aggregation method (`mean` or `max`) across chunks

✅ **Fully local**

- No Hub API calls beyond model loading
- Runs on CPU, GPU, or MPS automatically
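
Automatic device selection usually boils down to a small preference chain like the sketch below (illustrative, not copied from `app.py`):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon (MPS), and fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```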
|
|
|
|
|
---

## ⚙️ Environment Variables

You can configure your Space in **Settings → Variables**:
|
|
|
|
|
| Variable | Description | Default | |
|
|
|-----------|--------------|----------| |
|
|
| `MODEL_ID` | Hugging Face repo ID of your model | `bert-base-uncased` | |
|
|
| `MAX_LENGTH` | Tokens per chunk | `512` | |
|
|
| `STRIDE` | Number of overlapping tokens between consecutive chunks | `128` |
|
|
|
|
|
Example:

```
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
```
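
Inside the app, these variables would typically be read with `os.getenv`, falling back to the defaults above. A minimal sketch of that pattern (the exact handling lives in `app.py`):

```python
import os

# Read Space variables, falling back to the documented defaults.
MODEL_ID = os.getenv("MODEL_ID", "bert-base-uncased")
MAX_LENGTH = int(os.getenv("MAX_LENGTH", "512"))
STRIDE = int(os.getenv("STRIDE", "128"))
```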
|
|
|
|
|
---

## 🧠 Example Workflow
|
|
|
|
|
1. Train your binary classifier using `train.py` and push it to the Hub.
2. Deploy this Space with your model:
   - Set the Space variable `MODEL_ID` to your repo.
3. Upload any text file — the app will:
   - Chunk the text
   - Run inference on each chunk
   - Show probabilities like:
|
|
|
|
|
```
AI generated: 0.82
Human written: 0.18
```

and a color‑coded **confidence badge**.
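
The probability display maps naturally onto Gradio's `Label` component, which renders a dict of class → confidence. A minimal sketch of the wiring (`read_document` and `classify_long_text` are hypothetical helpers standing in for the app's text extraction and chunked inference; the real app also adds the tabs, badge, and chunk details):

```python
import gradio as gr

def predict(path):
    # `path` is the uploaded file's location on disk (gr.File's default "filepath" type).
    # read_document() and classify_long_text() are illustrative placeholders.
    text = read_document(path)
    p_human, p_ai = classify_long_text(text)
    return {"AI generated": float(p_ai), "Human written": float(p_human)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.File(label="Document (TXT, MD, HTML, PDF)"),
    outputs=gr.Label(num_top_classes=2),
)

if __name__ == "__main__":
    demo.launch()
```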
|
|
|
|
|
---

## 🚀 Run Locally
|
|
|
|
|
```bash
pip install -r requirements.txt
python app.py
```

Then open the Gradio URL shown in your terminal.
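
To try a custom checkpoint locally, set `MODEL_ID` in the environment before launching, e.g. `MODEL_ID=your-username/bert-binclass python app.py` (the repo ID is the same placeholder used in the example above).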
|
|
|
|
|
---

## 🖼️ UI Preview
|
|
|
|
|
>  |
|
|
> |
|
|
> *Top: prediction and probabilities; bottom: per‑chunk details.* |
|
|
|
|
|
---

## 🧩 Notes
|
|
|
|
|
- PDF parsing uses [`pypdf`](https://pypi.org/project/pypdf/) (a minimal extraction sketch follows below); for better results or OCR, consider [`pymupdf`](https://pypi.org/project/PyMuPDF/) or [`unstructured`](https://github.com/Unstructured-IO/unstructured).
- The color scheme is based on the **Soft Indigo** theme for a calm, modern feel.
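
For reference, plain‑text extraction with `pypdf` is roughly the following sketch (the app may handle encrypted or image‑only pages differently):

```python
from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    reader = PdfReader(path)
    # extract_text() can return None for pages with no extractable text (e.g. scans).
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```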
|
|
|
|
|
---

## 🪪 License

MIT — feel free to modify and re‑deploy.