tomerz14 / BERT_Text_Source_Classifier / README.md

metadata

title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit

🔎 AI vs Human — Document Classifier

This Gradio Space lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was AI‑generated or human‑written.

The app supports long documents by splitting them into overlapping chunks of up to 512 tokens (configurable via MAX_LENGTH and STRIDE), running the classifier on each chunk, and aggregating the per‑chunk predictions into an overall document‑level probability.
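The overlapping-window idea can be sketched as follows. This is a minimal illustration, not the code in app.py: it assumes the document is already a list of token IDs and that STRIDE is the number of tokens shared by consecutive chunks. The function name `chunk_token_ids` is hypothetical.

```python
def chunk_token_ids(token_ids, max_length=512, stride=128):
    """Split token IDs into windows of at most max_length that overlap by stride tokens."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if not token_ids:
        return []

    step = max_length - stride  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window reached the end of the document
        start += step
    return chunks


if __name__ == "__main__":
    ids = list(range(1000))  # stand-in for real token IDs
    for chunk in chunk_token_ids(ids):
        print(len(chunk), chunk[0], chunk[-1])
```

A real BERT input also reserves positions for the [CLS] and [SEP] special tokens, which this sketch ignores; the app may instead rely on the tokenizer's own overflow handling.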

✨ Features

✅ Interactive Interface

- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for AI‑generated vs. human‑written
- Shows a confidence badge (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate Basic and Advanced tabs for simplicity
- A Chunk Details accordion with per‑chunk probabilities for deeper inspection

✅ Configurable Parameters

- Adjust MAX_LENGTH and STRIDE for token chunking
- Choose the aggregation method (mean or max) across chunks
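As an illustration of the two aggregation options, here is a small sketch that reduces per-chunk AI probabilities to one document-level value. The function name `aggregate_ai_probability` and the assumption that each chunk yields a single probability in [0, 1] are mine, not taken from app.py.

```python
def aggregate_ai_probability(chunk_probs, method="mean"):
    """Combine per-chunk AI probabilities into one document-level probability."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    if method == "mean":
        return sum(chunk_probs) / len(chunk_probs)
    if method == "max":
        return max(chunk_probs)
    raise ValueError(f"unknown aggregation method: {method!r}")


if __name__ == "__main__":
    probs = [0.9, 0.7, 0.2]  # made-up per-chunk values for illustration
    print(aggregate_ai_probability(probs, "mean"))
    print(aggregate_ai_probability(probs, "max"))
```

Roughly speaking, mean smooths over the whole document, while max reports the most AI-like chunk and so reacts to a single suspicious passage.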

✅ Fully local

- No Hub API calls beyond model loading
- Automatically runs on a CUDA GPU, Apple MPS, or CPU, whichever is available
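Automatic device selection is commonly written along these lines. This sketch assumes the app runs on PyTorch and that the preference order is CUDA, then MPS, then CPU; neither detail is confirmed by this README, and `pick_device` is a hypothetical name.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports as available."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so there is nothing else to choose from

    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```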

⚙️ Environment Variables

You can configure your Space in Settings → Variables:

| Variable | Description | Default |
| --- | --- | --- |
| `MODEL_ID` | Hugging Face repo ID of your model | `bert-base-uncased` |
| `MAX_LENGTH` | Tokens per chunk | `512` |
| `STRIDE` | Overlap tokens between chunks | `128` |

Example:

MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
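On the Python side, these variables could be read with the documented defaults roughly as follows. This is a sketch under assumptions: the helper name `load_settings` and the stride check are illustrative, and app.py may parse its configuration differently.

```python
import os


def load_settings(env=None):
    """Read MODEL_ID, MAX_LENGTH and STRIDE, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = {
        "model_id": env.get("MODEL_ID", "bert-base-uncased"),
        "max_length": int(env.get("MAX_LENGTH", "512")),
        "stride": int(env.get("STRIDE", "128")),
    }
    # Assumed sanity check: an overlap as large as the chunk would never advance.
    if not 0 <= settings["stride"] < settings["max_length"]:
        raise ValueError("STRIDE must be >= 0 and smaller than MAX_LENGTH")
    return settings


if __name__ == "__main__":
    print(load_settings())
```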

🧠 Example Workflow

1. Train your binary classifier using train.py and push it to the Hub.
2. Deploy this Space with your model:
   - Set the Space variable MODEL_ID to your repo.
3. Upload any text file. The app will:
   - Chunk the text
   - Run inference on each chunk
   - Show probabilities like:

AI generated: 0.82
Human written: 0.18

and a color‑coded confidence badge.
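One way to turn the document-level probability into a badge is sketched below. The README only names the two labels and "traffic-light colors"; the 0.5 decision point, the `margin` parameter, the color names, and the function name `confidence_badge` are all assumptions made for illustration.

```python
def confidence_badge(p_ai, margin=0.15):
    """Map an AI probability to a (label, color) pair using assumed thresholds."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability between 0 and 1")

    label = "Likely AI" if p_ai >= 0.5 else "Likely Human"
    if abs(p_ai - 0.5) < margin:
        color = "amber"  # too close to 0.5 to call confidently (assumed rule)
    elif label == "Likely AI":
        color = "red"
    else:
        color = "green"
    return label, color


if __name__ == "__main__":
    print(confidence_badge(0.82))
```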

🚀 Run Locally

pip install -r requirements.txt
python app.py

Then open the Gradio URL shown in your terminal.

🖼️ UI Preview

Top: prediction and probabilities; bottom: per‑chunk details.

🧩 Notes

PDF parsing uses pypdf; for better results or OCR, consider pymupdf or unstructured.
The color scheme is based on the Soft Indigo theme for a calm, modern feel.
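For the PDF note above, text extraction with pypdf typically looks like the sketch below. The helper names `pages_to_text` and `extract_pdf_text` are hypothetical, and the app's actual parsing code may differ.

```python
def pages_to_text(pages):
    """Join the text of page-like objects that expose extract_text()."""
    return "\n".join((page.extract_text() or "") for page in pages)


def extract_pdf_text(path):
    """Extract plain text from every page of the PDF at the given path using pypdf."""
    from pypdf import PdfReader  # imported lazily so pages_to_text has no dependencies

    reader = PdfReader(path)
    return pages_to_text(reader.pages)
```

Scanned PDFs without a text layer yield little or no text this way, which is where an OCR-capable tool becomes useful.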

🪪 License

MIT — feel free to modify and re‑deploy.