title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
🔎 AI vs Human — Document Classifier
This Gradio Space lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was AI-generated or Human-written.
The app supports long documents by splitting them into overlapping 512‑token chunks and aggregating predictions to provide an overall document‑level probability.
✨ Features
✅ Interactive Interface
- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for AI‑generated vs Human‑written
- Shows a confidence badge (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate Basic and Advanced tabs for simplicity
- A Chunk Details accordion with per‑chunk probabilities for deeper inspection
✅ Configurable Parameters
- Adjust `MAX_LENGTH` and `STRIDE` for token chunking
- Choose aggregation method (`mean` or `max`) across chunks
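To illustrate how `MAX_LENGTH` and `STRIDE` interact, here is a minimal sketch of overlapping chunking over a list of token IDs. It assumes `STRIDE` is the number of tokens shared by consecutive chunks, as the variables table describes; the function name `chunk_token_ids` is hypothetical and is not taken from `app.py`.

```python
def chunk_token_ids(token_ids, max_length=512, stride=128):
    """Split token IDs into windows of max_length that overlap by stride tokens.

    Hypothetical helper for illustration; app.py may chunk differently.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if not token_ids:
        return []

    step = max_length - stride  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks


chunks = chunk_token_ids(list(range(1000)))
print(len(chunks), [len(c) for c in chunks])
```

A larger `STRIDE` means more overlap, so more chunks and more inference time for the same document, but less chance that a passage is split awkwardly at a chunk boundary.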
✅ Fully local
- No Hub API calls beyond model loading
- Runs on CPU, GPU, or MPS automatically
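Automatic device selection might look roughly like the sketch below. It assumes the model runs on PyTorch; `pick_device` is a hypothetical name, and the order of preference (CUDA, then MPS, then CPU) is an assumption rather than something the app documents.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu", preferring an accelerator when available.

    Hypothetical helper; the preference order is an assumption.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: fall back to CPU when PyTorch is missing

    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```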
⚙️ Environment Variables
You can configure your Space in Settings → Variables:
| Variable | Description | Default |
|---|---|---|
| `MODEL_ID` | Hugging Face repo ID of your model | bert-base-uncased |
| `MAX_LENGTH` | Tokens per chunk | 512 |
| `STRIDE` | Overlap tokens between chunks | 128 |
Example:
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
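As a rough sketch of how these variables could be read at startup with the defaults listed in the table, assuming they arrive as ordinary environment variables (the helper name `load_config` is hypothetical and not taken from `app.py`):

```python
import os


def load_config(env=None):
    """Read MODEL_ID, MAX_LENGTH and STRIDE, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "model_id": env.get("MODEL_ID", "bert-base-uncased"),
        "max_length": int(env.get("MAX_LENGTH", "512")),  # tokens per chunk
        "stride": int(env.get("STRIDE", "128")),  # overlap tokens between chunks
    }


print(load_config())
```

Environment values are always strings, so the numeric settings are converted with `int` before use.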
🧠 Example Workflow
- Train your binary classifier using `train.py` and push it to the Hub.
- Deploy this Space with your model:
  - Set the Space variable `MODEL_ID` to your repo.
- Upload a document (TXT, MD, HTML, or PDF). The app will:
  - Chunk the text
  - Run inference on each chunk
  - Show probabilities like:
    AI generated: 0.82
    Human written: 0.18
    and a color-coded confidence badge.
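The last step of the workflow, turning per-chunk probabilities into a document-level result, can be sketched as follows. This is an illustration under stated assumptions: the names `aggregate` and `summarize` are hypothetical, `max` is assumed to apply to the per-chunk AI probability, and the 0.5 badge threshold is a placeholder rather than the app's actual cutoff.

```python
def aggregate(ai_probs, method="mean"):
    """Combine per-chunk AI probabilities into one document-level probability."""
    if not ai_probs:
        raise ValueError("need at least one chunk probability")
    if method == "mean":
        return sum(ai_probs) / len(ai_probs)
    if method == "max":
        return max(ai_probs)
    raise ValueError(f"unknown aggregation method: {method!r}")


def summarize(ai_probs, method="mean", threshold=0.5):
    """Build the displayed probabilities and badge (threshold is an assumption)."""
    p_ai = aggregate(ai_probs, method)
    return {
        "AI generated": round(p_ai, 2),
        "Human written": round(1 - p_ai, 2),
        "badge": "Likely AI" if p_ai >= threshold else "Likely Human",
    }


print(summarize([0.9, 0.8, 0.76]))
```

With `mean`, every chunk contributes equally to the score. With `max`, a single strongly AI-like chunk decides the document-level result, which is more sensitive to mixed documents but also more prone to false positives.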
🚀 Run Locally
pip install -r requirements.txt
python app.py
Then open the Gradio URL shown in your terminal.
🖼️ UI Preview
Top: prediction and probabilities; bottom: per‑chunk details.
🧩 Notes
- PDF parsing uses `pypdf`; for better results or OCR, consider `pymupdf` or `unstructured`.
- The color scheme is based on the Soft Indigo theme for a calm, modern feel.
🪪 License
MIT — feel free to modify and re‑deploy.
