---
title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---
# 🔎 AI vs Human — Document Classifier
This **Gradio Space** lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was **AI-generated** or **Human-written**.
The app supports **long documents** by splitting them into overlapping 512‑token chunks and aggregating predictions to provide an overall document‑level probability.
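The overlapping-window scheme can be sketched as below. This is a simplified illustration, not the app's actual code: it windows over a list of token ids directly, whereas the real app first tokenizes the document with the model's own tokenizer.

```python
def chunk_token_ids(ids, max_length=512, stride=128):
    """Split a token-id sequence into overlapping windows.

    Consecutive windows share `stride` tokens of context, so no
    passage is scored without its surroundings. Defaults mirror the
    MAX_LENGTH / STRIDE settings documented below.
    """
    step = max_length - stride
    return [
        ids[start:start + max_length]
        for start in range(0, max(len(ids) - stride, 1), step)
    ]
```

For a 1,000-token document with the defaults, this yields three chunks starting at positions 0, 384, and 768, each sharing 128 tokens with its neighbor.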
---
## ✨ Features
✅ **Interactive Interface**
- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for *AI‑generated* vs *Human‑written*
- Shows a **confidence badge** (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate **Basic** and **Advanced** tabs for simplicity
- A **Chunk Details** accordion with per‑chunk probabilities for deeper inspection
✅ **Configurable Parameters**
- Adjust `MAX_LENGTH` and `STRIDE` for token chunking
- Choose aggregation method (`mean` or `max`) across chunks
✅ **Fully local**
- No Hub API calls beyond model loading
- Runs on CPU, GPU, or MPS automatically
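Automatic device selection usually follows the standard PyTorch pattern sketched below; the exact logic in `app.py` may differ, but the fallback order is the same idea.

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple-silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```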
---
## ⚙️ Environment Variables
You can configure your Space in **Settings → Variables**:
| Variable | Description | Default |
|-----------|--------------|----------|
| `MODEL_ID` | Hugging Face repo ID of your model | `bert-base-uncased` |
| `MAX_LENGTH` | Tokens per chunk | `512` |
| `STRIDE` | Overlap tokens between chunks | `128` |
Example:
```
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
```
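Inside `app.py`, these variables would typically be read with `os.getenv` and the documented defaults as fallbacks; a minimal sketch (the exact reading code in the app may differ):

```python
import os

# Space settings fall back to the defaults from the table above.
MODEL_ID = os.getenv("MODEL_ID", "bert-base-uncased")
MAX_LENGTH = int(os.getenv("MAX_LENGTH", "512"))
STRIDE = int(os.getenv("STRIDE", "128"))
```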
---
## 🧠 Example Workflow
1. Train your binary classifier using `train.py` and push it to the Hub.

2. Deploy this Space with your model:
- Set the Space variable `MODEL_ID` to your repo.
3. Upload any text file — the app will:
- Chunk the text
- Run inference on each chunk
- Show probabilities like:
```
AI generated: 0.82
Human written: 0.18
```
and a color‑coded **confidence badge**.
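Per-chunk scores are combined into the document-level probability with the selected aggregation method. A minimal sketch, assuming each chunk's AI probability is a plain float:

```python
def aggregate(chunk_probs, method="mean"):
    """Combine per-chunk AI probabilities into one document score.

    "mean" averages evidence across the whole document, while "max"
    flags the document if any single chunk looks strongly AI-written.
    """
    if method == "max":
        return max(chunk_probs)
    return sum(chunk_probs) / len(chunk_probs)
```

For example, chunk scores `[0.8, 0.6]` give `0.7` under `mean`, while `max` would report `0.8`.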
---
## 🚀 Run Locally
```bash
pip install -r requirements.txt
python app.py
```
Then open the Gradio URL shown in your terminal.
---
## 🖼️ UI Preview
> 
>
> *Top: prediction and probabilities; bottom: per‑chunk details.*
---
## 🧩 Notes
- PDF parsing uses [`pypdf`](https://pypi.org/project/pypdf/); for better results or OCR, consider [`pymupdf`](https://pypi.org/project/PyMuPDF/) or [`unstructured`](https://github.com/Unstructured-IO/unstructured).
- The color scheme is based on the **Soft Indigo** theme for a calm, modern feel.
---
## 🪪 License
MIT. Feel free to modify and redeploy.