---
title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---
|
|
|
|
|
# 🔎 AI vs Human — Document Classifier
|
|
|
|
|
This **Gradio Space** lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was **AI-generated** or **Human-written**.

The app supports **long documents** by splitting them into overlapping 512‑token chunks and aggregating predictions to provide an overall document‑level probability.
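
Conceptually this is sliding-window tokenization followed by a reduction over the per‑chunk probabilities. A minimal sketch of the idea, assuming a 🤗 Transformers sequence‑classification checkpoint (names and label order below are illustrative, not the exact code in `app.py`):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "bert-base-uncased"  # placeholder; point this at your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

def classify_long_text(text, max_length=512, stride=128, agg="mean"):
    # Split the document into overlapping chunks of at most `max_length` tokens.
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        stride=stride,
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping key, not a model input
    with torch.no_grad():
        logits = model(**enc).logits   # one row of logits per chunk (all chunks in one batch)
    probs = logits.softmax(dim=-1)     # per-chunk class probabilities
    # Aggregate chunk-level probabilities into a document-level score.
    doc_probs = probs.mean(dim=0) if agg == "mean" else probs.max(dim=0).values
    return doc_probs  # e.g. tensor([p_human, p_ai]), depending on your label order
```

With `agg="max"` a single highly AI‑like chunk dominates the document score, while `"mean"` smooths over the whole document.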
|
|
|
|
|
---

## ✨ Features
|
|
|
|
|
✅ **Interactive Interface**

- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for *AI‑generated* vs *Human‑written*
- Shows a **confidence badge** (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate **Basic** and **Advanced** tabs for simplicity
- A **Chunk Details** accordion with per‑chunk probabilities for deeper inspection

✅ **Configurable Parameters**

- Adjust `MAX_LENGTH` and `STRIDE` for token chunking
- Choose aggregation method (`mean` or `max`) across chunks

✅ **Fully local**

- No Hub API calls beyond model loading
- Runs on CPU, GPU, or MPS automatically
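
Automatic device selection usually boils down to a small preference chain like the sketch below (illustrative, not copied from `app.py`):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon (MPS), and fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```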
|
|
|
|
|
---

## ⚙️ Environment Variables

You can configure your Space in **Settings → Variables**:
|
|
|
|
|
| Variable | Description | Default | |
|
|
|-----------|--------------|----------| |
|
|
| `MODEL_ID` | Hugging Face repo ID of your model | `bert-base-uncased` | |
|
|
| `MAX_LENGTH` | Tokens per chunk | `512` | |
|
|
| `STRIDE` | Number of overlapping tokens between consecutive chunks | `128` |
|
|
|
|
|
Example:

```
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
```
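
Inside the app, these variables would typically be read with `os.getenv`, falling back to the defaults above. A minimal sketch of that pattern (the exact handling lives in `app.py`):

```python
import os

# Read Space variables, falling back to the documented defaults.
MODEL_ID = os.getenv("MODEL_ID", "bert-base-uncased")
MAX_LENGTH = int(os.getenv("MAX_LENGTH", "512"))
STRIDE = int(os.getenv("STRIDE", "128"))
```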
|
|
|
|
|
---

## 🧠 Example Workflow
|
|
|
|
|
1. Train your binary classifier using `train.py` and push it to the Hub.
2. Deploy this Space with your model:
   - Set the Space variable `MODEL_ID` to your repo.
3. Upload any text file — the app will:
   - Chunk the text
   - Run inference on each chunk
   - Show probabilities like:
|
|
|
|
|
```
AI generated: 0.82
Human written: 0.18
```

and a color‑coded **confidence badge**.
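
The probability display maps naturally onto Gradio's `Label` component, which renders a dict of class → confidence. A minimal sketch of the wiring (`read_document` and `classify_long_text` are hypothetical helpers standing in for the app's text extraction and chunked inference; the real app also adds the tabs, badge, and chunk details):

```python
import gradio as gr

def predict(path):
    # `path` is the uploaded file's location on disk (gr.File's default "filepath" type).
    # read_document() and classify_long_text() are illustrative placeholders.
    text = read_document(path)
    p_human, p_ai = classify_long_text(text)
    return {"AI generated": float(p_ai), "Human written": float(p_human)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.File(label="Document (TXT, MD, HTML, PDF)"),
    outputs=gr.Label(num_top_classes=2),
)

if __name__ == "__main__":
    demo.launch()
```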
|
|
|
|
|
---

## 🚀 Run Locally
|
|
|
|
|
```bash
pip install -r requirements.txt
python app.py
```

Then open the Gradio URL shown in your terminal.
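
To try a custom checkpoint locally, set `MODEL_ID` in the environment before launching, e.g. `MODEL_ID=your-username/bert-binclass python app.py` (the repo ID is the same placeholder used in the example above).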
|
|
|
|
|
---

## 🖼️ UI Preview
|
|
|
|
|
>  |
|
|
> |
|
|
> *Top: prediction and probabilities; bottom: per‑chunk details.* |
|
|
|
|
|
---

## 🧩 Notes
|
|
|
|
|
- PDF parsing uses [`pypdf`](https://pypi.org/project/pypdf/) (a minimal extraction sketch follows below); for better results or OCR, consider [`pymupdf`](https://pypi.org/project/PyMuPDF/) or [`unstructured`](https://github.com/Unstructured-IO/unstructured).
- The color scheme is based on the **Soft Indigo** theme for a calm, modern feel.
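
For reference, plain‑text extraction with `pypdf` is roughly the following sketch (the app may handle encrypted or image‑only pages differently):

```python
from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    reader = PdfReader(path)
    # extract_text() can return None for pages with no extractable text (e.g. scans).
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```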
|
|
|
|
|
---

## 🪪 License

MIT — feel free to modify and re‑deploy.