---
title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---
# 🔎 AI vs Human — Document Classifier
This **Gradio Space** lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was **AI-generated** or **Human-written**.
The app supports **long documents** by splitting them into overlapping chunks (512 tokens by default) and aggregating the per‑chunk predictions into a single document‑level probability.
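As a rough illustration of the chunking step, here is a minimal sketch that works on a plain list of token IDs. It assumes `STRIDE` means the number of tokens shared by consecutive chunks (as in the table of environment variables) and ignores special tokens such as `[CLS]` and `[SEP]`. The function name `chunk_token_ids` is hypothetical and is not taken from `app.py`.

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int], max_length: int = 512, stride: int = 128
) -> List[List[int]]:
    """Split token IDs into windows of at most `max_length` tokens.

    Consecutive windows share `stride` tokens (assumed meaning of STRIDE).
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if not token_ids:
        return []

    step = max_length - stride  # how far each new window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reached the end of the document
        start += step
    return chunks


if __name__ == "__main__":
    ids = list(range(1000))  # stand-in for real token IDs
    for i, chunk in enumerate(chunk_token_ids(ids)):
        print(i, len(chunk), chunk[0], chunk[-1])
```

With a real tokenizer, the same effect is usually obtained by asking the tokenizer itself for overflowing windows, which also takes care of special tokens.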
---
## ✨ Features
**Interactive Interface**
- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for *AI‑generated* vs *Human‑written*
- Shows a **confidence badge** (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate **Basic** and **Advanced** tabs for simplicity
- A **Chunk Details** accordion with per‑chunk probabilities for deeper inspection
**Configurable Parameters**
- Adjust `MAX_LENGTH` and `STRIDE` for token chunking
- Choose aggregation method (`mean` or `max`) across chunks
**Fully local**
- No Hub API calls beyond model loading
- Runs on CPU, GPU, or MPS automatically
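
The `mean` and `max` aggregation options can be sketched as follows. This is an assumption about how the per‑chunk scores are combined, not a copy of the app's code: it takes one AI‑generated probability per chunk, and the helper name `aggregate_ai_probability` is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def aggregate_ai_probability(chunk_probs: Sequence[float], method: str = "mean") -> float:
    """Combine per-chunk AI probabilities into one document-level score."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    if method == "mean":
        # Every chunk contributes equally to the document score.
        return fmean(chunk_probs)
    if method == "max":
        # The single most AI-like chunk decides the document score.
        return max(chunk_probs)
    raise ValueError(f"unknown aggregation method: {method!r}")


if __name__ == "__main__":
    probs = [0.9, 0.7, 0.5]
    ai = aggregate_ai_probability(probs, "mean")
    print(f"AI generated: {ai:.2f}")
    print(f"Human written: {1 - ai:.2f}")
```

`mean` smooths over occasional outlier chunks, while `max` flags a document as soon as any one chunk looks strongly AI‑generated, so `max` tends to be the more sensitive setting for mixed documents.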
---
## ⚙️ Environment Variables
You can configure your Space in **Settings → Variables**:
| Variable | Description | Default |
|-----------|--------------|----------|
| `MODEL_ID` | Hugging Face repo ID of your model | `bert-base-uncased` |
| `MAX_LENGTH` | Tokens per chunk | `512` |
| `STRIDE` | Overlap tokens between chunks | `128` |
Example:
```
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
```
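
For reference, a minimal sketch of how these variables could be read in Python with the defaults from the table above. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the actual `app.py` may read the variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_id: str
    max_length: int
    stride: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read MODEL_ID, MAX_LENGTH and STRIDE, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        model_id=env.get("MODEL_ID", "bert-base-uncased"),
        max_length=int(env.get("MAX_LENGTH", "512")),  # tokens per chunk
        stride=int(env.get("STRIDE", "128")),          # overlap between chunks
    )


if __name__ == "__main__":
    print(load_settings())
```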
---
## 🧠 Example Workflow
1. Train your binary classifier using `train.py` and push it to the Hub.
2. Deploy this Space with your model:
   - Set the Space variable `MODEL_ID` to your repo.
3. Upload a document. The app will:
   - Chunk the text
   - Run inference on each chunk
   - Show probabilities like:
     ```
     AI generated: 0.82
     Human written: 0.18
     ```
     and a color‑coded **confidence badge**.
---
## 🚀 Run Locally
```bash
pip install -r requirements.txt
python app.py
```
Then open the Gradio URL shown in your terminal.
---
## 🖼️ UI Preview
> ![screenshot placeholder](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/gradio-placeholder.png)
>
> *Top: prediction and probabilities; bottom: per‑chunk details.*
---
## 🧩 Notes
- PDF parsing uses [`pypdf`](https://pypi.org/project/pypdf/); for better results or OCR, consider [`pymupdf`](https://pypi.org/project/PyMuPDF/) or [`unstructured`](https://github.com/Unstructured-IO/unstructured).
- The color scheme is based on the **Soft Indigo** theme for a calm, modern feel.
---
## 🪪 License
MIT — feel free to modify and re‑deploy.