---
title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---

# 🔎 AI vs Human: Document Classifier

This **Gradio Space** lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was **AI-generated** or **Human-written**.

The app supports **long documents** by splitting them into overlapping 512‑token chunks and aggregating the per‑chunk predictions into an overall document‑level probability.

---

## ✨ Features

✅ **Interactive Interface**
- Upload documents directly (TXT, MD, HTML, PDF)
- Displays clean probability bars for *AI‑generated* vs *Human‑written*
- Shows a **confidence badge** (“Likely AI” / “Likely Human”) with traffic‑light colors
- Separate **Basic** and **Advanced** tabs for simplicity
- A **Chunk Details** accordion with per‑chunk probabilities for deeper inspection

✅ **Configurable Parameters**
- Adjust `MAX_LENGTH` and `STRIDE` for token chunking
- Choose aggregation method (`mean` or `max`) across chunks

✅ **Fully local**
- No Hub API calls beyond model loading
- Runs on CPU, GPU, or MPS automatically

---

## ⚙️ Environment Variables

You can configure your Space in **Settings → Variables**:

| Variable | Description | Default |
|-----------|--------------|----------|
| `MODEL_ID` | Hugging Face repo ID of your model | `bert-base-uncased` |
| `MAX_LENGTH` | Tokens per chunk | `512` |
| `STRIDE` | Overlap tokens between chunks | `128` |

Example:
```
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
```

---

## 🧠 Example Workflow

1. Train your binary classifier using `train.py` and push it to the Hub.
2. Deploy this Space with your model:
   - Set the Space variable `MODEL_ID` to your repo.
3. Upload a supported document (TXT, MD, HTML, or PDF). The app will:
   - Chunk the text
   - Run inference on each chunk
   - Show probabilities like:
     ```
     AI generated: 0.82
     Human written: 0.18
     ```
     and a color‑coded **confidence badge**.

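
The chunk-and-aggregate step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `app.py`: the function names `chunk_token_ids` and `aggregate` are hypothetical, the input is a plain list of token IDs, and special tokens such as `[CLS]` and `[SEP]` (which a real tokenizer counts toward `MAX_LENGTH`) are ignored. `STRIDE` is treated as the number of tokens shared by consecutive chunks, matching the table above, so each window starts `MAX_LENGTH - STRIDE` tokens after the previous one. With `mean`, every chunk contributes equally to the document score; with `max`, a single strongly AI-like chunk is enough to raise it.

```python
from statistics import mean


def chunk_token_ids(token_ids, max_length=512, stride=128):
    """Split token IDs into windows of up to max_length tokens.

    Consecutive windows share `stride` tokens (hypothetical helper,
    assumed to mirror the README's MAX_LENGTH / STRIDE semantics).
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must be >= 0 and smaller than max_length")
    step = max_length - stride  # distance between window starts
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the document.
        if start + max_length >= len(token_ids):
            break
        start += step
    return chunks


def aggregate(chunk_ai_probs, method="mean"):
    """Combine per-chunk P(AI) values into one document-level P(AI)."""
    if not chunk_ai_probs:
        raise ValueError("no chunk probabilities to aggregate")
    if method == "mean":
        return mean(chunk_ai_probs)
    if method == "max":
        return max(chunk_ai_probs)
    raise ValueError(f"unknown aggregation method: {method!r}")
```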
---

## 🚀 Run Locally

```bash
pip install -r requirements.txt
python app.py
```

Then open the Gradio URL shown in your terminal.

---

## 🖼️ UI Preview

> ![screenshot placeholder](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/gradio-placeholder.png)
>
> *Top: prediction and probabilities; bottom: per‑chunk details.*

---

## 🧩 Notes

- PDF parsing uses [`pypdf`](https://pypi.org/project/pypdf/); for better results or OCR, consider [`pymupdf`](https://pypi.org/project/PyMuPDF/) or [`unstructured`](https://github.com/Unstructured-IO/unstructured).
- The color scheme is based on the **Soft Indigo** theme for a calm, modern feel.

---

## 🪪 License

MIT. Feel free to modify and re‑deploy.