---
title: AI vs Human Document Classifier
emoji: 📄
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---

🔎 AI vs Human — Document Classifier

This Gradio Space lets you upload a document (TXT, MD, HTML, or PDF) and predicts whether it was AI-generated or human-written.

To handle long documents, the app splits the text into overlapping chunks of up to 512 tokens (MAX_LENGTH), with 128 tokens shared between consecutive chunks (STRIDE), and aggregates the per-chunk predictions into a single document-level probability.
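The chunking step can be pictured as a sliding window over the token IDs. The following is a minimal sketch in plain Python, not the app's actual code: `chunk_token_ids` is a hypothetical helper, and it assumes STRIDE counts the tokens shared by consecutive chunks (the meaning of `stride` in Hugging Face tokenizers). A tokenizer-based implementation would also reserve room in each window for special tokens such as [CLS] and [SEP].

```python
from typing import List


def chunk_token_ids(token_ids: List[int], max_length: int = 512, stride: int = 128) -> List[List[int]]:
    """Split token IDs into windows of at most max_length tokens.

    Consecutive windows share `stride` tokens (hypothetical helper, assumed semantics).
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must be >= 0 and smaller than max_length")
    if not token_ids:
        return []  # nothing to classify
    step = max_length - stride  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks
```

With 1,000 tokens and the default settings, each window starts 384 tokens (512 minus 128) after the previous one, giving three chunks of 512, 512, and 232 tokens.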


✨ Features

Interactive Interface

  • Upload documents directly (TXT, MD, HTML, PDF)
  • Displays clean probability bars for AI‑generated vs Human‑written
  • Shows a confidence badge (“Likely AI” / “Likely Human”) with traffic‑light colors
  • Separate Basic and Advanced tabs for simplicity
  • A Chunk Details accordion with per‑chunk probabilities for deeper inspection
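One plausible way to derive the confidence badge from the document-level AI probability is sketched below. The function name `confidence_badge`, the 0.8 and 0.6 thresholds, and the mapping of traffic-light colors to confidence levels are all assumptions for illustration; the app's own rules may differ.

```python
from typing import Tuple


def confidence_badge(p_ai: float, high: float = 0.8, medium: float = 0.6) -> Tuple[str, str]:
    """Return (label, color) for a document-level AI probability.

    `high` and `medium` are hypothetical thresholds, not values from app.py.
    """
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability in [0, 1]")
    # Ties at exactly 0.5 are reported as AI in this sketch.
    label = "Likely AI" if p_ai >= 0.5 else "Likely Human"
    confidence = max(p_ai, 1.0 - p_ai)  # probability of the predicted class
    if confidence >= high:
        color = "green"   # confident prediction
    elif confidence >= medium:
        color = "yellow"  # moderately confident
    else:
        color = "red"     # close to a coin flip
    return label, color
```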

Configurable Parameters

  • Adjust MAX_LENGTH and STRIDE for token chunking
  • Choose aggregation method (mean or max) across chunks
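The two aggregation methods can be sketched as follows, assuming each chunk yields a probability of being AI-generated and the human-written probability is reported as its complement. `aggregate_ai_probability` is a hypothetical name, not a function from app.py.

```python
from statistics import mean
from typing import List


def aggregate_ai_probability(chunk_probs: List[float], method: str = "mean") -> float:
    """Combine per-chunk AI probabilities into one document-level score."""
    if not chunk_probs:
        raise ValueError("need at least one chunk probability")
    if method == "mean":
        return mean(chunk_probs)  # every chunk counts equally
    if method == "max":
        return max(chunk_probs)  # the most AI-like chunk decides
    raise ValueError(f"unknown aggregation method: {method!r}")
```

Mean treats every chunk equally and is less sensitive to a single unusual passage; max flags a document if any one chunk looks strongly AI-generated, which tends to produce more false positives on long human-written texts.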

Local inference

  • No Hub API calls beyond downloading the model
  • Runs on CPU, CUDA GPU, or Apple MPS, selected automatically
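Automatic device selection is commonly done with PyTorch's availability checks. This sketch assumes a preference order of CUDA, then Apple MPS, then CPU; the order app.py actually uses is not stated here. `pick_device` is a hypothetical helper that also falls back to CPU when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what is available."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch, fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is present in PyTorch >= 1.12 (Apple Silicon)
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The returned string can be passed to `model.to(...)` and used when moving input tensors.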

⚙️ Environment Variables

You can configure your Space in Settings → Variables:

| Variable | Description | Default |
|---|---|---|
| MODEL_ID | Hugging Face repo ID of your model | bert-base-uncased |
| MAX_LENGTH | Tokens per chunk | 512 |
| STRIDE | Overlap tokens between chunks | 128 |

Example:

```
MODEL_ID=your-username/bert-binclass
MAX_LENGTH=512
STRIDE=128
```
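Inside the app, Space variables arrive as ordinary environment variables. Below is a minimal sketch of reading them with the defaults from the table, assuming integer values for MAX_LENGTH and STRIDE; `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_id: str
    max_length: int
    stride: int


def load_settings() -> Settings:
    """Read the Space variables, falling back to the documented defaults."""
    settings = Settings(
        model_id=os.environ.get("MODEL_ID", "bert-base-uncased"),
        max_length=int(os.environ.get("MAX_LENGTH", "512")),
        stride=int(os.environ.get("STRIDE", "128")),
    )
    # The overlap must be smaller than the window, or chunking cannot advance.
    if not 0 <= settings.stride < settings.max_length:
        raise ValueError("STRIDE must be >= 0 and smaller than MAX_LENGTH")
    return settings
```

For BERT-family models such as bert-base-uncased, MAX_LENGTH should not exceed 512, the longest sequence the model's position embeddings support.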

🧠 Example Workflow

  1. Train your binary classifier with train.py and push it to the Hugging Face Hub.
  2. Deploy this Space with your model:
    • Set the Space variable MODEL_ID to your model's repo ID.
  3. Upload a document (TXT, MD, HTML, or PDF). The app will:
    • Chunk the text
    • Run inference on each chunk
    • Show document-level probabilities, for example:
        AI generated: 0.82
        Human written: 0.18
    • Display a color-coded confidence badge
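Turning the model's raw outputs into probabilities like those in the workflow takes a softmax per chunk followed by aggregation. This sketch assumes two output logits per chunk, with index 1 meaning "AI generated" (check your model's `id2label` mapping), and uses mean aggregation; `softmax`, `document_probabilities`, and `AI_INDEX` are illustrative names, not part of app.py.

```python
import math
from typing import Dict, List, Sequence

AI_INDEX = 1  # hypothetical: assumes label 1 = "AI generated" in the model config


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def document_probabilities(chunk_logits: List[Sequence[float]]) -> Dict[str, float]:
    """Turn per-chunk logits into document-level probabilities (mean aggregation)."""
    if not chunk_logits:
        raise ValueError("need at least one chunk")
    ai_probs = [softmax(logits)[AI_INDEX] for logits in chunk_logits]
    p_ai = sum(ai_probs) / len(ai_probs)
    return {"AI generated": p_ai, "Human written": 1.0 - p_ai}
```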


🚀 Run Locally

```bash
pip install -r requirements.txt
python app.py
```

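Then open the URL that Gradio prints in your terminal (usually http://127.0.0.1:7860). To load your own model, set MODEL_ID (and optionally MAX_LENGTH and STRIDE) as environment variables before launching.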


🖼️ UI Preview

[Screenshot: prediction and probabilities at the top; per-chunk details at the bottom]


🧩 Notes

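  • PDF parsing uses pypdf, which extracts only embedded text, so scanned (image-only) PDFs yield little or no text. For better extraction, or OCR on scanned files, consider pymupdf or unstructured.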
  • The color scheme is based on the Soft Indigo theme for a calm, modern feel.
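Text extraction for the supported upload formats could be organized as below. This is a hypothetical `load_document` helper, not the app's implementation: TXT and MD are read as UTF-8, HTML is stripped to visible text with the standard-library `html.parser`, and PDF uses pypdf's `PdfReader` and `extract_text()`, imported only when a PDF is actually loaded.

```python
import html.parser
from pathlib import Path


class _TextExtractor(html.parser.HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def load_document(path: str) -> str:
    """Extract plain text from a TXT, MD, HTML, or PDF file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in (".txt", ".md"):
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix in (".html", ".htm"):
        parser = _TextExtractor()
        parser.feed(Path(path).read_text(encoding="utf-8", errors="replace"))
        parser.close()
        # Join text pieces with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(parser.parts).split())
    if suffix == ".pdf":
        from pypdf import PdfReader  # lazy import; requires `pip install pypdf`
        reader = PdfReader(path)
        # extract_text() returns only embedded text; scanned pages come back (nearly) empty
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"unsupported file type: {suffix}")
```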

🪪 License

MIT — feel free to modify and re‑deploy.