AITextDetector / README.md
ChauHPham's picture
Upload folder using huggingface_hub
84d4bbc verified

A newer version of the Gradio SDK is available: 6.3.0

Upgrade
metadata
title: AITextDetector
app_file: gradio_app.py
sdk: gradio
sdk_version: 5.49.1

AI Text Detector

A learning project for detecting AI-generated vs. human-written text with a modular Python package, YAML configs, GPU auto-detection, CLI, and a Gradio web interface.

🌐 Web Interface (Gradio)

Try it now on Google Colab (works perfectly on Mac M2!):

!pip install -q transformers torch pandas gradio kagglehub
!git clone https://github.com/ChauHPham/AITextDetector.git
%cd AITextDetector
!python gradio_app.py

Get a public shareable link instantly! See DEPLOY.md for deployment options.

🍎 Mac M2 Users

Google Colab is recommended - local training may fail due to PyTorch MPS mutex lock issues. The Gradio app works great in Colab with free GPU!

Quickstart (CLI)

# 1) Create & activate a virtualenv (recommended)
python -m venv .venv && source .venv/bin/activate

# 2) Install
pip install -r requirements.txt
pip install -e .

# 3) (Optional) Download Kaggle datasets into data/
python scripts/kaggle_downloader.py

# 4) Configure
cp configs/default.yaml configs/local.yaml
# edit local.yaml if desired (change data_path, hyperparams, etc.)

# 5) Train
ai-detector train --data data/dataset.csv --config configs/local.yaml

# 6) Evaluate
ai-detector eval --model-path models/ai_detector --data data/dataset.csv --config configs/local.yaml

Datasets

  • LLM Detect AI Generated Text Dataset (Kaggle)
  • AI vs Human Text (Kaggle)

Use scripts/kaggle_downloader.py to fetch them. You may need to normalize/merge columns; the loader tries common names (text, content, essay and label, class, target).

Config

See configs/default.yaml. Key fields:

  • base_model: e.g., roberta-base
  • max_length, batch_size, num_epochs, lr
  • fp16: set null to auto-enable on CUDA

Notes

  • Labels standardized to 0=human, 1=ai.
  • Mixed precision (fp16) auto-enables on CUDA.
  • Evaluate with accuracy, macro-F1, and confusion matrix.
  • Mac M2 users: Use Google Colab for training (see above) to avoid PyTorch MPS bugs.

Deployment

See DEPLOY.md for:

  • Google Colab setup (recommended for Mac M2)
  • Hugging Face Spaces deployment (gradio deploy)
  • Docker deployment
  • Troubleshooting guide