metadata
language:
  - en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
  - stealthcode/ai-detection
tags:
  - lora
  - ai-detection
  - binary-classification
  - deberta-v3-large
metrics:
  - accuracy
  - f1
  - auroc
  - average_precision

AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples (label: 1 = AI, 0 = Human) using microsoft/deberta-v3-large as the base model.

  • Base model: microsoft/deberta-v3-large
  • Task: Binary classification (AI vs Human)
  • Head: Single-logit + BCEWithLogitsLoss
  • Adapter type: LoRA (peft)
  • Hardware: H100 SXM, bf16, multi-GPU
  • Final decision threshold: 0.9033 (max-F1 on validation)
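The single-logit head means the model emits one raw score per text, which a sigmoid turns into P(AI). The sketch below illustrates the arithmetic behind that setup using only the standard library. It is not the training code from this repo, and the function names are hypothetical. It shows the numerically stable form of binary cross-entropy on a logit, and how the 0.9033 probability threshold maps to an equivalent cut-off on the raw logit.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to a probability, avoiding overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy computed directly on the logit (1 = AI, 0 = Human).

    Stable form: max(x, 0) - x * y + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def logit_threshold(prob_threshold: float) -> float:
    """Logit cut-off equivalent to a probability threshold (inverse sigmoid)."""
    return math.log(prob_threshold / (1.0 - prob_threshold))


if __name__ == "__main__":
    cut = logit_threshold(0.9033)
    print(f"logit cut-off for p >= 0.9033: {cut:.4f}")
    print(f"loss for logit=3.0, label=1: {bce_with_logits(3.0, 1):.4f}")
    print(f"loss for logit=3.0, label=0: {bce_with_logits(3.0, 0):.4f}")
```

Comparing logits against `logit_threshold(0.9033)` gives the same labels as comparing sigmoid outputs against 0.9033, since the sigmoid is monotonic.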

Files in this repo

  • adapter/ – LoRA weights saved with peft_model.save_pretrained(...)
  • threshold.json – chosen deployment threshold and validation F1
  • results.json – hyperparameters, validation threshold search, test metrics
  • training_log_history.csv – raw Trainer log history
  • predictions_val.csv – validation probabilities and labels
  • predictions_test.csv – test probabilities and labels
  • figures/ – training and evaluation plots
  • README.md – this file

Metrics (test set)

Using threshold 0.9033:

Metric                   Value
AUROC                    0.9970
Average Precision (AP)   0.9966
F1                       0.9740
Accuracy                 0.9767
Precision                0.9857
Recall                   0.9625
Specificity              0.9884

Confusion matrix (test):

  • True Negatives (Human → Human): 123,399
  • False Positives (Human → AI): 1,449
  • False Negatives (AI → Human): 3,882
  • True Positives (AI → AI): 99,657
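The threshold-dependent metrics reported for the test set follow directly from these four counts. The short check below recomputes them. It uses only the numbers listed in this README; the variable names are illustrative.

```python
# Confusion-matrix counts from the test set (positive class = AI).
tn, fp, fn, tp = 123_399, 1_449, 3_882, 99_657

precision = tp / (tp + fp)            # of texts flagged as AI, share that are AI
recall = tp / (tp + fn)               # of AI texts, share that are flagged
specificity = tn / (tn + fp)          # of human texts, share left unflagged
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall

metrics = {
    "F1": f1,
    "Accuracy": accuracy,
    "Precision": precision,
    "Recall": recall,
    "Specificity": specificity,
}

for name, value in metrics.items():
    print(f"{name:<12}{value:.4f}")
```

AUROC and Average Precision are not recoverable from a single confusion matrix, because they summarise performance across all thresholds; they need the per-sample probabilities in predictions_test.csv.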

Plots

Training & validation

  • Learning curves
  • Eval metrics over time

Validation set

  • ROC
  • Precision–Recall
  • Calibration curve
  • F1 vs threshold

Test set

  • ROC
  • Precision–Recall
  • Calibration curve
  • Confusion matrix


Usage

Load base + LoRA adapter

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id    = "stealthcode/ai-detection"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
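# If loading fails because the weights live under adapter/ (see "Files in this repo"),
# pass subfolder="adapter" to PeftModel.from_pretrained.
model = PeftModel.from_pretrained(base_model, adapter_id)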
model.eval()

Inference with threshold

# load threshold (assumes threshold.json from this repo is in the working directory)
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9033

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
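The example above tokenizes every text in a single batch, which can exhaust memory when scoring many documents. One possible way around that is to score in fixed-size chunks. The helper below is a sketch, not part of this repo: `predict_proba_batched` and `chunked` are hypothetical names, and `predict_fn` stands for any function that maps a list of texts to a sequence of probabilities, such as the `predict_proba` defined above.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_proba_batched(
    texts: Sequence[str],
    predict_fn: Callable[[List[str]], Sequence[float]],
    batch_size: int = 32,  # illustrative default, tune to available memory
) -> List[float]:
    """Score texts chunk by chunk and return one probability per input text."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    probs: List[float] = []
    for batch in chunked(texts, batch_size):
        # predict_fn sees at most batch_size texts per call
        probs.extend(float(p) for p in predict_fn(list(batch)))
    return probs
```

With the functions from this section it could be called as `predict_proba_batched(texts, predict_proba, batch_size=16)`. The output order matches the input order.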

Notes

  • The classifier head is trained together with the LoRA layers (it is unfrozen after applying PEFT).

  • Training used:

    • bf16=True
    • optim="adamw_torch_fused"
    • cosine-with-restarts scheduler
    • Learning rate scaled down from the value found in hyperparameter optimization (HPO) to account for the full-dataset run (~14k steps).
  • Threshold 0.9033 was chosen as the max-F1 point on the validation set. Raise it to reduce false positives (human text flagged as AI), or lower it to reduce false negatives (AI text passed as human).
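A threshold can be re-derived from the validation probabilities and labels if a different trade-off is needed. The sketch below shows one way to do that with the standard library only. It is an assumption-laden illustration rather than the search used to produce threshold.json: the function names are hypothetical, and because the column names in predictions_val.csv are not documented here, the functions take plain sequences of probabilities and 0/1 labels.

```python
from typing import Optional, Sequence, Tuple


def confusion_at(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) when predicting AI for prob >= threshold."""
    tn = fp = fn = tp = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def max_f1_threshold(
    probs: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Try every observed probability as a threshold; return (threshold, F1) with the best F1."""
    best_thr, best_f1 = 1.0, -1.0
    for thr in sorted(set(probs)):
        _, fp, fn, tp = confusion_at(probs, labels, thr)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1


def threshold_for_precision(
    probs: Sequence[float], labels: Sequence[int], min_precision: float
) -> Optional[float]:
    """Lowest observed threshold whose precision reaches min_precision, or None."""
    for thr in sorted(set(probs)):
        _, fp, _, tp = confusion_at(probs, labels, thr)
        if tp + fp > 0 and tp / (tp + fp) >= min_precision:
            return thr
    return None
```

This brute-force scan is quadratic in the number of samples, so it suits small subsets. For the full validation file, a sorted cumulative-count approach or `sklearn.metrics.precision_recall_curve` would be a more practical choice.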