kbourro

metadata

language:
  - en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
  - stealthcode/ai-detection
tags:
  - lora
  - ai-detection
  - binary-classification
  - deberta-v3-large
metrics:
  - accuracy
  - f1
  - auroc
  - average_precision

AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples (label: 1 = AI, 0 = Human) using microsoft/deberta-v3-large as the base model.

Base model: microsoft/deberta-v3-large
Task: Binary classification (AI vs Human)
Head: Single-logit + BCEWithLogitsLoss
Adapter type: LoRA (peft)
Hardware: H100 SXM, bf16, multi-GPU
Final decision threshold: 0.9033 (max-F1 on validation)
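To make the single-logit setup concrete, here is a minimal sketch in plain Python (no PyTorch) of what the head computes. The model emits one logit z per text, the sigmoid maps it to P(AI), and the loss is the binary cross-entropy evaluated on z, written in the numerically stable form. The function names are illustrative and do not come from this repo. The sketch also shows that comparing the probability against 0.9033 is equivalent to comparing the raw logit against log(t / (1 - t)).

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps a logit to P(AI)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def bce_with_logits(z: float, y: int) -> float:
    """Binary cross-entropy on a raw logit z with label y in {0, 1}.

    Stable form: max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def logit(p: float) -> float:
    """Inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

THRESHOLD = 0.9033                   # decision threshold stated in this README
LOGIT_THRESHOLD = logit(THRESHOLD)   # same cut-off expressed on the raw logit

def is_ai(z: float) -> bool:
    """Label a text as AI (1) when its logit clears the threshold."""
    return z >= LOGIT_THRESHOLD
```

Because the sigmoid is monotonic, `is_ai(z)` and `sigmoid(z) >= THRESHOLD` give the same decision, so the sigmoid can be skipped when only labels are needed.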

Files in this repo

adapter/ – LoRA weights saved with peft_model.save_pretrained(...)
threshold.json – chosen deployment threshold and validation F1
results.json – hyperparameters, validation threshold search, test metrics
training_log_history.csv – raw Trainer log history
predictions_val.csv – validation probabilities and labels
predictions_test.csv – test probabilities and labels
figures/ – training and evaluation plots
README.md – this file

Metrics (test set)

Using threshold 0.9033:

Metric	Value
AUROC	0.9970
Average Precision (AP)	0.9966
F1	0.9740
Accuracy	0.9767
Precision	0.9857
Recall	0.9625
Specificity	0.9884

Confusion matrix (test):

True Negatives (Human classified as Human): 123,399
False Positives (Human classified as AI): 1,449
False Negatives (AI classified as Human): 3,882
True Positives (AI classified as AI): 99,657
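As a cross-check, the threshold-dependent metrics in the table can be recomputed from the four confusion-matrix counts. The sketch below uses the standard definitions. The function name is illustrative, and AUROC and AP are not included because they need the full score distribution rather than counts at one threshold.

```python
def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Compute threshold-dependent metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),            # of texts flagged AI, share that are AI
        "recall": tp / (tp + fn),               # of AI texts, share that were flagged
        "specificity": tn / (tn + fp),          # of Human texts, share left unflagged
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of precision and recall
    }

# Counts reported for the test set at threshold 0.9033.
test_metrics = metrics_from_confusion(tn=123_399, fp=1_449, fn=3_882, tp=99_657)
for name, value in test_metrics.items():
    print(f"{name}: {value:.4f}")
```

The values agree with the reported table up to rounding.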

Plots

The plots are stored in figures/.

Training & validation: learning curves, eval metrics over time
Validation set: ROC, precision–recall, calibration curve, F1 vs threshold
Test set: ROC, precision–recall, calibration curve, confusion matrix

Usage

Load base + LoRA adapter

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id    = "stealthcode/ai-detection"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

Inference with threshold

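# load threshold (threshold.json from this repo must be available locally)
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9033

def predict_proba(texts):
    # tokenize a list of strings; inputs longer than 512 tokens are truncated
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    # keep the inputs on the same device as the model
    device = next(model.parameters()).device
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)  # shape: (batch,)
        probs = torch.sigmoid(logits)
    # cast to float32 so .numpy() also works if the model runs in bf16
    return probs.float().cpu().numpy()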

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
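The example above tokenizes and scores every text in one forward pass, which can run out of memory for long lists. One possible workaround, sketched here under the assumption that the scoring function accepts a list of strings and returns one probability per text (as predict_proba does), is to score in fixed-size batches. The helper names and the default batch size of 32 are hypothetical and not part of this repo.

```python
from typing import Callable, Iterator, List, Sequence

def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def predict_in_batches(
    texts: Sequence[str],
    predict_fn: Callable[[List[str]], Sequence[float]],
    batch_size: int = 32,  # hypothetical default; tune to available memory
) -> List[float]:
    """Apply predict_fn batch by batch and concatenate the probabilities."""
    probs: List[float] = []
    for batch in iter_batches(texts, batch_size):
        probs.extend(float(p) for p in predict_fn(list(batch)))
    return probs

# Intended use with the function defined above:
#   probs = predict_in_batches(texts, predict_proba, batch_size=32)
```

Padding is applied per batch, so grouping texts of similar length into the same batch reduces wasted computation.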

Notes

The classifier head is trained together with the LoRA layers (it is unfrozen after PEFT is applied).
Training used:
- bf16=True
- optim="adamw_torch_fused"
- cosine-with-restarts scheduler
- LR scaled down from the HPO value to account for the full-dataset run (~14k steps)
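For intuition about the cosine-with-restarts scheduler, the sketch below computes the learning-rate multiplier using the usual hard-restart formula, assuming the scheduler corresponds to the Trainer option lr_scheduler_type="cosine_with_restarts". Only the approximate step count (~14k) comes from this README. The warmup length and the number of cycles are placeholders chosen for illustration, not the values used in training.

```python
import math

def cosine_restart_multiplier(
    step: int, warmup_steps: int, total_steps: int, num_cycles: int = 1
) -> float:
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # linear warmup from 0 to 1
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # cosine decay from 1 to 0 inside each cycle, then a hard restart back to 1
    cycle_position = (num_cycles * progress) % 1.0
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_position)))

TOTAL_STEPS = 14_000   # approximate, from the note above
WARMUP_STEPS = 100     # hypothetical
NUM_CYCLES = 2         # hypothetical

for step in (0, 100, 3_575, 7_049, 7_050, 14_000):
    factor = cosine_restart_multiplier(step, WARMUP_STEPS, TOTAL_STEPS, NUM_CYCLES)
    print(step, round(factor, 4))
```

The effective learning rate at each step is the base learning rate times this factor.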
The threshold 0.9033 was chosen as the max-F1 point on the validation set. Raise it if you prefer fewer false positives (higher precision), or lower it if you prefer fewer false negatives (higher recall).
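A threshold can be re-derived from any set of probabilities and labels, such as those in predictions_val.csv. The sketch below is a simple reference version, not the search used to produce threshold.json. It takes plain Python sequences because the CSV column names are not documented here, and the function names are illustrative. It shows a max-F1 search and an alternative that picks the lowest threshold meeting a target precision.

```python
from typing import Optional, Sequence, Tuple

def counts_at(
    probs: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) when predicting AI for probs >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted_ai = p >= threshold
        if predicted_ai and y == 1:
            tp += 1
        elif predicted_ai and y == 0:
            fp += 1
        elif not predicted_ai and y == 1:
            fn += 1
    return tp, fp, fn

def best_f1_threshold(
    probs: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Try every distinct score as a threshold; return (threshold, f1) with max F1."""
    if not probs:
        raise ValueError("probs must not be empty")
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(probs)):
        tp, fp, fn = counts_at(probs, labels, t)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # ties keep the lower threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1

def lowest_threshold_with_precision(
    probs: Sequence[float], labels: Sequence[int], min_precision: float
) -> Optional[float]:
    """Lowest candidate threshold whose precision reaches min_precision, if any."""
    for t in sorted(set(probs)):
        tp, fp, _ = counts_at(probs, labels, t)
        if tp + fp > 0 and tp / (tp + fp) >= min_precision:
            return t
    return None
```

This version rescans the data for every candidate, so it is quadratic in the number of distinct scores. For a validation set of this size, a sorted cumulative-count approach such as scikit-learn's precision_recall_curve is a better fit.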