---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
---

# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.3M English samples (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs Human)
- **Head:** Single-logit + `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** H100 SXM, bf16, multi-GPU
- **Final decision threshold:** **0.9033** (max-F1 on validation)

---

## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `threshold.json` – chosen deployment threshold and validation F1
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_val.csv` – validation probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---

## Metrics (test set)

Using threshold **0.9033**:

| Metric                 | Value  |
|------------------------|--------|
| AUROC                  | 0.9970 |
| Average Precision (AP) | 0.9966 |
| F1                     | 0.9740 |
| Accuracy               | 0.9767 |
| Precision              | 0.9857 |
| Recall                 | 0.9625 |
| Specificity            | 0.9884 |

Confusion matrix (test):

- **True Negatives (Human correctly classified)**: 123,399
- **False Positives (Human → AI)**: 1,449
- **False Negatives (AI → Human)**: 3,882
- **True Positives (AI correctly classified)**: 99,657

---

## Plots

### Training & validation

- Learning curves: ![Learning curves](./figures/fig_learning_curves.png)
- Eval metrics over time: ![Eval metrics](./figures/fig_eval_metrics.png)

### Validation set

- ROC: ![ROC (val)](./figures/fig_roc_val.png)
- Precision–Recall: ![PR (val)](./figures/fig_pr_val.png)
- Calibration curve: ![Calibration (val)](./figures/fig_calibration_val.png)
- F1 vs threshold: ![F1 vs threshold (val)](./figures/fig_threshold_f1_val.png)

### Test set

- ROC: ![ROC (test)](./figures/fig_roc_test.png)
- Precision–Recall: ![PR (test)](./figures/fig_pr_test.png)
- Calibration curve: ![Calibration (test)](./figures/fig_calibration_test.png)
- Confusion matrix: ![Confusion matrix (test)](./figures/fig_confusion_test.png)

---

## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
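If you prefer a standalone checkpoint for deployment instead of base model + adapter, the LoRA weights can be merged into the base model with `peft`'s `merge_and_unload()`. A minimal sketch, assuming the adapter was loaded as above; `merged_dir` is a placeholder output path:

```python
# Merge the LoRA weights into the base model; returns a plain
# transformers model with no PEFT wrappers.
merged_model = model.merge_and_unload()

merged_dir = "./ai-detector-merged"  # placeholder output path
merged_model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)
```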
### Inference with threshold

```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9033

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

---

## Notes

* The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
* Training used:
  * `bf16=True`
  * `optim="adamw_torch_fused"`
  * a cosine-with-restarts scheduler
  * a learning rate scaled down from the HPO value to account for the longer full-dataset run (~14k training steps)
* The threshold `0.9033` was chosen as the **max-F1** point on the validation set. You can adjust it if you prefer fewer false positives or fewer false negatives; see the sketch below for one way to do this.
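As an illustration of re-tuning the operating point, the sketch below picks the lowest threshold on the validation predictions that still meets a target precision (i.e., trading recall for fewer false positives). It assumes `predictions_val.csv` has columns named `prob` and `label`; adjust these names to match the actual file.

```python
import numpy as np
import pandas as pd

# Assumed column names; check predictions_val.csv for the real ones.
val = pd.read_csv("predictions_val.csv")
probs = val["prob"].to_numpy()
labels = val["label"].to_numpy()

target_precision = 0.995  # example: tolerate fewer false positives

best_thr = None
for thr in np.linspace(0.5, 0.999, 500):
    preds = (probs >= thr).astype(int)
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    if precision >= target_precision:
        # lowest threshold meeting the precision target,
        # i.e. the one that keeps the most recall
        best_thr = thr
        break

print("threshold for target precision:", best_thr)
```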