# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.3M English samples
(`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs. Human)
- **Head:** Single-logit + `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** H100 SXM, bf16, multi-GPU
- **Final decision threshold:** **0.9033** (max-F1 on validation)

---
## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `threshold.json` – chosen deployment threshold and validation F1
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_val.csv` – validation probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---
## Metrics (test set)

Using threshold **0.9033**:

| Metric                 | Value  |
|------------------------|--------|
| AUROC                  | 0.9970 |
| Average Precision (AP) | 0.9966 |
| F1                     | 0.9740 |
| Accuracy               | 0.9767 |
| Precision              | 0.9857 |
| Recall                 | 0.9625 |
| Specificity            | 0.9884 |

Confusion matrix (test):

- **True Negatives (Human correctly classified)**: 123,399
- **False Positives (Human → AI)**: 1,449
- **False Negatives (AI → Human)**: 3,882
- **True Positives (AI correctly classified)**: 99,657
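The metrics in the table can be re-derived directly from these four counts; a quick sanity check in plain Python:

```python
# Confusion-matrix counts from the test set (see above)
tn, fp, fn, tp = 123_399, 1_449, 3_882, 99_657

precision   = tp / (tp + fp)                 # of texts flagged AI, how many really are
recall      = tp / (tp + fn)                 # of AI texts, how many were caught (TPR)
specificity = tn / (tn + fp)                 # of human texts, how many were kept human
accuracy    = (tp + tn) / (tp + tn + fp + fn)
f1          = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"specificity={specificity:.4f} accuracy={accuracy:.4f} f1={f1:.4f}")
# precision=0.9857 recall=0.9625 specificity=0.9884 accuracy=0.9767 f1=0.9740
```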
---

## Plots

### Training & validation

- Learning curves:
  
- Eval metrics over time:
  

### Validation set

- ROC:
  
- Precision–Recall:
  
- Calibration curve:
  
- F1 vs threshold:
  

### Test set

- ROC:
  
- Precision–Recall:
  
- Calibration curve:
  
- Confusion matrix:
  

---
## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "<your-username>/<your-private-repo>"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
### Inference with threshold

```python
# load the deployment threshold chosen on validation (max F1)
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9033

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    # move inputs to the same device as the model (e.g. GPU)
    enc = {k: v.to(model.device) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

---
## Notes

* The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
* Training used:
  * `bf16=True`
  * `optim="adamw_torch_fused"`
  * a cosine-with-restarts LR scheduler
* The learning rate was scaled down from the HPO value to account for full-dataset training (~14k steps).
* Threshold `0.9033` was chosen as the **max-F1** point on the validation set.
  You can adjust it if you prefer fewer false positives or fewer false negatives.
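To pick your own operating point, you can re-run the max-F1 threshold search on the validation predictions. Below is a minimal sketch of that search on synthetic probabilities; with the real `predictions_val.csv` you would load its probability and label columns instead (the exact column names are not specified here):

```python
import numpy as np

def best_f1_threshold(probs, labels):
    """Scan the observed probabilities as candidate thresholds; return the F1-maximizing one."""
    best_thr, best_f1 = 0.5, -1.0
    for thr in np.unique(probs):
        preds = (probs >= thr).astype(int)
        tp = int(np.sum((preds == 1) & (labels == 1)))
        fp = int(np.sum((preds == 1) & (labels == 0)))
        fn = int(np.sum((preds == 0) & (labels == 1)))
        if tp == 0:
            continue
        f1 = 2 * tp / (2 * tp + fp + fn)
        if f1 > best_f1:
            best_thr, best_f1 = float(thr), f1
    return best_thr, best_f1

# toy example with two well-separated score distributions
rng = np.random.default_rng(0)
labels = np.array([0] * 500 + [1] * 500)
probs = np.concatenate([rng.beta(2, 8, 500), rng.beta(8, 2, 500)])
thr, f1 = best_f1_threshold(probs, labels)
print(f"max-F1 threshold={thr:.4f}, F1={f1:.4f}")
```

Raising the threshold above the max-F1 point trades recall for precision (fewer humans flagged as AI); lowering it does the opposite.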