---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
---

# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary detection of AI-generated vs. human-written text, trained on ~2.3M English samples (`label: 1 = AI, 0 = Human`) with `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs. Human)
- **Head:** Single logit + `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** H100 SXM, bf16, multi-GPU
- **Final decision threshold:** **0.9033** (max-F1 on validation)
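
With a single-logit head the model emits one score `z` per text, P(AI) = sigmoid(z), and training minimizes binary cross-entropy on `z`. The NumPy sketch below is illustrative only and is not the training code. `bce_with_logits` is a numerically stable form that is mathematically equivalent to PyTorch's `BCEWithLogitsLoss` with the default mean reduction and no `pos_weight`. `LOGIT_THRESHOLD` shows that the probability cut-off 0.9033 is the same as comparing the raw logit against log(0.9033 / 0.0967), so the sigmoid can be skipped at decision time. All function and constant names here are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Map a raw logit to P(AI)."""
    return 1.0 / (1.0 + np.exp(-z))


def bce_with_logits(z, y):
    """Mean binary cross-entropy computed directly on logits (no pos_weight).

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which never overflows exp().
    """
    z = np.asarray(z, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))))


def naive_bce(z, y):
    """Textbook form -[y log p + (1 - y) log(1 - p)] with p = sigmoid(z); unstable for large |z|."""
    p = sigmoid(np.asarray(z, dtype=np.float64))
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


# A probability cut-off t corresponds to the logit cut-off log(t / (1 - t)).
PROB_THRESHOLD = 0.9033
LOGIT_THRESHOLD = float(np.log(PROB_THRESHOLD / (1.0 - PROB_THRESHOLD)))

if __name__ == "__main__":
    logits = np.array([-3.0, -0.5, 1.2, 4.0])
    labels = np.array([0, 0, 1, 1])
    print(bce_with_logits(logits, labels), naive_bce(logits, labels))  # equal up to rounding
    print(f"logit threshold: {LOGIT_THRESHOLD:.4f}")
```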

---

## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `threshold.json` – chosen deployment threshold and its validation F1
- `results.json` – hyperparameters, validation threshold search, and test metrics
- `training_log_history.csv` – raw `Trainer` log history
- `predictions_val.csv` – validation-set probabilities and labels
- `predictions_test.csv` – test-set probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file
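
`results.json` records the authors' own threshold search. As an independent check, the sketch below recomputes a max-F1 threshold from `predictions_val.csv`. It also shows an alternative: picking the lowest threshold that keeps the validation false-positive rate under a chosen target. The sketch assumes the CSV has columns named `prob` and `label`. These names are hypothetical, so check the actual header. It also assumes `label` uses 1 = AI and that a text is flagged as AI when `prob >= threshold`. The helper names are illustrative and are not part of this repo.

```python
import csv

import numpy as np


def load_predictions(path, prob_col="prob", label_col="label"):
    """Read P(AI) scores and 0/1 labels from a predictions CSV.

    `prob_col` and `label_col` are assumed column names; adjust them to the real header.
    """
    probs, labels = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            probs.append(float(row[prob_col]))
            labels.append(int(row[label_col]))
    return np.array(probs), np.array(labels)


def _cut_points(probs, labels):
    """Cumulative TP/FP counts when flagging every text scoring >= each distinct score."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    order = np.argsort(-probs, kind="stable")  # highest score first
    p_sorted, y_sorted = probs[order], labels[order]
    tp = np.cumsum(y_sorted)
    fp = np.cumsum(1 - y_sorted)
    # Keep only the last position of each run of equal scores so tied
    # texts always land on the same side of the threshold.
    last_of_run = np.r_[p_sorted[1:] != p_sorted[:-1], True]
    n_pos = int(labels.sum())
    n_neg = int(labels.size - n_pos)
    return p_sorted[last_of_run], tp[last_of_run], fp[last_of_run], n_pos, n_neg


def max_f1_threshold(probs, labels):
    """Return (threshold, F1) maximizing F1 for the rule `prob >= threshold`."""
    thr, tp, fp, n_pos, _ = _cut_points(probs, labels)
    fn = n_pos - tp
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    best = int(np.argmax(f1))
    return float(thr[best]), float(f1[best])


def threshold_for_max_fpr(probs, labels, max_fpr):
    """Lowest threshold whose false-positive rate on human texts stays <= max_fpr.

    Returns inf (flag nothing) if even the highest-scoring text breaks the target.
    """
    thr, _, fp, _, n_neg = _cut_points(probs, labels)
    fpr = fp / max(n_neg, 1)
    ok = np.flatnonzero(fpr <= max_fpr)  # FPR only grows as the threshold drops
    return float(thr[ok[-1]]) if ok.size else float("inf")
```

With the real file, `max_f1_threshold(*load_predictions("predictions_val.csv"))` should come out close to the 0.9033 stored in `threshold.json`, provided the column-name assumptions hold and the file matches the one used for the original search.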

---

## Metrics (test set)

AUROC and AP are threshold-independent; the remaining metrics use the decision threshold **0.9033**:

| Metric                 | Value  |
|------------------------|--------|
| AUROC                  | 0.9970 |
| Average Precision (AP) | 0.9966 |
| F1                     | 0.9740 |
| Accuracy               | 0.9767 |
| Precision              | 0.9857 |
| Recall                 | 0.9625 |
| Specificity            | 0.9884 |

Confusion matrix (test):

- **True negatives (human, correctly classified):** 123,399
- **False positives (human → AI):** 1,449
- **False negatives (AI → human):** 3,882
- **True positives (AI, correctly classified):** 99,657
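
The threshold-dependent metrics in the table follow directly from these four counts, with AI as the positive class. The short sketch below recomputes them. Rounded to four decimals, the results reproduce the reported Precision, Recall, Specificity, Accuracy, and F1. AUROC and AP cannot be derived from a single confusion matrix because they summarize performance across all thresholds. The function name is hypothetical.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Threshold-dependent metrics from confusion-matrix counts (positive class = AI)."""
    precision = tp / (tp + fp)      # share of AI predictions that are truly AI
    recall = tp / (tp + fn)         # share of AI texts that are caught
    specificity = tn / (tn + fp)    # share of human texts kept as human
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Test-set counts reported in this card.
m = metrics_from_confusion(tn=123_399, fp=1_449, fn=3_882, tp=99_657)
for name, value in m.items():
    print(f"{name:<12} {value:.4f}")
```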

---

## Plots

Training and evaluation plots are stored in `figures/`.

### Training & validation

- Learning curves
- Eval metrics over time

### Validation set

- ROC curve
- Precision–Recall curve
- Calibration curve
- F1 vs. threshold

### Test set

- ROC curve
- Precision–Recall curve
- Calibration curve
- Confusion matrix

---

## Usage

### Load base model + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit, trained with BCEWithLogitsLoss
)

# Attach the trained LoRA adapter to the base model.
# If the adapter files sit in the repo's `adapter/` folder rather than at its root,
# also pass subfolder="adapter".
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

### Inference with threshold

```python
from huggingface_hub import hf_hub_download

# Load the deployment threshold (0.9033) from the repo
# (or open a local copy of threshold.json instead).
threshold_path = hf_hub_download(repo_id=adapter_id, filename="threshold.json")
with open(threshold_path) as f:
    thr = json.load(f)["threshold"]

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


def predict_proba(texts):
    """Return P(AI) for each input text."""
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    ).to(device)
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)  # shape: (batch,)
    probs = torch.sigmoid(logits)
    return probs.float().cpu().numpy()


def predict_label(texts, threshold=thr):
    """Return 1 (AI) where P(AI) >= threshold, else 0 (Human)."""
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)


# Example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

---

## Notes

* The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
* Training used:
  * `bf16=True`
  * `optim="adamw_torch_fused"`
  * a cosine-with-restarts learning-rate scheduler
  * a learning rate scaled down from the HPO value to account for the longer full-dataset run (~14k steps)
* The threshold `0.9033` was chosen as the **max-F1** point on the validation set.
  Raise it for fewer false positives (at the cost of more false negatives), or lower it for the opposite trade-off.
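
The training notes map onto standard Hugging Face `Trainer` settings. The sketch below is a hedged reconstruction, not the authors' script. It shows one way to train a single-logit head with BCE-with-logits. A custom `compute_loss` is needed because `AutoModelForSequenceClassification` with `num_labels=1` and no `problem_type` set treats the task as regression and uses an MSE loss by default. `BCETrainer` and `output_dir` are hypothetical. The learning rate is deliberately left out because the card does not state it. The actual hyperparameters, including the LoRA settings, are recorded in `results.json`.

```python
import torch.nn.functional as F
from transformers import Trainer, TrainingArguments


class BCETrainer(Trainer):
    """Trainer that applies BCE-with-logits to a single-logit classification head."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Pop labels so the model does not compute its default (MSE) loss.
        labels = inputs.pop("labels").float()
        outputs = model(**inputs)
        logits = outputs.logits.squeeze(-1).float()  # (batch, 1) -> (batch,)
        loss = F.binary_cross_entropy_with_logits(logits, labels)
        return (loss, outputs) if return_outputs else loss


training_args = TrainingArguments(
    output_dir="ai-detector-lora",             # hypothetical
    bf16=True,                                 # from the notes; needs a bf16-capable GPU
    optim="adamw_torch_fused",                 # from the notes
    lr_scheduler_type="cosine_with_restarts",  # from the notes
    # learning_rate: scaled down from the HPO value; see results.json
)

# trainer = BCETrainer(model=model, args=training_args,
#                      train_dataset=train_ds, eval_dataset=val_ds)  # datasets not shown
# trainer.train()
```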