---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
---

# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.3M English samples (`label: 1 = AI, 0 = Human`) with `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** binary classification (AI vs. human)
- **Head:** single logit + `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** H100 SXM, bf16, multi-GPU
- **Final decision threshold:** **0.9284** (max-F1 point on the calibration set)

---

## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and its validation F1
- `calibration.json` – temperature-scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test-set probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---

## Metrics (test set)

Using threshold **0.9284**:

| Metric                 | Value  |
| ---------------------- | ------ |
| AUROC                  | 0.9979 |
| Average precision (AP) | 0.9977 |
| F1                     | 0.9773 |
| Accuracy               | 0.9797 |
| Precision              | 0.9909 |
| Recall                 | 0.9640 |
| Specificity            | 0.9927 |

Confusion matrix (test):

- **True negatives (human classified as human):** 123,936
- **False positives (human → AI):** 912
- **False negatives (AI → human):** 3,723
- **True positives (AI classified as AI):** 99,816

### Calibration

- **Method:** temperature scaling
- **Temperature (T):** 1.2807
- **Fitted on:** the calibration split
- Test ECE: 0.0119 → 0.0159 (after calibration)
- Test Brier score: 0.01812 → 0.01829 (after calibration)

---

## Plots

### Training & validation

- Learning curves: ![Learning curves](./figures/fig_learning_curves.png)
- Eval metrics over time: ![Eval metrics](./figures/fig_eval_metrics.png)

### Validation (calibration) set

- ROC: ![ROC (calib)](./figures/fig_roc_calib.png)
- Precision–recall: ![PR (calib)](./figures/fig_pr_calib.png)
- Calibration curve: ![Calibration (calib)](./figures/fig_calibration_calib.png)
- F1 vs. threshold: ![F1 vs threshold (calib)](./figures/fig_threshold_f1_calib.png)

### Test set

- ROC: ![ROC (test)](./figures/fig_roc_test.png)
- Precision–recall: ![PR (test)](./figures/fig_pr_test.png)
- Calibration curve: ![Calibration (test)](./figures/fig_calibration_test.png)
- Confusion matrix: ![Confusion matrix (test)](./figures/fig_confusion_test.png)

---

## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
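### Optional: run on GPU

The helpers in the next sections run on CPU as written. If a GPU is available, the usual pattern is to move the model once and move each encoded batch to the same device; a minimal sketch (apply the same `.to(device)` calls if you adapt the helpers below):

```python
# Pick a device and move the model once; every encoded batch must follow it.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

enc = tokenizer(["sample text"], truncation=True, max_length=512, return_tensors="pt")
enc = {k: v.to(device) for k, v in enc.items()}  # inputs must live on the model's device

with torch.no_grad():
    prob = torch.sigmoid(model(**enc).logits.squeeze(-1))
print(prob.item())  # probability that the text is AI-generated
```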
### Inference with threshold

```python
# Load the deployment threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9284

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# Example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

### Load merged model (no PEFT required)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch, json

model_dir = "./merged_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9284

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```

### Optional: apply temperature scaling to logits

```python
import json

with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.2807

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```

---

## Notes

- The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
- Training used:
  - `bf16=True`
  - `optim="adamw_torch_fused"`
  - a cosine-with-restarts scheduler
  - a learning rate scaled down from the HPO runs to account for the longer full-dataset schedule (~14k steps)
- The threshold `0.9284` is the **max-F1** point on the calibration set. You can adjust it if you prefer fewer false positives or fewer false negatives; see the sketch after these notes.
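---

## Choosing a different operating point

The shipped threshold maximizes F1, which balances false positives against false negatives. If your deployment instead needs a hard cap on how often human text is flagged, you can re-derive a threshold from `predictions_calib.csv`. A minimal sketch, assuming the CSV has columns named `prob` and `label` (hypothetical names; check the actual header first):

```python
import numpy as np
import pandas as pd

# Column names below are assumptions -- inspect the CSV header before running.
df = pd.read_csv("predictions_calib.csv")
probs = df["prob"].to_numpy()
labels = df["label"].to_numpy()  # 1 = AI, 0 = Human

def threshold_for_max_fpr(probs, labels, max_fpr=0.005):
    """Return a threshold such that roughly max_fpr of the human (label 0)
    calibration texts score at or above it."""
    human_scores = probs[labels == 0]
    return float(np.quantile(human_scores, 1.0 - max_fpr))

strict_thr = threshold_for_max_fpr(probs, labels, max_fpr=0.005)
preds = (probs >= strict_thr).astype(int)
fpr = preds[labels == 0].mean()
print(f"threshold={strict_thr:.4f}, flags {fpr:.3%} of human calibration texts")
```

Raising `max_fpr` trades the other way: more human text gets flagged, but fewer AI texts slip through as false negatives.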