# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs. human-text detection, trained on ~2.3M English samples
(`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs. Human)
- **Head:** Single-logit + `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** H100 SXM, bf16, multi-GPU
- **Final decision threshold:** **0.9033** (max-F1 on validation)

---
## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `threshold.json` – chosen deployment threshold and validation F1
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_val.csv` – validation probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---
## Metrics (test set)

Using threshold **0.9033**:

| Metric                 | Value  |
|------------------------|--------|
| AUROC                  | 0.9970 |
| Average Precision (AP) | 0.9966 |
| F1                     | 0.9740 |
| Accuracy               | 0.9767 |
| Precision              | 0.9857 |
| Recall                 | 0.9625 |
| Specificity            | 0.9884 |

Confusion matrix (test):

- **True Negatives (Human correctly classified)**: 123,399
- **False Positives (Human → AI)**: 1,449
- **False Negatives (AI → Human)**: 3,882
- **True Positives (AI correctly classified)**: 99,657
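The metrics in the table can be re-derived directly from these four counts; a quick sanity check in plain Python:

```python
# Confusion-matrix counts from the test set (see above)
tn, fp, fn, tp = 123_399, 1_449, 3_882, 99_657

precision   = tp / (tp + fp)                 # of texts flagged AI, how many really are
recall      = tp / (tp + fn)                 # of AI texts, how many were caught (TPR)
specificity = tn / (tn + fp)                 # of human texts, how many were kept human
accuracy    = (tp + tn) / (tp + tn + fp + fn)
f1          = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"specificity={specificity:.4f} accuracy={accuracy:.4f} f1={f1:.4f}")
# precision=0.9857 recall=0.9625 specificity=0.9884 accuracy=0.9767 f1=0.9740
```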
---

## Plots

### Training & validation

- Learning curves:
  
- Eval metrics over time:
  

### Validation set

- ROC:
  
- Precision–Recall:
  
- Calibration curve:
  
- F1 vs threshold:
  

### Test set

- ROC:
  
- Precision–Recall:
  
- Calibration curve:
  
- Confusion matrix:
  

---
## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "<your-username>/<your-private-repo>"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
### Inference with threshold

```python
# load the deployment threshold chosen on validation (max F1)
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.9033

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    # move inputs to the same device as the model (e.g. GPU)
    enc = {k: v.to(model.device) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
    probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

---
## Notes

* The classifier head is **trainable** together with the LoRA layers (it is unfrozen after applying PEFT).
* Training used:
  * `bf16=True`
  * `optim="adamw_torch_fused"`
  * a cosine-with-restarts LR scheduler
* The learning rate was scaled down from the HPO value to account for full-dataset training (~14k steps).
* Threshold `0.9033` was chosen as the **max-F1** point on the validation set.
  You can adjust it if you prefer fewer false positives or fewer false negatives.
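To pick your own operating point, you can re-run the max-F1 threshold search on the validation predictions. Below is a minimal sketch of that search on synthetic probabilities; with the real `predictions_val.csv` you would load its probability and label columns instead (the exact column names are not specified here):

```python
import numpy as np

def best_f1_threshold(probs, labels):
    """Scan the observed probabilities as candidate thresholds; return the F1-maximizing one."""
    best_thr, best_f1 = 0.5, -1.0
    for thr in np.unique(probs):
        preds = (probs >= thr).astype(int)
        tp = int(np.sum((preds == 1) & (labels == 1)))
        fp = int(np.sum((preds == 1) & (labels == 0)))
        fn = int(np.sum((preds == 0) & (labels == 1)))
        if tp == 0:
            continue
        f1 = 2 * tp / (2 * tp + fp + fn)
        if f1 > best_f1:
            best_thr, best_f1 = float(thr), f1
    return best_thr, best_f1

# toy example with two well-separated score distributions
rng = np.random.default_rng(0)
labels = np.array([0] * 500 + [1] * 500)
probs = np.concatenate([rng.beta(2, 8, 500), rng.beta(8, 2, 500)])
thr, f1 = best_f1_threshold(probs, labels)
print(f"max-F1 threshold={thr:.4f}, F1={f1:.4f}")
```

Raising the threshold above the max-F1 point trades recall for precision (fewer humans flagged as AI); lowering it does the opposite.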