	---
	language:
	- en
	pipeline_tag: text-classification
	library_name: peft
	base_model: microsoft/deberta-v3-large
	datasets:
	- stealthcode/ai-detection
	tags:
	- lora
	- ai-detection
	- binary-classification
	- deberta-v3-large
	metrics:
	- accuracy
	- f1
	- auroc
	- average_precision
	---

	# AI Detector LoRA (DeBERTa-v3-large)

	LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.3M English samples
	(`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

	- Base model: `microsoft/deberta-v3-large`
	- Task: Binary classification (AI vs Human)
	- Head: Single-logit + `BCEWithLogitsLoss`
	- Adapter type: LoRA (`peft`)
	- Hardware: H100 SXM, bf16, multi-GPU
	- Final decision threshold: 0.9284 (max-F1 on calibration set)
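To make the head concrete: with a single logit `z` per text, the predicted AI probability is `sigmoid(z)`, and training minimizes binary cross-entropy on that logit. The NumPy sketch below uses the numerically stable form that PyTorch documents for `BCEWithLogitsLoss` (mean reduction, no `pos_weight`). The function names and toy logits are illustrative assumptions, not code from this repo.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function."""
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(-np.abs(z))  # always in (0, 1], so it cannot overflow
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy computed directly from logits.

    Uses max(z, 0) - z*y + log(1 + exp(-|z|)), which stays finite for large |z|.
    """
    z = np.asarray(logits, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    return float(np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))))

# Toy values (hypothetical): one logit per text, label 1 = AI, 0 = Human.
logits = np.array([4.0, -3.0, 0.5])
labels = np.array([1, 0, 1])

probs = sigmoid(logits)
preds = (probs >= 0.9284).astype(int)  # apply the decision threshold from this card
print(probs, preds, bce_with_logits(logits, labels))
```

Because the threshold is well above 0.5, a text is only flagged as AI when the logit is strongly positive.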

	---

	## Files in this repo

	- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
	- `merged_model/` – fully merged model (base + LoRA) for standalone use
	- `threshold.json` – chosen deployment threshold and validation F1
	- `calibration.json` – temperature scaling parameters and calibration metrics
	- `results.json` – hyperparameters, validation threshold search, test metrics
	- `training_log_history.csv` – raw Trainer log history
	- `predictions_calib.csv` – calibration-set probabilities and labels
	- `predictions_test.csv` – test probabilities and labels
	- `figures/` – training and evaluation plots
	- `README.md` – this file

	---

	## Metrics (test set)

	Using threshold 0.9284:

	| Metric                 | Value  |
	| ---------------------- | ------ |
	| AUROC                  | 0.9979 |
	| Average Precision (AP) | 0.9977 |
	| F1                     | 0.9773 |
	| Accuracy               | 0.9797 |
	| Precision              | 0.9909 |
	| Recall                 | 0.9640 |
	| Specificity            | 0.9927 |

	Confusion matrix (test):

	- True negatives (Human → Human): 123,936
	- False positives (Human → AI): 912
	- False negatives (AI → Human): 3,723
	- True positives (AI → AI): 99,816
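The threshold-dependent rows of the metrics table follow directly from these four counts. As a quick sanity check, the sketch below recomputes them with the standard definitions, treating AI as the positive class. AUROC and Average Precision are not included because they depend on the full score distribution, not on a single confusion matrix.

```python
# Counts copied from the confusion matrix above (positive class = AI).
tn, fp, fn, tp = 123_936, 912, 3_723, 99_816

precision = tp / (tp + fp)            # of texts flagged AI, how many are AI
recall = tp / (tp + fn)               # of AI texts, how many are flagged
specificity = tn / (tn + fp)          # of human texts, how many are passed
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * precision * recall / (precision + recall)

metrics = {
    "F1": f1,
    "Accuracy": accuracy,
    "Precision": precision,
    "Recall": recall,
    "Specificity": specificity,
}
for name, value in metrics.items():
    print(f"{name:<12}{value:.4f}")
```

Rounded to four decimals, these agree with the table.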

	### Calibration

	- Method: temperature scaling
	- Temperature (T): 1.2807
	- Fitted on: calibration split
	- Test ECE: 0.0119 before → 0.0159 after calibration
	- Test Brier: 0.01812 before → 0.01829 after calibration

	Note that both test metrics are marginally worse after scaling.
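For readers who want to recompute these numbers, here is a minimal sketch of both metrics. The binning scheme behind the reported ECE is not stated in this card, so the sketch assumes 15 equal-width bins and compares the mean predicted AI probability with the observed fraction of AI labels in each bin. The function names and toy arrays are illustrative; real inputs would come from `predictions_test.csv`, whose column names are not documented here.

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    return float(np.mean((probs - labels) ** 2))

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE over equal-width bins of the positive-class probability (assumed scheme)."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # Map each probability to a bin index in [0, n_bins - 1]; p == 1.0 goes in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight the gap by the bin's share of samples
    return float(ece)

# Toy data only (hypothetical values).
probs = np.array([0.05, 0.10, 0.90, 0.95, 0.65, 0.35])
labels = np.array([0, 0, 1, 1, 1, 0])
print(brier_score(probs, labels))
print(expected_calibration_error(probs, labels, n_bins=5))
```

A different bin count or an equal-mass binning would give slightly different ECE values, so small mismatches with the figures above are expected.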

	---

	## Plots

	### Training & validation

	- Learning curves:

	![Learning curves](./figures/fig_learning_curves.png)

	- Eval metrics over time:

	![Eval metrics](./figures/fig_eval_metrics.png)

	### Calibration set

	- ROC:

	![ROC (calib)](./figures/fig_roc_calib.png)

	- Precision–Recall:

	![PR (calib)](./figures/fig_pr_calib.png)

	- Calibration curve:

	![Calibration (calib)](./figures/fig_calibration_calib.png)

	- F1 vs threshold:

	![F1 vs threshold (calib)](./figures/fig_threshold_f1_calib.png)

	### Test set

	- ROC:

	![ROC (test)](./figures/fig_roc_test.png)

	- Precision–Recall:

	![PR (test)](./figures/fig_pr_test.png)

	- Calibration curve:

	![Calibration (test)](./figures/fig_calibration_test.png)

	- Confusion matrix:

	![Confusion matrix (test)](./figures/fig_confusion_test.png)

	---

	## Usage

	### Load base + LoRA adapter

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	from peft import PeftModel
	import torch
	import json

	base_model_id = "microsoft/deberta-v3-large"
	adapter_id = "stealthcode/ai-detection" # or local: "./adapter"

	tokenizer = AutoTokenizer.from_pretrained(base_model_id)

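	base_model = AutoModelForSequenceClassification.from_pretrained(
	    base_model_id,
	    num_labels=1,  # single logit for BCEWithLogitsLoss
	)
	# If loading from the Hub fails to find the adapter, it may live in the repo's
	# `adapter/` folder; in that case try passing subfolder="adapter".
	model = PeftModel.from_pretrained(base_model, adapter_id)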
	model.eval()
	```

	### Inference with threshold

	```python
	# Load the deployment threshold.
	with open("threshold.json") as f:
	    thr = json.load(f)["threshold"]  # 0.9284

	def predict_proba(texts):
	    """Return P(AI) for each text as a NumPy array."""
	    enc = tokenizer(
	        texts,
	        padding=True,
	        truncation=True,
	        max_length=512,
	        return_tensors="pt",
	    )
	    with torch.no_grad():
	        logits = model(**enc).logits.squeeze(-1)
	    probs = torch.sigmoid(logits)
	    return probs.cpu().numpy()

	def predict_label(texts, threshold=thr):
	    """Return 1 (AI) where P(AI) >= threshold, else 0 (Human)."""
	    probs = predict_proba(texts)
	    return (probs >= threshold).astype(int)

	# example
	texts = ["Some example text to classify"]
	probs = predict_proba(texts)
	labels = predict_label(texts)
	print(probs, labels) # label 1 = AI, 0 = Human
	```
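`predict_proba` above tokenizes and scores all texts in a single forward pass, which can exhaust memory on large inputs. A minimal batching wrapper might look like the sketch below. `predict_in_batches` and `fake_predict_proba` are hypothetical names; the stand-in scorer is only there so the example runs without downloading the model.

```python
import numpy as np

def predict_in_batches(texts, predict_fn, batch_size=32):
    """Run predict_fn over texts in fixed-size chunks and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # atleast_1d guards against a scorer that returns a scalar for one text.
        chunks.append(np.atleast_1d(predict_fn(batch)))
    return np.concatenate(chunks) if chunks else np.empty(0)

# Stand-in scorer (not the real model): score grows with text length.
def fake_predict_proba(batch):
    return np.array([min(len(t), 100) / 100 for t in batch])

texts = ["short", "a somewhat longer example text", "x" * 200]
probs = predict_in_batches(texts, fake_predict_proba, batch_size=2)
print(probs)
```

In practice you would pass the real `predict_proba` as `predict_fn` and pick a `batch_size` that fits your hardware.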

	### Load merged model (no PEFT required)

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch, json

	model_dir = "./merged_model"
	tokenizer = AutoTokenizer.from_pretrained(model_dir)
	model = AutoModelForSequenceClassification.from_pretrained(model_dir)
	model.eval()

	with open("threshold.json") as f:
	    thr = json.load(f)["threshold"]  # 0.9284

	def predict_proba(texts):
	    """Return P(AI) for each text as a NumPy array."""
	    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
	    with torch.no_grad():
	        logits = model(**enc).logits.squeeze(-1)
	    probs = torch.sigmoid(logits)
	    return probs.cpu().numpy()
	```

	### Optional: apply temperature scaling to logits

	```python
	import json
	with open("calibration.json") as f:
	    T = json.load(f)["temperature"]  # e.g., 1.2807

	# Reuses `tokenizer`, `model`, and `torch` from the loading snippets above.
	def predict_proba_calibrated(texts):
	    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
	    with torch.no_grad():
	        logits = model(**enc).logits.squeeze(-1)
	    probs = torch.sigmoid(logits / T)  # divide logits by T before the sigmoid
	    return probs.cpu().numpy()
	```
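Temperature scaling fits one scalar `T > 0` on held-out logits by minimizing the negative log-likelihood of `sigmoid(logit / T)`. This card does not say how `T` was fitted, so the sketch below is only an assumption: it uses a plain grid search in NumPy on synthetic, deliberately overconfident logits. It also shows how a threshold maps between scales. Dividing by `T` is monotonic, so a threshold on raw probabilities has an equivalent threshold on calibrated probabilities that gives identical decisions. The card does not state which scale `0.9284` was selected on; the usage snippets above apply it to raw probabilities.

```python
import numpy as np

def nll_from_logits(logits, labels, T=1.0):
    """Mean binary negative log-likelihood of sigmoid(logits / T)."""
    z = np.asarray(logits, dtype=np.float64) / T
    y = np.asarray(labels, dtype=np.float64)
    return float(np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))))

def fit_temperature(logits, labels, grid=None):
    """Pick the T with the lowest NLL on a log-spaced grid (stand-in for an optimizer)."""
    if grid is None:
        grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 801))
    losses = [nll_from_logits(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def map_threshold(raw_threshold, T):
    """Threshold on sigmoid(z / T) equivalent to raw_threshold on sigmoid(z)."""
    z_thr = np.log(raw_threshold / (1.0 - raw_threshold))  # threshold in logit space
    return float(1.0 / (1.0 + np.exp(-z_thr / T)))

# Synthetic setup: the true log-odds are z_true, but the "model" outputs 2 * z_true.
rng = np.random.default_rng(0)
z_true = rng.normal(0.0, 2.0, size=20_000)
labels = (rng.random(z_true.size) < 1.0 / (1.0 + np.exp(-z_true))).astype(int)
logits = 2.0 * z_true

T_hat = fit_temperature(logits, labels)
print(f"fitted T: {T_hat:.2f}")  # should land near 2 for this synthetic setup
print(f"calibrated-scale equivalent of 0.9284 at T=1.2807: {map_threshold(0.9284, 1.2807):.4f}")
```

If you threshold calibrated probabilities, use the mapped value rather than `0.9284`; otherwise the decision boundary shifts.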

	---

	## Notes

	- The classifier head is trained together with the LoRA layers (it is unfrozen after applying PEFT).
	- Training used:
	  - `bf16=True`
	  - `optim="adamw_torch_fused"`
	  - a cosine-with-restarts learning-rate scheduler
	  - a learning rate scaled down from the HPO value to account for the full-dataset run (~14k steps)
	- Threshold `0.9284` was chosen as the max-F1 point on the calibration set.
	  Raise it if you prefer fewer false positives, or lower it if you prefer fewer false negatives.
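If you want to re-derive or move the threshold, a sweep over held-out probabilities is enough. The sketch below shows two options: the max-F1 cut and the smallest threshold that keeps the false-positive rate on human texts under a target. It is an illustration under assumptions, not the search code behind `results.json`; the function names and toy arrays are hypothetical.

```python
import numpy as np

def max_f1_threshold(probs, labels):
    """Return (threshold, f1) maximizing F1 for the rule `prob >= threshold`."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-probs, kind="stable")  # highest scores first
    p_sorted, y_sorted = probs[order], labels[order]
    tp = np.cumsum(y_sorted)                   # true positives if we cut after each position
    fp = np.cumsum(1 - y_sorted)
    fn = y_sorted.sum() - tp
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    # Only allow cuts where the score changes, so tied scores stay on the same side.
    last_of_tie = np.append(p_sorted[1:] != p_sorted[:-1], True)
    f1 = np.where(last_of_tie, f1, -1.0)
    best = int(np.argmax(f1))
    return float(p_sorted[best]), float(f1[best])

def threshold_for_fpr(probs, labels, max_fpr=0.005):
    """Smallest observed threshold whose FPR on human texts is <= max_fpr, else None."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels, dtype=int)
    human = np.sort(probs[labels == 0])
    candidates = np.unique(probs)              # ascending
    # Humans scored >= t are false positives at threshold t.
    n_fp = human.size - np.searchsorted(human, candidates, side="left")
    ok = n_fp / human.size <= max_fpr
    return float(candidates[ok][0]) if ok.any() else None

# Toy data only (hypothetical values); label 1 = AI, 0 = Human.
probs = np.array([0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90, 0.95])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])
print(max_f1_threshold(probs, labels))
print(threshold_for_fpr(probs, labels, max_fpr=0.0))
```

Run the sweep on the calibration split rather than the test split, so the reported test metrics stay unbiased.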