---
language: en
license: apache-2.0
tags:
  - text-classification
  - ai-generated-text-detection
  - roberta
  - adversarial-training
metrics:
  - roc_auc
---

# RADAR Detector (RoBERTa-large)

Adversarially trained AI-generated text detector based on the RADAR framework
([Hu et al., NeurIPS 2023](https://arxiv.org/abs/2307.03838)), extended with
a multi-evasion attack pool for robust detection.

## Training

- **Base model**: `roberta-large`
- **Dataset**: [RAID](https://huggingface.co/datasets/liamdugan/raid) (Dugan et al., ACL 2024)
- **Evasion attacks seen during training**: t5_paraphrase, synonym_replacement, homoglyphs, article_deletion, misspelling, number_swap, whitespace_addition, upper_lower_swap, zero_width_space, insert_paragraphs, alternative_spelling
- **Best macro AUROC**: 0.9782
- **Generators**: chatgpt, gpt2, gpt3, gpt4, cohere, cohere-chat, llama-chat,
  mistral, mistral-chat, mpt, mpt-chat
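To make the evasion-attack list more concrete, here is a minimal sketch of what three of the character-level attacks do to a string, plus a hypothetical "attack pool" that applies one attack chosen at random. These are simplified illustrations written for this card, **not** the RAID implementations: the homoglyph table, the rates, and the names `ATTACK_POOL` and `apply_random_attack` are assumptions. Nor is it the actual training procedure used here; how attacks were sampled during training is not documented in this card.

```python
import random

# Hypothetical, simplified versions of three attacks named above.
# NOT the RAID implementations; they only illustrate the idea.

# A few Latin -> Cyrillic look-alike letters (illustrative mapping).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}
ZWSP = "\u200b"  # zero-width space


def homoglyphs(text: str, rng: random.Random, rate: float = 0.5) -> str:
    """Replace some Latin letters with visually identical Cyrillic ones."""
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def zero_width_space(text: str, rng: random.Random, rate: float = 0.3) -> str:
    """Insert an invisible zero-width space after some characters."""
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(ZWSP)
    return "".join(out)


def upper_lower_swap(text: str, rng: random.Random, rate: float = 0.1) -> str:
    """Flip the case of some letters."""
    return "".join(
        ch.swapcase() if ch.isalpha() and rng.random() < rate else ch
        for ch in text
    )


# Hypothetical pool: maps attack name -> transform function.
ATTACK_POOL = {
    "homoglyphs": homoglyphs,
    "zero_width_space": zero_width_space,
    "upper_lower_swap": upper_lower_swap,
}


def apply_random_attack(text: str, rng: random.Random) -> tuple[str, str]:
    """Pick one attack uniformly from the pool and apply it to `text`."""
    name = rng.choice(sorted(ATTACK_POOL))
    return name, ATTACK_POOL[name](text, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    name, attacked = apply_random_attack("The cat sat on the mat.", rng)
    print(name, repr(attacked))
```

All three transforms leave the text readable to a human but change the token sequence the detector sees, which is why a detector trained only on clean text can be evaded by them.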
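The card reports a *macro* AUROC but does not say what it is averaged over. A common reading, assumed here, is the unweighted mean of per-group AUROCs, where a group could be an attack type or a domain, and each group contains both human and AI texts. The sketch below computes AUROC from scratch via the Mann-Whitney U statistic (ties count as half a win), treating AI-generated as the positive class and `P(AI)` as the score; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; ties count as 0.5.

    labels: 1 = positive class (assumed: AI-generated), 0 = negative.
    scores: higher means "more likely positive", e.g. P(AI).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, slow for large sets
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(groups: list[str], labels: list[int], scores: list[float]) -> float:
    """Unweighted mean of per-group AUROCs (the grouping is an assumption)."""
    by_group = defaultdict(lambda: ([], []))  # group -> (labels, scores)
    for g, y, s in zip(groups, labels, scores):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return mean(auroc(ys, ss) for ys, ss in by_group.values())
```

Macro averaging gives each group equal weight, so a detector cannot score well simply by excelling on the largest group while failing on a rarer attack.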

## Usage

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

tokenizer = RobertaTokenizer.from_pretrained("Shushant/adal_v5_raid")
model     = RobertaForSequenceClassification.from_pretrained("Shushant/adal_v5_raid")
model.eval()

text = "Your text here."
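enc  = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)  # longer inputs are cut at 512 tokens
with torch.no_grad():
    probs = torch.softmax(model(**enc).logits, dim=-1)[0]
# Index 0 = AI-generated, index 1 = human-written (see "Label mapping")
print(f"P(human)={probs[1]:.3f}  P(AI)={probs[0]:.3f}")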
```

## Label mapping

- Index 0 → AI-generated
- Index 1 → Human-written
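Because index 0 is the AI class (the reverse of the "positive = 1" convention many detectors use), it can help to map logits to names explicitly rather than relying on whatever `id2label` the checkpoint's config carries. A minimal, dependency-free sketch, assuming you already have the two logits as plain floats (e.g. via `model(**enc).logits[0].tolist()`); `ID2LABEL`, `logits_to_probs`, and `predict_label` are hypothetical helper names.

```python
import math

# Index-to-name mapping as documented in this card (0 = AI, 1 = human).
ID2LABEL = {0: "AI-generated", 1: "Human-written"}


def logits_to_probs(logits: list[float]) -> dict[str, float]:
    """Numerically stable softmax over the logits, keyed by label name."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {ID2LABEL[i]: e / total for i, e in enumerate(exps)}


def predict_label(logits: list[float]) -> str:
    """Return the name of the higher-probability class."""
    probs = logits_to_probs(logits)
    return max(probs, key=probs.get)
```

For example, `predict_label([2.0, 0.0])` returns `"AI-generated"`, since the larger logit sits at index 0.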