| --- |
| language: en |
| license: apache-2.0 |
| tags: |
| - text-classification |
| - ai-generated-text-detection |
| - roberta |
| - adversarial-training |
| metrics: |
| - roc_auc |
| --- |
| |
| # RADAR Detector (RoBERTa-large) |
|
|
| Adversarially trained AI-generated text detector based on the RADAR framework |
| ([Hu et al., NeurIPS 2023](https://arxiv.org/abs/2307.03838)), extended with |
| a multi-evasion attack pool for robust detection. |
|
|
| ## Training |
|
|
| - **Base model**: `roberta-large` |
| - **Dataset**: [RAID](https://huggingface.co/datasets/liamdugan/raid) (Dugan et al., ACL 2024) |
| - **Evasion attacks seen during training**: t5_paraphrase, synonym_replacement, homoglyphs, article_deletion, misspelling, number_swap, whitespace_addition, upper_lower_swap, zero_width_space, insert_paragraphs, alternative_spelling |
| - **Best macro AUROC**: 0.9782 |
| - **Generators**: chatgpt, gpt2, gpt3, gpt4, cohere, cohere-chat, llama-chat, |
| mistral, mistral-chat, mpt, mpt-chat |
| |
| ## Usage |
| |
| ```python |
| from transformers import RobertaTokenizer, RobertaForSequenceClassification |
| import torch |
| |
| tokenizer = RobertaTokenizer.from_pretrained("Shushant/adal_v5_raid") |
| model = RobertaForSequenceClassification.from_pretrained("Shushant/adal_v5_raid") |
| model.eval() |
| |
| text = "Your text here." |
| enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512) |
| with torch.no_grad(): |
| probs = torch.softmax(model(**enc).logits, dim=-1)[0] |
| print(f"P(human)={probs[1]:.3f} P(AI)={probs[0]:.3f}") |
| ``` |
| |
| ## Label mapping |
| - Index 0 → AI-generated |
| - Index 1 → Human-written |
|
|