LLM Authorship Attribution — RoBERTa-base
Fine-tuned roberta-base that identifies which of 11 LLMs (or a human) wrote a piece of text, i.e. a 12-way classifier.
- Test accuracy: 86.51%
- Macro F1: 0.865
- Base model: roberta-base
- Dataset: RAID — 796,800 train / 49,800 test, 12 balanced classes
- Training: 3 epochs, fp16, max_len 256, lr 2e-5
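Because the 12 classes are balanced, the split sizes above imply exact per-class counts. A quick check in plain Python, using only the numbers listed on this card:

```python
# Split sizes as listed on this card; the class count comes from the Classes section.
TRAIN_SIZE = 796_800
TEST_SIZE = 49_800
NUM_CLASSES = 12

# With perfectly balanced classes, each split must divide evenly by the class count.
assert TRAIN_SIZE % NUM_CLASSES == 0 and TEST_SIZE % NUM_CLASSES == 0

train_per_class = TRAIN_SIZE // NUM_CLASSES
test_per_class = TEST_SIZE // NUM_CLASSES
# Share of all examples held out for testing.
test_fraction = TEST_SIZE / (TRAIN_SIZE + TEST_SIZE)

print(train_per_class, test_per_class, f"{test_fraction:.1%}")
```

This works out to 66,400 training and 4,150 test examples per class, with roughly 5.9% of the data held out.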
Classes
chatgpt · cohere · cohere-chat · gpt2 · gpt3 · gpt4 · human · llama-chat · mistral · mistral-chat · mpt · mpt-chat
Usage
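```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("ArjunGangwar1/llm-authorship-roberta")
model = AutoModelForSequenceClassification.from_pretrained("ArjunGangwar1/llm-authorship-roberta")
model.eval()  # from_pretrained already returns eval mode; explicit for clarity

text = "Your text here..."
# Match the training setup: inputs longer than 256 tokens are truncated
enc = tok(text, max_length=256, truncation=True, padding="max_length", return_tensors="pt")

with torch.no_grad():
    # Softmax over the 12 class logits of the single input in the batch
    probs = model(**enc).logits.softmax(-1)[0]

pred = model.config.id2label[int(probs.argmax())]
print(pred, float(probs.max()))  # predicted label and its probability
```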
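Since base and chat variants are often confused (see Notes), it can help to inspect the runner-up labels rather than only the argmax. A minimal, framework-free sketch, assuming you already have one input's raw logits as a list of floats plus the model's `id2label` mapping; the helper name `top_k_labels` is hypothetical and not part of this repo or of transformers:

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: List[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example with a hypothetical 3-label mapping and made-up logits.
toy_id2label = {0: "mistral", 1: "mistral-chat", 2: "human"}
for label, p in top_k_labels([2.0, 1.5, -1.0], toy_id2label, k=2):
    print(f"{label}: {p:.3f}")
```

With the model loaded as above, you would pass `model(**enc).logits[0].tolist()` and `model.config.id2label`.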
Per-class F1 (test set)
gpt3 0.94 · cohere 0.90 · gpt4 0.90 · human 0.89 · llama-chat 0.88 · gpt2 0.87 · chatgpt 0.87 · mpt 0.86 · mpt-chat 0.84 · mistral 0.83 · cohere-chat 0.81 · mistral-chat 0.80
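Macro F1 is the unweighted mean of the per-class F1 scores, so the figures above should roughly reproduce the headline 0.865. They are rounded to two decimals, so an exact match is not expected:

```python
# Per-class F1 on the test set, copied from the list above (rounded to 2 d.p.).
per_class_f1 = {
    "gpt3": 0.94, "cohere": 0.90, "gpt4": 0.90, "human": 0.89,
    "llama-chat": 0.88, "gpt2": 0.87, "chatgpt": 0.87, "mpt": 0.86,
    "mpt-chat": 0.84, "mistral": 0.83, "cohere-chat": 0.81, "mistral-chat": 0.80,
}

# Macro F1 weights every class equally, regardless of class size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{len(per_class_f1)} classes, macro F1 = {macro_f1:.4f}")
```

The mean of the rounded values is about 0.866, within rounding of the reported 0.865.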
Notes
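The four lowest-scoring classes (mistral-chat 0.80, cohere-chat 0.81, mistral 0.83, mpt-chat 0.84) all belong to base/chat variant pairs. Base and chat models from the same family share vocabulary but differ in style, which makes pairs such as mistral vs mistral-chat the hardest to tell apart.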
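One way to quantify this is to measure what share of misclassifications stay within a base/chat pair. A sketch, assuming gold labels and predictions are available as parallel lists of label strings; the family rule (drop a trailing `-chat`) and the helpers `family` and `within_pair_error_share` are illustrative assumptions, not part of this model's evaluation code:

```python
from typing import List


def family(label: str) -> str:
    """Hypothetical grouping rule: 'mistral-chat' -> 'mistral', 'mpt' -> 'mpt'."""
    return label[: -len("-chat")] if label.endswith("-chat") else label


def within_pair_error_share(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of errors whose prediction falls in the same family as the gold label."""
    errors = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
    if not errors:
        return 0.0
    same_family = sum(1 for t, p in errors if family(t) == family(p))
    return same_family / len(errors)


# Made-up labels purely to exercise the function.
y_true = ["mistral", "mistral-chat", "cohere", "human", "mpt-chat"]
y_pred = ["mistral-chat", "mistral-chat", "cohere-chat", "gpt4", "mpt-chat"]
print(within_pair_error_share(y_true, y_pred))
```

Under this rule `chatgpt` stays its own family (it has no `-chat` suffix), and `llama-chat` maps to `llama`, which has no base counterpart in the label set.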
Model tree for ArjunGangwar1/llm-authorship-roberta: base model FacebookAI/roberta-base