LLM Authorship Attribution — RoBERTa-base

Fine-tuned roberta-base that identifies which of 11 LLMs, or a human, wrote a piece of text (12 classes in total).

Test accuracy: 86.51%
Macro F1: 0.865
Base model: roberta-base
Dataset: RAID — 796,800 train / 49,800 test, 12 balanced classes
Training: 3 epochs, fp16, max_len 256, lr 2e-5

Classes

chatgpt · cohere · cohere-chat · gpt2 · gpt3 · gpt4 · human · llama-chat · mistral · mistral-chat · mpt · mpt-chat

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hub.
tok = AutoTokenizer.from_pretrained("ArjunGangwar1/llm-authorship-roberta")
model = AutoModelForSequenceClassification.from_pretrained("ArjunGangwar1/llm-authorship-roberta")
model.eval()  # inference mode (disables dropout)

text = "Your text here..."
# Truncate and pad to the 256-token length used in training.
enc = tok(text, max_length=256, truncation=True, padding="max_length", return_tensors="pt")
with torch.no_grad():
    probs = model(**enc).logits.softmax(-1)[0]  # class probabilities for the single input
pred = model.config.id2label[int(probs.argmax())]
print(pred, float(probs.max()))  # predicted label and its probability
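
To inspect the runner-up classes as well as the top prediction, the probabilities can be ranked. This is a minimal sketch, assuming the probabilities have been converted to a plain list (for example with probs.tolist()). The helper name top_k_labels and the toy values are illustrative and not part of the released code.

```python
def top_k_labels(probs, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]

# Toy stand-ins for probs.tolist() and model.config.id2label (hypothetical values).
toy_probs = [0.05, 0.70, 0.25]
toy_id2label = {0: "gpt2", 1: "human", 2: "mistral"}
print(top_k_labels(toy_probs, toy_id2label, k=2))
```

With the real model, the call would be top_k_labels(probs.tolist(), model.config.id2label).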

Per-class F1 (test set)

gpt3 0.94 · cohere 0.90 · gpt4 0.90 · human 0.89 · llama-chat 0.88 · gpt2 0.87 · chatgpt 0.87 · mpt 0.86 · mpt-chat 0.84 · mistral 0.83 · cohere-chat 0.81 · mistral-chat 0.80
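
Macro F1 is the unweighted mean of the per-class F1 scores. The sketch below recomputes it from the values listed above. Because those values are rounded to two decimals, the result matches the reported 0.865 only to within rounding.

```python
# Per-class F1 scores as listed in this card (rounded to two decimals).
per_class_f1 = {
    "gpt3": 0.94, "cohere": 0.90, "gpt4": 0.90, "human": 0.89,
    "llama-chat": 0.88, "gpt2": 0.87, "chatgpt": 0.87, "mpt": 0.86,
    "mpt-chat": 0.84, "mistral": 0.83, "cohere-chat": 0.81, "mistral-chat": 0.80,
}

# Macro average: every class counts equally, regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro F1 from rounded scores: {macro_f1:.3f}")
```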

Notes

The hardest classes are the base-vs-chat variant pairs (e.g. mistral vs mistral-chat), which share vocabulary but differ in style. mistral-chat (0.80), cohere-chat (0.81), and mistral (0.83) have the lowest per-class F1.
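
If only the model family matters, one possible workaround is to sum the probabilities of each base/chat pair before taking the argmax. This is a sketch under stated assumptions: the FAMILY grouping and the family_probs helper are hypothetical, are not part of the released code, and have not been evaluated on the test set.

```python
from collections import defaultdict

# Assumed grouping of base/chat variants into families (illustrative only).
FAMILY = {
    "cohere": "cohere", "cohere-chat": "cohere",
    "mistral": "mistral", "mistral-chat": "mistral",
    "mpt": "mpt", "mpt-chat": "mpt",
}

def family_probs(label_probs):
    """Sum per-label probabilities into per-family probabilities."""
    totals = defaultdict(float)
    for label, p in label_probs.items():
        # Labels without a listed variant pair keep their own name.
        totals[FAMILY.get(label, label)] += p
    return dict(totals)

# Toy distribution: the single best label is "human", but the mistral family has more total mass.
toy = {"mistral": 0.30, "mistral-chat": 0.30, "human": 0.40}
fam = family_probs(toy)
best_family = max(fam, key=fam.get)
print(best_family, round(fam[best_family], 2))
```

In the toy case the probability mass split across the two mistral variants outweighs the single best label, which is the situation the base-vs-chat confusion tends to produce.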

Code: github.com/Arjun-Gangwar1/llm-authorship-attribution

Model size: 0.1B params (Safetensors, tensor type F32)
