Moodlerz/roberta-detector-eli5

What is this?

This model was fine-tuned as part of a research project comparing transformer-based AI-text detectors across two benchmark datasets: HC3 and ELI5.

The task is binary classification:

  • Label 0 → Human-written text
  • Label 1 → LLM-generated text
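The detector emits two logits, one per class; softmax over them yields P(human) and P(LLM). A minimal sketch of the label convention in plain Python (the 0.5 threshold is an illustrative default, not something fixed by the model):

```python
import math

def softmax2(logit_human: float, logit_llm: float) -> tuple[float, float]:
    """Numerically stable softmax over the two class logits."""
    m = max(logit_human, logit_llm)
    e0, e1 = math.exp(logit_human - m), math.exp(logit_llm - m)
    z = e0 + e1
    return e0 / z, e1 / z

def predict_label(logit_human: float, logit_llm: float, threshold: float = 0.5) -> int:
    """Return 1 (LLM-generated) if P(LLM) exceeds the threshold, else 0 (human)."""
    _, p_llm = softmax2(logit_human, logit_llm)
    return int(p_llm > threshold)
```

In practice the threshold should be tuned on held-out data rather than fixed at 0.5, especially under class imbalance.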

Model details

RoBERTa (roberta-base) fine-tuned on ELI5 for AI-text detection. Binary classifier: human (0) vs LLM-generated (1). Trained for 1 epoch with dropout 0.2 and early stopping on ROC-AUC.

Training setup

Setting              Value
-------------------  -------
Epochs               1
Batch size (train)   16
Learning rate        2e-5
Warmup steps         500
Weight decay         0.01
Dropout              0.2
Max seq length       512
Validation split     10%
Best model metric    ROC-AUC
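ROC-AUC, the model-selection metric above, equals the probability that a randomly chosen LLM-generated sample is scored higher than a randomly chosen human-written one. A self-contained pairwise sketch of the metric (an O(n·m) illustration for intuition, not the implementation used in training):

```python
def roc_auc(scores_pos, scores_neg):
    """Pairwise ROC-AUC: fraction of (LLM, human) pairs ranked correctly.

    scores_pos: detector scores for LLM-generated (label 1) samples.
    scores_neg: detector scores for human-written (label 0) samples.
    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

A value of 1.0 means perfect ranking, 0.5 means chance; unlike accuracy, it is threshold-free, which is why it suits early stopping here.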

Datasets

  • HC3 – Human ChatGPT Comparison Corpus
  • ELI5 – Explain Like I'm 5 (Reddit QA dataset)

Cross-dataset evaluation (e.g. trained on HC3, tested on ELI5) was used to measure the generalisability of each detector.
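The cross-dataset protocol amounts to a small grid: every trained detector is scored on every dataset's test split, and the off-diagonal cells measure generalisation. A sketch of that loop, where the detectors and data are hypothetical stand-ins (the study's actual pipeline is not shown here):

```python
def accuracy(predict, examples):
    """Fraction of (text, label) pairs the detector classifies correctly."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)

def cross_dataset_eval(detectors, test_sets):
    """Evaluate every detector (keyed by its training set) on every test set.

    detectors: {"HC3": predict_fn, "ELI5": predict_fn} - hypothetical stand-ins.
    test_sets: {"HC3": [(text, label), ...], "ELI5": [...]}.
    Returns {(trained_on, tested_on): accuracy}; the off-diagonal
    cells are the cross-dataset results.
    """
    return {
        (trained_on, tested_on): accuracy(predict, examples)
        for trained_on, predict in detectors.items()
        for tested_on, examples in test_sets.items()
    }
```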

How to load

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("Moodlerz/roberta-detector-eli5")
tokenizer = AutoTokenizer.from_pretrained("Moodlerz/roberta-detector-eli5")
model.eval()  # disable dropout for inference

text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Index 1 is the LLM-generated class (label 1)
prob_llm = torch.softmax(logits, dim=-1)[0][1].item()
print(f"P(LLM-generated): {prob_llm:.4f}")

Notes

  • Local training dir: ./models/RoBERTa_eli5
  • All models in this series are private repos under Moodlerz.
  • Part of a larger study; do not use for production content moderation without further evaluation.