---
license: cc-by-4.0
library_name: transformers
base_model: roberta-base
tags:
- text-classification
- ai-text-detection
- voight-kampff
- pan-2025
datasets:
- pan-webis-de/pan25-generative-ai-detection-task1
language: en
pipeline_tag: text-classification
---

# eloquent26 RoBERTa-base detector: PAN'25/26 Voight-Kampff Subtask 1

Fine-tuned `roberta-base` on the official PAN'25/26 Generative AI Detection training split (Zenodo DOI 10.5281/zenodo.14962653). Used in the eloquent26 detector panel for the ELOQUENT 2026 Voight-Kampff research paper.

## Training data

- Source: Bevendorff et al., PAN'25/26 Generative AI Detection: Voight-Kampff AI Detection Sensitivity (Zenodo, March 2025).
- Split: train.jsonl (n = 23,707; 9,101 human + 14,606 LLM-generated).
- Generator models in train: gpt-3.5-turbo, gpt-4o, gpt-4o-mini, o3-mini, gemini-1.5-pro, gemini-2.0-flash, llama-3.1-8b-instruct, llama-3.3-70b-instruct, ministral-8b-instruct-2410, deepseek-r1-distill-qwen-32b, falcon3-10b-instruct, gpt-4.5-preview, gpt-4-turbo-paraphrase, gemini-pro.
- Genres in train: essays, fiction, news.

## Training config

- Epochs: 2
- Batch size: 16
- Learning rate: 2e-05
- Weight decay: 0.01
- Warmup ratio: 0.1
- Max length: 512
- Seed: 42
- Mixed precision: fp16

## Final val metrics

```
{
  "eval_loss": 0.07029607146978378,
  "eval_accuracy": 0.9877403176372248,
  "eval_f1": 0.9905579399141631,
  "eval_roc_auc": 0.9994946525295825,
  "eval_runtime": 3.7238,
  "eval_samples_per_second": 963.793,
  "eval_steps_per_second": 30.345,
  "epoch": 2.0
}
```

## Usage

```python
from transformers import pipeline

clf = pipeline("text-classification", model="protagonist/roberta-eloquent", top_k=None)
print(clf("Your text here"))
```

The returned label `llm` carries the probability that the text was machine-generated.

## Reproduce

The training script lives at [`notebooks/train_roberta_a100.py`](https://github.com/...) in the eloquent26 repo.
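
As a rough orientation before opening the training script, the values under "Training config" can be written as keyword arguments for `transformers.TrainingArguments`. This is a minimal sketch, not the contents of `train_roberta_a100.py`. It assumes that "Batch size: 16" means the per-device batch size, that training ran on a single device without gradient accumulation, and that all 23,707 rows of train.jsonl were used. The names `TRAINING_KWARGS`, `schedule_steps`, `build_training_arguments`, and the output directory are hypothetical.

```python
import math

# Hyperparameters copied from the "Training config" list in this card.
# Key names follow transformers.TrainingArguments; mapping "Batch size: 16"
# to per_device_train_batch_size is an assumption.
TRAINING_KWARGS = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "seed": 42,
    "fp16": True,  # mixed precision; needs a CUDA device
}
MAX_LENGTH = 512   # tokenizer truncation length, not a TrainingArguments field
N_TRAIN = 23_707   # 9,101 human + 14,606 LLM-generated


def schedule_steps(n_examples, batch_size, epochs, warmup_ratio):
    """Return (total optimizer steps, warmup steps).

    Assumes one device, no gradient accumulation, and a final partial batch
    that is kept rather than dropped.
    """
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total = steps_per_epoch * epochs
    return total, math.ceil(total * warmup_ratio)


def build_training_arguments(output_dir="roberta-eloquent-run"):
    """Build TrainingArguments; imported lazily so the sketch loads without transformers."""
    from transformers import TrainingArguments  # requires transformers and torch

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


total_steps, warmup_steps = schedule_steps(
    N_TRAIN,
    TRAINING_KWARGS["per_device_train_batch_size"],
    TRAINING_KWARGS["num_train_epochs"],
    TRAINING_KWARGS["warmup_ratio"],
)
print(total_steps, warmup_steps)
```

Under those assumptions the run covers 2,964 optimizer steps, the first 297 of which are learning-rate warmup. A different device count or gradient accumulation setting in the real script would change both numbers.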
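
For the pipeline call shown under "Usage", `top_k=None` makes the pipeline return a score for every label rather than only the top one. Depending on the installed `transformers` version and whether a single string or a list is passed, the result may come back flat or wrapped in an extra list. The helper below is a hedged sketch that pulls out the `llm` score in either case. The function name `llm_probability`, the second label name `human`, the 0.5 threshold, and the example scores are illustrative assumptions, not values taken from the model.

```python
def llm_probability(output, label="llm"):
    """Return the score attached to `label` in a text-classification result.

    Accepts both a flat list of {"label", "score"} dicts and the same list
    wrapped in one or more outer lists.
    """
    scores = output
    # Unwrap nesting such as [[{...}, {...}]] down to [{...}, {...}].
    while scores and isinstance(scores[0], list):
        scores = scores[0]
    for entry in scores:
        if entry["label"] == label:
            return entry["score"]
    raise KeyError(f"label {label!r} not found in pipeline output")


# Made-up output in the nested shape; real scores come from clf("Your text here").
example_output = [[
    {"label": "llm", "score": 0.97},
    {"label": "human", "score": 0.03},  # hypothetical name for the other class
]]

p_llm = llm_probability(example_output)
is_machine_generated = p_llm >= 0.5  # illustrative threshold, tune on validation data
print(p_llm, is_machine_generated)
```

With a real model, replace `example_output` with the value returned by `clf(...)`. The labels a checkpoint actually emits can be checked via `clf.model.config.id2label`.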