---
license: cc-by-4.0
library_name: transformers
base_model: roberta-base
tags:
- text-classification
- ai-text-detection
- voight-kampff
- pan-2025
datasets:
- pan-webis-de/pan25-generative-ai-detection-task1
language: en
pipeline_tag: text-classification
---
# eloquent26 RoBERTa-base detector: PAN'25/26 Voight-Kampff Subtask 1

`roberta-base` fine-tuned on the official PAN'25/26 Generative AI Detection
training split (Zenodo DOI 10.5281/zenodo.14962653). The model is one of the
detectors in the eloquent26 panel used for the ELOQUENT 2026 Voight-Kampff
research paper.
## Training data

- Source: Bevendorff et al., PAN'25/26 Generative AI Detection: Voight-Kampff
  AI Detection Sensitivity (Zenodo, March 2025).
- Split: `train.jsonl` (n = 23,707: 9,101 human-written and 14,606 LLM-generated texts).
- Generator models in the training split: gpt-3.5-turbo, gpt-4o, gpt-4o-mini, o3-mini,
  gemini-1.5-pro, gemini-2.0-flash, llama-3.1-8b-instruct, llama-3.3-70b-instruct,
  ministral-8b-instruct-2410, deepseek-r1-distill-qwen-32b, falcon3-10b-instruct,
  gpt-4.5-preview, gpt-4-turbo-paraphrase, gemini-pro.
- Genres in the training split: essays, fiction, news.
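The split is imbalanced toward LLM-generated text, so a trivial "always predict LLM" classifier already reaches about 61.6% accuracy on the training distribution (14,606 / 23,707). The sketch below checks the class balance of a JSON Lines file and computes that majority-class baseline. It assumes each record stores its class under a `label` key; this field name is hypothetical, since the card does not document the JSONL schema. The `"human"`/`"llm"` keys in `reported` are illustrative names only.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical field name: the card does not document the JSONL schema.
LABEL_FIELD = "label"


def label_counts(lines: Iterable[str]) -> Counter:
    """Count labels across JSON Lines records (one JSON object per line)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)[LABEL_FIELD]] += 1
    return counts


def majority_baseline(counts: Counter) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    return max(counts.values()) / sum(counts.values())


# Counts reported in this card; the key names are illustrative.
reported = Counter({"human": 9101, "llm": 14606})
print(sum(reported.values()))                 # 23707
print(round(majority_baseline(reported), 4))  # 0.6161
```

On the real file, `with open("train.jsonl") as f: print(label_counts(f))` should reproduce the 9,101 / 14,606 split if the schema assumption holds.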
## Training config

- Epochs: 2
- Batch size: 16
- Learning rate: 2e-05
- Weight decay: 0.01
- Warmup ratio: 0.1
- Max length: 512 tokens
- Seed: 42
- Mixed precision: fp16
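A minimal sketch of how these settings could map onto `transformers.TrainingArguments`, plus the step arithmetic they imply. Assumptions, not stated in the card: the batch size of 16 is per device on a single GPU, there is no gradient accumulation, the last partial batch is kept, and the full 23,707-example training split is used. Under those assumptions one epoch is ceil(23,707 / 16) = 1,482 optimizer steps, so two epochs give 2,964 steps, and a 0.1 warmup ratio rounds up to 297 warmup steps, which should match how the `Trainer` derives warmup from `warmup_ratio`. The `output_dir` value is hypothetical. The maximum length belongs to tokenization rather than `TrainingArguments`.

```python
import math

# Hyperparameters from the "Training config" list, as TrainingArguments keywords.
CONFIG = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 16,  # assumption: single device
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "seed": 42,
    "fp16": True,  # requires a CUDA GPU when TrainingArguments is built
}

# Max length is applied when tokenizing, e.g. tokenizer(texts, **TOKENIZER_KWARGS).
TOKENIZER_KWARGS = {"truncation": True, "max_length": 512}


def schedule(n_examples: int, batch_size: int, epochs: int, warmup_ratio: float) -> tuple[int, int]:
    """Return (total optimizer steps, warmup steps), assuming no gradient
    accumulation and that the final partial batch is kept."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total = steps_per_epoch * epochs
    warmup = math.ceil(total * warmup_ratio)
    return total, warmup


def build_training_args(output_dir: str = "roberta-eloquent-out"):
    """Build TrainingArguments from CONFIG (output_dir is a hypothetical path)."""
    from transformers import TrainingArguments  # lazy import; needs transformers
    return TrainingArguments(output_dir=output_dir, **CONFIG)


print(schedule(23_707, 16, 2, 0.1))  # (2964, 297)
```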
## Final validation metrics

```json
{
  "eval_loss": 0.07029607146978378,
  "eval_accuracy": 0.9877403176372248,
  "eval_f1": 0.9905579399141631,
  "eval_roc_auc": 0.9994946525295825,
  "eval_runtime": 3.7238,
  "eval_samples_per_second": 963.793,
  "eval_steps_per_second": 30.345,
  "epoch": 2.0
}
```
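The card does not say how these metrics were computed. Below is a hypothetical `compute_metrics` function of the kind `transformers.Trainer` accepts, producing the same three quality keys (the `Trainer` adds the `eval_` prefix itself). It assumes binary labels with class index 1 = `llm`, F1 computed on that positive class, and ROC-AUC computed on the softmax probability of class 1. As a side note, the runtime and throughput fields imply roughly 3.7238 × 963.793 ≈ 3,589 validation examples.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def compute_metrics(eval_pred):
    """Hypothetical metrics hook; assumes index 1 is the `llm` class."""
    logits, labels = eval_pred
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable softmax over the class dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    preds = probs.argmax(axis=-1)
    return {
        "accuracy": float(accuracy_score(labels, preds)),
        "f1": float(f1_score(labels, preds)),  # F1 of the positive (llm) class
        "roc_auc": float(roc_auc_score(labels, probs[:, 1])),  # threshold-free ranking metric
    }
```

ROC-AUC uses the continuous `llm` probability rather than the argmax prediction, which is why it can sit close to 1.0 while accuracy and F1 are somewhat lower.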
## Usage

```python
from transformers import pipeline

clf = pipeline("text-classification", model="protagonist/roberta-eloquent", top_k=None)
print(clf("Your text here"))
```

With `top_k=None`, the pipeline returns a score for every label; the score attached to
the `llm` label is the model's probability that the text is machine-generated.
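A small helper for pulling the `llm` probability out of the pipeline result and turning it into a yes/no decision. This is a sketch: the 0.5 threshold is a hypothetical default, not a value from this card, and the name of the other label is not documented here, so the helper looks only for `llm`. It accepts both a flat list of `{"label", "score"}` dicts and the nested `[[...]]` shape that some `transformers` versions return for a single input. The `example` scores are made-up illustrative values, not real model output.

```python
from typing import Any


def llm_probability(output: Any, label: str = "llm") -> float:
    """Return the score of `label` from a single-input pipeline result (top_k=None)."""
    if output and isinstance(output[0], list):
        output = output[0]  # unwrap the nested single-input form
    for entry in output:
        if entry["label"] == label:
            return float(entry["score"])
    raise KeyError(f"label {label!r} not found in pipeline output")


def is_machine_generated(output: Any, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag text whose `llm` score meets the threshold."""
    return llm_probability(output) >= threshold


# Illustrative output shape; "human" is a placeholder name for the other label.
example = [{"label": "llm", "score": 0.93}, {"label": "human", "score": 0.07}]
print(llm_probability(example))       # 0.93
print(is_machine_generated(example))  # True
```

In practice this would be called as `llm_probability(clf(text))`. Because the model was trained with a 512-token limit, longer inputs likely need truncation; recent `transformers` versions forward tokenizer arguments such as `truncation=True` from the pipeline call to the tokenizer.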
## Reproduce

The training script is
[`notebooks/train_roberta_a100.py`](https://github.com/...) in the eloquent26 repository.