
	---
	license: cc-by-4.0
	library_name: transformers
	base_model: roberta-base
	tags:
	- text-classification
	- ai-text-detection
	- voight-kampff
	- pan-2025
	datasets:
	- pan-webis-de/pan25-generative-ai-detection-task1
	language: en
	pipeline_tag: text-classification
	---

	# eloquent26 RoBERTa-base detector — PAN'25/26 Voight-Kampff Subtask 1

	This model is `roberta-base` fine-tuned on the official PAN'25/26 Generative AI
	Detection training split (Zenodo DOI 10.5281/zenodo.14962653). It is used in the
	eloquent26 detector panel for the ELOQUENT 2026 Voight-Kampff research paper.

	## Training data
	- Source: Bevendorff et al., PAN'25/26 Generative AI Detection: Voight-Kampff
	AI Detection Sensitivity (Zenodo, March 2025).
	- Split: train.jsonl (n = 23,707; 9,101 human + 14,606 LLM-generated).
	- Generator models in train: gpt-3.5-turbo, gpt-4o, gpt-4o-mini, o3-mini,
	gemini-1.5-pro, gemini-2.0-flash, llama-3.1-8b-instruct, llama-3.3-70b-instruct,
	ministral-8b-instruct-2410, deepseek-r1-distill-qwen-32b, falcon3-10b-instruct,
	gpt-4.5-preview, gpt-4-turbo-paraphrase, gemini-pro.
	- Genres in train: essays, fiction, news.
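The class balance quoted above can be checked against a local copy of `train.jsonl`. The sketch below assumes each line is a JSON object with a `label` field; that field name and its encoding are assumptions rather than something this card states, so adjust them to the actual schema.

```python
import json
from collections import Counter


def label_counts(lines, label_field="label"):
    """Count examples per label in an iterable of JSON Lines strings.

    `label_field` is an assumed field name, not confirmed by this card.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line)[label_field]] += 1
    return counts


# Split sizes as reported in this card.
reported = {"human": 9101, "llm": 14606}
total = sum(reported.values())
llm_share = reported["llm"] / total
print(total, round(llm_share, 3))

# Usage against a local file (the path is hypothetical):
# with open("train.jsonl", encoding="utf-8") as fh:
#     print(label_counts(fh))
```

The reported counts sum to the stated n, and the training split is skewed toward LLM-generated text, which is worth keeping in mind when reading accuracy figures.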

	## Training config
	- Epochs: 2
	- Batch size: 16
	- Learning rate: 2e-05
	- Weight decay: 0.01
	- Warmup ratio: 0.1
	- Max length: 512
	- Seed: 42
	- Mixed precision: fp16
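As a rough illustration of what these settings imply, the sketch below maps them onto `transformers.TrainingArguments` keyword names and estimates the length of the learning-rate schedule. It assumes a single device and no gradient accumulation, neither of which is stated here, and it is not the actual training script.

```python
import math

# Hyperparameters as listed in this card, keyed by TrainingArguments names.
CONFIG = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "seed": 42,
    "fp16": True,
}
MAX_LENGTH = 512   # applied at tokenization time, not a TrainingArguments field
N_TRAIN = 23_707   # size of train.jsonl as reported in this card


def schedule(n_examples, cfg, n_devices=1, grad_accum=1):
    """Return (steps_per_epoch, total_steps, warmup_steps).

    n_devices and grad_accum are assumptions; the card does not state them.
    """
    effective_batch = cfg["per_device_train_batch_size"] * n_devices * grad_accum
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    total_steps = steps_per_epoch * cfg["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * cfg["warmup_ratio"])
    return steps_per_epoch, total_steps, warmup_steps


print(schedule(N_TRAIN, CONFIG))


def build_training_args(output_dir):
    """Build TrainingArguments from CONFIG (fp16 requires a suitable GPU)."""
    from transformers import TrainingArguments  # imported lazily on purpose
    return TrainingArguments(output_dir=output_dir, **CONFIG)
```

Under those assumptions the run is a little under three thousand optimizer steps, with roughly the first tenth spent on warmup.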

	## Final validation metrics
	```json
	{
	"eval_loss": 0.07029607146978378,
	"eval_accuracy": 0.9877403176372248,
	"eval_f1": 0.9905579399141631,
	"eval_roc_auc": 0.9994946525295825,
	"eval_runtime": 3.7238,
	"eval_samples_per_second": 963.793,
	"eval_steps_per_second": 30.345,
	"epoch": 2.0
	}
	```
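The throughput fields make it possible to back out the size of the evaluation set and the number of misclassified examples. The figures computed below are derived from the reported JSON rather than stated by the card, and they assume the throughput was measured over the full evaluation set.

```python
# Values copied from the reported metrics.
metrics = {
    "eval_accuracy": 0.9877403176372248,
    "eval_runtime": 3.7238,
    "eval_samples_per_second": 963.793,
    "eval_steps_per_second": 30.345,
}

# Examples evaluated = runtime * throughput, rounded to a whole count.
n_val = round(metrics["eval_runtime"] * metrics["eval_samples_per_second"])

# Correct predictions implied by the accuracy, and the remaining errors.
n_correct = round(metrics["eval_accuracy"] * n_val)
n_errors = n_val - n_correct

# Number of evaluation batches implied by the step throughput.
n_steps = round(metrics["eval_runtime"] * metrics["eval_steps_per_second"])

print(n_val, n_correct, n_errors, n_steps)
```

The recovered counts reproduce the reported accuracy when divided out, which is a useful consistency check on the numbers.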

	## Usage

	```python
	from transformers import pipeline
	clf = pipeline("text-classification", model="protagonist/roberta-eloquent", top_k=None)
	print(clf("Your text here"))
	```

	With `top_k=None` the pipeline returns a score for every label. The score attached to the `llm` label is the model's estimated probability that the text was machine-generated.
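To pull a single number out of the pipeline output, a small helper along these lines may be convenient. It is a sketch: the name `llm_probability` is hypothetical, and the `human` label in the mock data is an assumed name for the other class, which this card does not specify.

```python
def llm_probability(scores, label="llm"):
    """Return the score for `label` from a text-classification result.

    Accepts either a flat list of {"label", "score"} dicts or the same list
    wrapped in an outer list, since the nesting differs across versions.
    """
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    for item in scores:
        if item["label"] == label:
            return item["score"]
    raise KeyError(f"label {label!r} not found in pipeline output")


# Mock output for illustration only; "human" is an assumed label name.
mock = [{"label": "human", "score": 0.03}, {"label": "llm", "score": 0.97}]
print(llm_probability(mock))

# With the real pipeline (downloads the model):
# from transformers import pipeline
# clf = pipeline("text-classification", model="protagonist/roberta-eloquent", top_k=None)
# print(llm_probability(clf("Your text here", truncation=True, max_length=512)))
```

Passing `truncation=True` with `max_length=512` keeps long inputs within the maximum length used in training.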

	## Reproduce

	The training script lives at
	[`notebooks/train_roberta_a100.py`](https://github.com/...) in the eloquent26 repo.