roberta-ai-detector-v2

RoBERTa-based AI text detector fine-tuned for academic writing

Model Description

This model is fine-tuned to detect AI-generated text in academic papers and essays. It classifies a passage as human-written or AI-generated and reports 99.04% accuracy (self-reported) on its held-out test set.

Model type: roberta
Language(s): EN
License: Apache 2.0
Fine-tuned from: roberta-base

Intended Use

This model is intended for:

Detecting AI-generated content in academic submissions
Research on AI text detection
Educational tools for academic integrity

Important: This model should be used as one signal among many when evaluating text authenticity. It should not be the sole basis for academic misconduct decisions.

Performance

Metric	Score
Accuracy	99.04%
F1 Score	99.04%
ROC AUC	99.74%

Training Data

The model was trained on 56,213 samples of paired human and AI-generated academic text, including outputs from:

Claude (Anthropic)
GPT models (OpenAI)
Gemini (Google)

Evaluation

Evaluated on 11,023 held-out test samples.
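
For readers who want to recompute the metrics in the Performance table on their own labelled data, the following is a minimal pure-Python sketch. It is not the evaluation script used for this model. The function names are hypothetical, and the card does not state which F1 averaging was used, so binary F1 with the AI class (label 1) as the positive class is an assumption here.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int],
                    positive: int = 1) -> Tuple[float, float]:
    """Return (accuracy, binary F1) with `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float],
            positive: int = 1) -> float:
    """ROC AUC as the probability that a positive outscores a negative.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for a
    sketch but slow for large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Here `scores` would be the AI-class probabilities from the Usage snippet, and `y_pred` the labels obtained by thresholding them (for example at 0.5, which is itself an assumption).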

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "coai/roberta-ai-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict
text = "Your text to analyze..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    ai_probability = probs[0][1].item()  # Probability of AI-generated

print(f"AI Probability: {ai_probability:.2%}")
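
The snippet above truncates input at 512 tokens, so anything past that point in a long essay is ignored. One possible workaround, sketched below, is to split the text into overlapping word windows, score each window, and average the results. This is an illustration and not a documented feature of the model: the helper names, the window sizes, and the choice of a plain mean are all assumptions, and `score_fn` stands in for any function that returns the AI probability for a string (for example, a wrapper around the code above).

```python
from typing import Callable, List


def split_into_windows(text: str, window_words: int = 300,
                       stride_words: int = 150) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words.

    300 words is a guess at a size that stays under 512 tokens for typical
    English prose; it is not a guarantee.
    """
    if window_words <= 0 or stride_words <= 0:
        raise ValueError("window_words and stride_words must be positive")
    words = text.split()
    if not words:
        return []
    windows = []
    start = 0
    while True:
        windows.append(" ".join(words[start:start + window_words]))
        if start + window_words >= len(words):
            break  # the last window already reaches the end of the text
        start += stride_words
    return windows


def score_long_text(text: str, score_fn: Callable[[str], float],
                    window_words: int = 300, stride_words: int = 150) -> float:
    """Average the per-window AI probabilities returned by score_fn."""
    windows = split_into_windows(text, window_words, stride_words)
    if not windows:
        raise ValueError("text is empty")
    scores = [score_fn(w) for w in windows]
    return sum(scores) / len(scores)
```

A mean hides windows that score very differently from the rest, so inspecting the per-window scores may be more informative when a document mixes human and AI-written passages.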

Limitations

Optimized for academic/formal writing; may be less accurate on casual text
Performance may vary on text from AI models not in the training set
Should not be used as the sole determinant of academic misconduct
May have reduced accuracy on very short texts (<50 words)
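
The short-text limitation can be enforced in code before a score is ever shown to a reviewer. The sketch below abstains on texts under 50 words (the figure from the list above) and otherwise routes high scores to a human. The function name, the return labels, and the 0.9 review threshold are hypothetical and would need tuning on real data.

```python
def triage(text: str, ai_probability: float, min_words: int = 50,
           review_threshold: float = 0.9) -> str:
    """Map a text and its AI probability to a coarse triage label.

    Returns "too_short" when the text is below min_words (the model is
    reported to be less reliable there), "needs_human_review" when the
    probability meets the threshold, and "no_flag" otherwise.
    """
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must be between 0 and 1")
    if len(text.split()) < min_words:
        return "too_short"
    if ai_probability >= review_threshold:
        return "needs_human_review"
    return "no_flag"
```

None of the labels is a verdict: even the highest-scoring texts are only queued for a person to look at.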

Ethical Considerations

False positives can have serious consequences for students
Always use human judgment alongside model predictions
Consider the context and provide opportunities for appeal
This tool is meant to assist, not replace, human evaluation
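
To make the false-positive concern concrete, the sketch below applies Bayes' rule to estimate how many flagged texts are actually AI-generated. It rests on loud assumptions: the card reports accuracy only, so sensitivity and specificity are both set to 0.9904 here, and the 5% share of AI-generated submissions is a made-up figure. Real-world error rates may also be worse than on the held-out test set.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a flagged text is truly AI-generated (Bayes' rule)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no text would be flagged under these inputs")
    return true_pos / (true_pos + false_pos)


# Assumed values, not figures from the model card:
ppv = positive_predictive_value(sensitivity=0.9904, specificity=0.9904,
                                prevalence=0.05)
print(f"Share of flagged texts that are AI-generated: {ppv:.1%}")
```

Under these assumptions roughly 84% of flagged texts are AI-generated, which means about one flag in six would land on a human author. That is why an appeal path and human review matter even at high headline accuracy.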

Citation

If you use this model, please cite:

@misc{roberta_ai_detector_v2,
  author = {COAI},
  title = {roberta-ai-detector-v2: AI Text Detection Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/coai/roberta-ai-detector-v2}
}

Contact

For questions or issues, please open an issue on the model repository.


Model size: 0.1B params (Safetensors)
Tensor type: F32
