roberta-ai-detector-v2

RoBERTa-based AI text detector fine-tuned for academic writing

Model Description

This model is fine-tuned to detect AI-generated text in academic papers and essays. It classifies a passage as human-written or AI-generated and reports 99.04% accuracy on its held-out test set.

  • Model type: roberta
  • Language(s): English
  • License: Apache 2.0
  • Fine-tuned from: roberta-base

Intended Use

This model is intended for:

  • Detecting AI-generated content in academic submissions
  • Research on AI text detection
  • Educational tools for academic integrity

Important: This model should be used as one signal among many when evaluating text authenticity. It should not be the sole basis for academic misconduct decisions.

Performance

Metric     Score
Accuracy   99.04%
F1 Score   99.04%
ROC AUC    99.74%

Training Data

The model was trained on 56,213 samples of paired human and AI-generated academic text, including outputs from:

  • Claude (Anthropic)
  • GPT models (OpenAI)
  • Gemini (Google)

Evaluation

Evaluated on 11,023 held-out test samples.
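
The card does not say how the reported metrics were computed, so the following is only a minimal sketch of the standard binary definitions of accuracy and F1, assuming label 1 means AI-generated (as in the Usage snippet). The function names and the toy labels are illustrative and are not the model's evaluation code or test data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (AI-generated) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels for illustration only; these are NOT the model's test data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"Accuracy: {accuracy(tp, fp, tn, fn):.2%}")
print(f"F1 Score: {f1_score(tp, fp, fn):.2%}")
```

Whether the reported F1 is the binary, macro, or weighted variant is not stated in the card; the sketch uses the binary form.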

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "coai/roberta-ai-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict
text = "Your text to analyze..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    ai_probability = probs[0][1].item()  # Probability of AI-generated

print(f"AI Probability: {ai_probability:.2%}")

Limitations

  • Optimized for academic/formal writing; may be less accurate on casual text
  • Performance may vary on text from AI models not in the training set
  • Should not be used as the sole determinant of academic misconduct
  • May have reduced accuracy on very short texts (<50 words)
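
One way to respect the short-text limitation, and the 512-token truncation in the Usage snippet, is to skip texts that are too short and to score long documents in chunks. The sketch below is an assumption-laden illustration, not part of the model: `score_fn` is a hypothetical callable that returns an AI probability for a string (for example, a wrapper around the Usage code), `CHUNK_WORDS` is a guessed word budget that is not guaranteed to stay under 512 tokens, and averaging chunk scores is just one possible aggregation.

```python
from typing import Callable, List, Optional

MIN_WORDS = 50     # from the limitation above: accuracy may drop below ~50 words
CHUNK_WORDS = 300  # assumption: a word budget intended to fit within 512 tokens


def split_into_chunks(text: str, chunk_words: int = CHUNK_WORDS) -> List[str]:
    """Split text into consecutive chunks of at most chunk_words words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def score_document(text: str, score_fn: Callable[[str], float]) -> Optional[float]:
    """Return the mean AI probability over chunks, or None if the text is too short.

    score_fn is a hypothetical callable mapping a string to an AI probability.
    """
    if len(text.split()) < MIN_WORDS:
        return None  # too short to score reliably; leave it to human review
    # Drop a trailing chunk that is itself below the minimum length.
    chunks = [c for c in split_into_chunks(text) if len(c.split()) >= MIN_WORDS]
    scores = [score_fn(chunk) for chunk in chunks]
    return sum(scores) / len(scores)
```

Returning `None` instead of a low-confidence number makes it harder for a downstream tool to treat an unreliable score as a verdict.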

Ethical Considerations

  • False positives can have serious consequences for students
  • Always use human judgment alongside model predictions
  • Consider the context and provide opportunities for appeal
  • This tool is meant to assist, not replace, human evaluation
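
To make the false-positive concern concrete, the arithmetic below estimates what share of flagged texts would actually be human-written at different rates of AI use. The true-positive and false-positive rates are illustrative assumptions loosely inspired by the reported accuracy; the card does not publish per-class error rates, so these are not measured properties of the model.

```python
def flagged_human_fraction(prevalence: float, tpr: float, fpr: float) -> float:
    """Fraction of flagged texts that are actually human-written (1 - precision).

    prevalence: share of all submissions that are AI-generated
    tpr: probability that an AI-generated text is flagged
    fpr: probability that a human-written text is flagged
    """
    true_flags = prevalence * tpr
    false_flags = (1.0 - prevalence) * fpr
    total = true_flags + false_flags
    return false_flags / total if total else 0.0


# Assumed rates for illustration only; not taken from the model's evaluation.
ASSUMED_TPR = 0.99
ASSUMED_FPR = 0.01

for prevalence in (0.50, 0.10, 0.01):
    share = flagged_human_fraction(prevalence, ASSUMED_TPR, ASSUMED_FPR)
    print(f"AI share of submissions {prevalence:.0%}: "
          f"{share:.1%} of flagged texts are human-written")
```

Under these assumed rates, the rarer AI-generated submissions are, the larger the share of flags that fall on human authors, which is why a flag alone should not decide a case.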

Citation

If you use this model, please cite:

@misc{roberta_ai_detector_v2,
  author = {COAI},
  title = {roberta-ai-detector-v2: AI Text Detection Model},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/coai/roberta-ai-detector-v2}
}

Contact

For questions or issues, please open an issue on the model repository.
