roberta-ai-detector-v2
RoBERTa-based AI text detector fine-tuned for academic writing
Model Description
This model is fine-tuned to detect AI-generated text in academic papers and essays. It classifies a passage as human-written or AI-generated and reaches 99.04% accuracy on a held-out test set (self-reported; see Performance).
- Model type: RoBERTa (sequence classification)
- Language(s): English
- License: Apache 2.0
- Fine-tuned from: roberta-base
Intended Use
This model is intended for:
- Detecting AI-generated content in academic submissions
- Research on AI text detection
- Educational tools for academic integrity
Important: This model should be used as one signal among many when evaluating text authenticity. It should not be the sole basis for academic misconduct decisions.
Performance
| Metric | Score |
|---|---|
| Accuracy | 99.04% |
| F1 Score | 99.04% |
| ROC AUC | 99.74% |
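For readers less familiar with the metrics in the table, the sketch below shows how accuracy and F1 are derived from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; this card does not report the model's actual confusion matrix, and the function names are not part of any library.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (AI) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (NOT taken from this model card).
tp, fp, tn, fn = 495, 5, 495, 5
print(f"Accuracy: {accuracy(tp, fp, tn, fn):.2%}")
print(f"F1 score: {f1_score(tp, fp, fn):.2%}")
```

Accuracy and F1 coincide when false positives and false negatives are balanced, as in this made-up example; they can diverge on imbalanced data.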
Training Data
The model was trained on 56,213 samples of paired human and AI-generated academic text, including outputs from:
- Claude (Anthropic)
- GPT models (OpenAI)
- Gemini (Google)
Evaluation
The model was evaluated on 11,023 held-out test samples.
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier
model_name = "coai/roberta-ai-detector-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict (inputs longer than 512 tokens are truncated)
text = "Your text to analyze..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)

ai_probability = probs[0][1].item()  # Probability of AI-generated (class index 1)
print(f"AI Probability: {ai_probability:.2%}")
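Because the usage example truncates input at 512 tokens, only the beginning of a long paper is scored. One possible workaround, sketched here under stated assumptions, is to split the text into overlapping word windows and average the per-window probabilities. The helper names, the 300-word window (a rough guess meant to stay under 512 tokens), the 150-word stride, and the averaging rule are all illustrative choices, not something this card documents. `score_fn` is a hypothetical callable that would wrap the tokenizer and model call from the usage example.

```python
from typing import Callable, List


def split_into_windows(
    text: str, window_words: int = 300, stride_words: int = 150
) -> List[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    words = text.split()
    if not words:
        return []
    if len(words) <= window_words:
        return [" ".join(words)]
    windows = []
    for start in range(0, len(words), stride_words):
        windows.append(" ".join(words[start:start + window_words]))
        # Stop once a window has reached the end of the text.
        if start + window_words >= len(words):
            break
    return windows


def score_long_text(text: str, score_fn: Callable[[str], float]) -> float:
    """Average the AI probabilities that score_fn returns for each window."""
    windows = split_into_windows(text)
    if not windows:
        raise ValueError("text is empty")
    scores = [score_fn(window) for window in windows]
    return sum(scores) / len(scores)
```

Averaging is only one aggregation choice; taking the maximum window score would be more sensitive to a single AI-written passage, at the cost of more false positives.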
Limitations
- Optimized for academic/formal writing; may be less accurate on casual text
- Performance may vary on text from AI models not in the training set
- Should not be used as the sole determinant of academic misconduct
- May have reduced accuracy on very short texts (<50 words)
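The short-text limitation can be enforced before scoring. The sketch below abstains on inputs under 50 words instead of returning a probability that may be unreliable. The function names are hypothetical, word counting by whitespace is a simplification, and `score_fn` stands in for whatever wraps the model call.

```python
from typing import Callable, Optional

MIN_WORDS = 50  # threshold taken from the short-text limitation above


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the text has at least min_words whitespace-separated words."""
    return len(text.split()) >= min_words


def guarded_score(text: str, score_fn: Callable[[str], float]) -> Optional[float]:
    """Return score_fn(text), or None (abstain) when the text is too short."""
    if not is_long_enough(text):
        return None
    return score_fn(text)
```

Returning `None` forces callers to handle the "no reliable prediction" case explicitly instead of treating a low-confidence number as evidence.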
Ethical Considerations
- False positives can have serious consequences for students
- Always use human judgment alongside model predictions
- Consider the context and provide opportunities for appeal
- This tool is meant to assist, not replace, human evaluation
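To see why false positives matter even for an accurate detector, it can help to work through the base-rate arithmetic. The sketch below computes the share of flagged submissions that are actually AI-generated, and the expected number of wrongly flagged human-written submissions. The sensitivity, specificity, and prevalence values are assumptions for illustration; this card reports only accuracy, F1, and ROC AUC, so these are not measured properties of the model.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability that a flagged text is truly AI-generated (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def expected_false_flags(
    specificity: float, prevalence: float, n_submissions: int
) -> float:
    """Expected number of human-written submissions flagged as AI-generated."""
    return (1 - specificity) * (1 - prevalence) * n_submissions


# Assumed values for illustration only (not reported by this model card).
sensitivity, specificity, prevalence = 0.99, 0.99, 0.05
ppv = positive_predictive_value(sensitivity, specificity, prevalence)
wrongly_flagged = expected_false_flags(specificity, prevalence, 1000)
print(f"Share of flags that are correct: {ppv:.1%}")
print(f"Expected wrongly flagged per 1000 submissions: {wrongly_flagged:.1f}")
```

The lower the share of AI-generated submissions in a cohort, the larger the fraction of flags that point at human authors, which is one reason an appeal process matters.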
Citation
If you use this model, please cite:
@misc{roberta_ai_detector_v2,
  author = {COAI},
  title = {roberta-ai-detector-v2: AI Text Detection Model},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/coai/roberta-ai-detector-v2}
}
Contact
For questions or issues, please open an issue on the model repository.