TMR: Target Mining RoBERTa - AI Text Detector
A robust AI-generated text detector based on RoBERTa-base, trained with Focal Loss and Self-Hard-Negative iterative mining on the RAID dataset.
Model Description
TMR (Target Mining RoBERTa) is designed to detect AI-generated text with high accuracy while maintaining low false positive rates. The model uses:
- Architecture: RoBERTa-base (125M parameters)
- Loss Function: Focal Loss (gamma=2.0, alpha=[0.85, 0.15]) to focus on hard examples
- Training Strategy: Self-Hard-Negative (Self-HN) iterative mining
- Training Data: 50,000 stratified samples from RAID (45% human, 55% AI)
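To make the loss setting above concrete, here is a minimal per-example sketch of focal loss in plain Python. It is not the training code for this model: the function name `focal_loss` is hypothetical, and the mapping of class index 0 to human and index 1 to AI is an assumption (it matches the index used for the AI probability in the usage snippet, but the card does not state which class each alpha entry belongs to).

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=(0.85, 0.15)):
    """Focal loss for a single example (illustrative sketch).

    probs  : predicted class probabilities, e.g. [p_human, p_ai]
    target : index of the true class (ASSUMED: 0 = human, 1 = AI)
    gamma  : focusing parameter; larger values down-weight easy examples
    alpha  : per-class weights, indexed by the true class
    """
    # Probability the model assigned to the true class, clamped to avoid log(0)
    p_t = min(max(probs[target], 1e-12), 1.0)
    # Cross-entropy term scaled by the class weight and the (1 - p_t)^gamma factor
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes far less than a wrong one
easy = focal_loss([0.95, 0.05], target=0)
hard = focal_loss([0.30, 0.70], target=0)
```

With `gamma=0` and `alpha=(1, 1)` the expression reduces to ordinary cross-entropy, which is a quick way to check an implementation.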
Performance
RAID held-out evaluation (100k samples, seed 999):
| Metric | Score |
|---|---|
| AUROC | 0.9972 |
| Accuracy | 97.30% |
| FPR | 2.27% |
| FNR | 2.71% |
| F1 Score | 0.9856 |
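For readers who want to compute the same kinds of metrics on their own data, the sketch below derives them from confusion-matrix counts. It assumes AI-generated text is the positive class, which the card does not state explicitly; the function name `detection_metrics` and the toy counts are illustrative and are not taken from the RAID evaluation.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute detector metrics from confusion-matrix counts.

    ASSUMPTION: AI-generated is the positive class, so
    fp = human text flagged as AI, fn = AI text passed as human.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),   # share of human text flagged as AI
        "fnr": fn / (fn + tp),   # share of AI text that was missed
        "f1": 2 * precision * recall / (precision + recall),
    }

# Toy counts for illustration only
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```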
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_path = "Oxidane/tmr-ai-text-detector"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)
# Probability that text is AI-generated
ai_probability = probs[0][1].item()
print(f"AI probability: {ai_probability:.4f}")
# Binary classification (threshold=0.5)
is_ai = ai_probability > 0.5
print(f"Prediction: {'AI-generated' if is_ai else 'Human-written'}")
Training
Trained on the RAID dataset (ACL 2024) for 3 epochs with Focal Loss (gamma=2.0). The model uses Self-Hard-Negative mining: iteratively identifying human samples misclassified as AI, then retraining with these hard examples.
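The mining loop described above can be sketched roughly as follows. This is a hedged illustration, not the actual training script: `mine_hard_negatives`, `self_hn_training`, `train_fn`, and `toy_train` are hypothetical names, the label convention (0 = human, 1 = AI) is assumed, and the number of rounds and the stopping rule are guesses, since the card does not specify them. The toy "model" exists only so the loop can run without a GPU.

```python
def mine_hard_negatives(samples, score_fn, threshold=0.5):
    """Return human-written samples that the current model scores as AI.

    samples  : list of (text, label) pairs (ASSUMED: 0 = human, 1 = AI)
    score_fn : callable mapping text -> probability of being AI-generated
    """
    return [(t, y) for t, y in samples if y == 0 and score_fn(t) > threshold]

def self_hn_training(train_set, pool, train_fn, rounds=3, threshold=0.5):
    """Hypothetical Self-HN loop: train, mine false positives, retrain."""
    train_set = list(train_set)
    score_fn = train_fn(train_set)
    for _ in range(rounds):
        hard = [s for s in mine_hard_negatives(pool, score_fn, threshold)
                if s not in train_set]
        if not hard:
            break  # no new false positives left to learn from
        train_set.extend(hard)
        score_fn = train_fn(train_set)  # retrain with the hard examples added
    return score_fn, train_set

def toy_train(train_set):
    """Stand-in for real fine-tuning: memorizes human texts it has seen."""
    known_human = {t for t, y in train_set if y == 0}
    def score(text):
        if text in known_human:
            return 0.0
        return 0.9 if "moreover" in text else 0.1
    return score

train = [("seed human text", 0), ("moreover, as an AI", 1)]
pool = [("moreover, the results hold", 0), ("plain human note", 0)]
score_fn, final_set = self_hn_training(train, pool, toy_train)
```

In a real setup `train_fn` would fine-tune the classifier and `score_fn` would wrap its softmax output for the AI class.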
Limitations
- Language: Primarily trained on English text
- Threshold: Tuned for a decision threshold of 0.5; other thresholds shift the balance between false positives and false negatives
License
MIT License
Citation
If you use this model, please cite:
@misc{tmr-ai-text-detector,
title={TMR: Target Mining RoBERTa for AI Text Detection},
author={Oxidane},
year={2025},
url={https://huggingface.co/Oxidane/tmr-ai-text-detector}
}
Contact
For questions, contact me@oxidane.net