TMR: Target Mining RoBERTa - AI Text Detector

A robust AI-generated text detector based on RoBERTa-base, trained with Focal Loss and Self-Hard-Negative iterative mining on the RAID dataset.

Model Description

TMR (Target Mining RoBERTa) is designed to detect AI-generated text with high accuracy while keeping the false positive rate (human-written text flagged as AI) low. The model uses:

  • Architecture: RoBERTa-base (125M parameters)
  • Loss Function: Focal Loss (gamma=2.0, alpha=[0.85, 0.15]) to focus on hard examples
  • Training Strategy: Self-Hard-Negative (Self-HN) iterative mining
  • Training Data: 50,000 stratified samples from RAID (45% human, 55% AI)
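Focal loss is usually written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. The (1 - p_t)^gamma factor shrinks the loss on confidently correct examples, so training concentrates on hard ones. The minimal sketch below computes it for a single example with the settings listed above. It assumes alpha is indexed by class label with 0 = human and 1 = AI (the label order used in the usage code); the card does not state this ordering, and this is not the model's actual training code.

```python
import math

def focal_loss(probs, label, gamma=2.0, alpha=(0.85, 0.15)):
    """Binary focal loss for one example.

    probs: [p_human, p_ai], softmax probabilities that sum to 1.
    label: 0 (human) or 1 (AI).
    alpha: per-class weights; the (human, AI) ordering is an assumption.
    """
    p_t = probs[label]        # probability assigned to the true class
    alpha_t = alpha[label]    # class weight for the true class
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct example contributes far less than a misclassified one.
easy = focal_loss([0.05, 0.95], label=1)
hard = focal_loss([0.70, 0.30], label=1)
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With gamma=0 and equal alpha values, this reduces to ordinary cross-entropy. Under the assumed ordering, the larger weight on the human class penalizes errors on human text more heavily, which fits the stated goal of a low false positive rate.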

Performance

RAID held-out evaluation (100k samples, seed 999):

Metric                               Score
AUROC                                0.9972
Accuracy                             97.30%
FPR (human text classified as AI)    2.27%
FNR (AI text classified as human)    2.71%
F1 score                             0.9856
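The metrics in the table can be computed from gold labels and predicted AI probabilities. The dependency-free sketch below illustrates this; it is not the evaluation script behind the reported numbers. It assumes label 1 = AI-generated and 0 = human, so AI is the positive class and FPR is the share of human texts flagged as AI. It uses the same "> 0.5" decision rule as the usage example.

```python
def auroc(labels, scores):
    """Rank-based (Mann-Whitney U) AUROC; tied scores receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both human (0) and AI (1) labels")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def binary_metrics(labels, scores, threshold=0.5):
    """labels: 1 = AI, 0 = human (assumed). scores: predicted AI probability."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "auroc": auroc(labels, scores),
        "accuracy": (tp + tn) / len(labels),
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # human texts flagged as AI
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # AI texts classified as human
        "f1": f1,
    }

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(binary_metrics(labels, scores))
```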

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_path = "Oxidane/tmr-ai-text-detector"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Predict
text = "Your text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)

# Probability that text is AI-generated
ai_probability = probs[0][1].item()
print(f"AI probability: {ai_probability:.4f}")

# Binary classification (threshold=0.5)
is_ai = ai_probability > 0.5
print(f"Prediction: {'AI-generated' if is_ai else 'Human-written'}")

Training

Trained on the RAID dataset (ACL 2024) for 3 epochs with Focal Loss (gamma=2.0). Training uses Self-Hard-Negative mining: human-written samples that the model misclassifies as AI are identified and added back as hard examples, the model is retrained, and this is repeated iteratively.
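The mining loop described above can be sketched as follows. Everything here is a hypothetical placeholder: `train_fn`, the predictor it returns, the candidate `pool`, the round count, and the deduplication of mined samples. The card does not say how many rounds were run, where candidates were drawn from, or how mined examples were weighted. Labels are assumed to be 1 = AI and 0 = human.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, int]            # (text, label); assumed encoding: 1 = AI, 0 = human
Predictor = Callable[[str], float]  # returns P(AI) for a text

def mine_hard_negatives(pool: List[Sample], predict_ai_prob: Predictor,
                        threshold: float = 0.5) -> List[Sample]:
    """Return human-written samples the current model scores as AI (false positives)."""
    return [(text, label) for text, label in pool
            if label == 0 and predict_ai_prob(text) > threshold]

def self_hn_training(train_set: List[Sample], pool: List[Sample],
                     train_fn: Callable[[List[Sample]], Predictor],
                     rounds: int = 3, threshold: float = 0.5) -> Predictor:
    """Hypothetical outer loop: train, mine false positives, retrain with them added."""
    model = train_fn(train_set)
    seen = set(train_set)
    for _ in range(rounds):
        new_hard = [s for s in mine_hard_negatives(pool, model, threshold)
                    if s not in seen]
        if not new_hard:
            break  # no new false positives left to learn from
        seen.update(new_hard)
        train_set = train_set + new_hard
        model = train_fn(train_set)
    return model
```

In this sketch, a sample is added only the first time it is mined. Repeatedly re-adding samples that are still misclassified would instead act as upweighting. Either variant is consistent with the description above.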

Limitations

  • Language: Primarily trained on English text
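  • Threshold: Optimized for a decision threshold of 0.5; other thresholds trade false positives against false negatives (a higher threshold flags fewer human texts as AI but misses more AI text)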
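If an application needs a different operating point, one common approach (not described in this card) is to pick the threshold on held-out human-written texts from your own domain. The sketch below assumes `human_scores` are AI probabilities computed as in the usage example, with a text flagged when its score is strictly greater than the threshold. It returns the smallest threshold whose FPR on those texts is at most the target. Lowering FPR this way raises FNR.

```python
def threshold_for_target_fpr(human_scores, target_fpr):
    """Smallest threshold t such that at most target_fpr of human texts have score > t."""
    if not human_scores:
        raise ValueError("need at least one human-written score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked) + 1e-9)  # max human texts allowed above t
    # Only scores strictly above ranked[allowed] are flagged: at most `allowed` of them.
    return ranked[allowed]

human_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
print(threshold_for_target_fpr(human_scores, target_fpr=0.2))
```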

License

MIT License

Citation

If you use this model, please cite:

@misc{tmr-ai-text-detector,
  title={TMR: Target Mining RoBERTa for AI Text Detection},
  author={Oxidane},
  year={2025},
  url={https://huggingface.co/Oxidane/tmr-ai-text-detector}
}

Contact

For questions, contact me@oxidane.net
