DistilBERT — AI-Generated Text Detector
Fine-tuned distilbert-base-uncased for binary classification of human vs. AI-generated text on the HC3 corpus.
Headline metrics on the held-out test set (11,238 samples, 5 domains):
| Metric | Value |
|---|---|
| F1-score | 0.9847 |
| Accuracy | 0.9891 |
| ROC-AUC | 0.9998 |
| Precision | 0.9706 |
| Recall | 0.9992 |
Statistically significant over the strongest classical baseline (Linear SVM + TF-IDF) by McNemar's test: χ² = 144.0, p ≈ 3.55 × 10⁻³³.
⚠️ Important: Aggregate metrics hide a cross-domain fairness gap. See the Limitations section before deploying.
📄 Full technical report (PDF) · 🎓 Presentation slides (PDF) · 💻 GitHub repository (code & reproduction)
Intended use
Primary intended use. Research, education, and evaluation experiments studying AI-generated text detection. Useful as a baseline transformer for comparison to newer detection approaches, or as a starting point for further fine-tuning on more recent generators (GPT-4, Claude, Gemini, open-source LLMs).
Out-of-scope use cases.
- ❌ Academic discipline decisions (plagiarism cases, expulsion, grade penalties)
- ❌ Hiring decisions (filtering applicant essays or cover letters)
- ❌ Content moderation in production without domain-specific calibration and human review
- ❌ Any high-stakes decision without explicit fairness audit on the deployment population
- ❌ Non-English text — the model has not been evaluated on any non-English content
The reasons these are out of scope are documented in the Limitations section.
How to use
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "Elia43/distilbert-ai-text-detector"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "Your text sample here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]

# Label mapping: 0 = human-written, 1 = AI-generated
print(f"P(human) = {probs[0]:.4f}")
print(f"P(AI) = {probs[1]:.4f}")
```
Always report the probability, not the binary label. Calibrated confidence is more honest than a hard prediction.
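To illustrate why the probability carries more information than the label, here is a minimal stdlib sketch of the softmax step applied to two made-up logit pairs. The `LABELS` dictionary follows the label mapping stated above; the logit values and the `describe` helper are hypothetical and not part of the released code.

```python
import math

# Label mapping as documented in this card: 0 = human-written, 1 = AI-generated.
LABELS = {0: "human-written", 1: "AI-generated"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe(logits):
    """Return the argmax label together with both class probabilities."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[predicted], "p_human": probs[0], "p_ai": probs[1]}

# Two hypothetical logit pairs: both yield the same hard label,
# but the second is far more confident than the first.
borderline = describe([0.3, 0.5])
confident = describe([-4.0, 4.0])
print(borderline)
print(confident)
```

Both calls return the label "AI-generated", yet only the second would plausibly justify any further attention, which is the point of reporting P(AI) instead of the label.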
Training data
HC3 (Human ChatGPT Comparison Corpus) — Guo et al., 2023.
- ~75,000 samples after quality filtering (removing texts <20 or >1000 words, deduplication)
- 5 domains: finance, medicine, open_qa, reddit_eli5, wiki_csai
- AI source: ChatGPT (gpt-3.5-turbo)
- Stratified 70/15/15 split on the composite key label × domain (random_state=42)
The training subset used for fine-tuning was a 15,000-sample stratified slice of the training set — DistilBERT saturates quickly on this task, and using the full set yields no measurable improvement.
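The stratified split on the composite key can be sketched as follows. This is a stdlib stand-in, not the project's code: the `random_state=42` wording suggests scikit-learn was used, and the function name `stratified_split`, the rounding rule, and the toy rows are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split rows into train/val/test so each key value keeps its share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    train, val, test = [], [], []
    for group_key in sorted(groups):  # sorted for a reproducible order
        members = groups[group_key]
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to test
    return train, val, test

# Toy data: 20 rows for every (label, domain) combination.
domains = ["finance", "medicine", "open_qa", "reddit_eli5", "wiki_csai"]
rows = [
    {"label": label, "domain": domain, "text": f"sample {i}"}
    for domain in domains
    for label in (0, 1)
    for i in range(20)
]

train, val, test = stratified_split(rows, key=lambda r: (r["label"], r["domain"]))
print(len(train), len(val), len(test))
```

Stratifying on label × domain rather than on the label alone keeps small domains such as wiki_csai represented in every split, which is what makes the per-domain evaluation possible.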
Training procedure
| Parameter | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Task head | Sequence classification, 2 labels |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| LR schedule | Linear warmup (10% of steps), linear decay |
| Weight decay | 0.01 |
| Batch size | 16 |
| Max sequence length | 256 tokens |
| Epochs | 3 |
| Gradient clipping | 1.0 (max grad norm) |
| Seed | 42 |
| Mixed precision | FP16 (Colab T4 GPU) |
| Best-checkpoint criterion | Validation F1 |
Training took ~30 minutes on a single Google Colab T4 GPU.
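For orientation, the learning-rate schedule in the table can be sketched in plain Python. The step counts below are derived from the 15,000-sample training subset, batch size 16, and 3 epochs, assuming no gradient accumulation and that the last partial batch is kept; the function `lr_at` is a hypothetical helper that mirrors the usual linear warmup and linear decay shape, not the training script itself.

```python
import math

BASE_LR = 2e-5
TRAIN_SAMPLES = 15_000   # stratified training subset described in this card
BATCH_SIZE = 16
EPOCHS = 3
WARMUP_FRACTION = 0.10

# Assumption: no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_FRACTION * total_steps)

def lr_at(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the peak rate of 2e-5 is reached at the end of warmup and falls linearly to zero at the final step.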
Evaluation
Held-out test set: 11,238 samples, stratified across 5 domains.
Overall metrics
| Metric | Value |
|---|---|
| Accuracy | 0.9891 |
| Precision | 0.9706 |
| Recall | 0.9992 |
| F1 | 0.9847 |
| ROC-AUC | 0.9998 |
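As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall using the harmonic-mean definition. The helper name `f1_from_pr` is illustrative.

```python
def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values taken from the table above.
f1 = f1_from_pr(0.9706, 0.9992)
print(f"{f1:.4f}")
```

The result agrees with the tabulated F1 of 0.9847 to four decimal places. The gap between recall (0.9992) and precision (0.9706) also shows where the remaining errors sit: the model rarely misses AI text, and most of its mistakes are human texts flagged as AI.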
Per-domain F1
| Domain | F1 |
|---|---|
| medicine | 1.000 |
| finance | 0.990 |
| open_qa | 0.988 |
| reddit_eli5 | 0.985 |
| wiki_csai (technical writing) | 0.916 |
The 9.54% error rate on wiki_csai (see Domain bias under Limitations) compares with roughly 0% on medical text and is about 9× the overall error rate of 1.09% (1 − 0.9891 accuracy).
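A per-domain breakdown like the one above can be produced with a few lines of plain Python. This is a sketch under assumptions: the record layout `(domain, true_label, predicted_label)`, the function names, and the toy records are invented for illustration and do not reproduce the reported figures.

```python
from collections import defaultdict

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (1 = AI-generated in this card)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def per_domain_report(records):
    """records: iterable of (domain, true_label, predicted_label)."""
    by_domain = defaultdict(lambda: ([], []))
    for domain, true_label, predicted in records:
        by_domain[domain][0].append(true_label)
        by_domain[domain][1].append(predicted)

    report = {}
    for domain, (y_true, y_pred) in by_domain.items():
        errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
        report[domain] = {
            "n": len(y_true),
            "f1": f1_score(y_true, y_pred),
            "error_rate": errors / len(y_true),
        }
    return report

# Toy records only; one human-written wiki_csai text is misclassified as AI.
records = [
    ("medicine", 1, 1), ("medicine", 0, 0), ("medicine", 1, 1), ("medicine", 0, 0),
    ("wiki_csai", 1, 1), ("wiki_csai", 0, 1), ("wiki_csai", 1, 1), ("wiki_csai", 0, 0),
]
report = per_domain_report(records)
for domain, stats in report.items():
    print(domain, stats)
```

Note that F1 and error rate are different quantities: in the toy wiki_csai slice a single false positive gives an error rate of 0.25 but an F1 of 0.8, so the two columns should be reported separately.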
Comparison to classical baselines
| Model | F1 |
|---|---|
| Multinomial Naive Bayes | 0.8731 |
| Bi-LSTM (frozen GloVe-100d) | 0.9338 |
| Logistic Regression (TF-IDF) | 0.9523 |
| Linear SVM (TF-IDF) | 0.9531 |
| DistilBERT (this model) | 0.9847 |
McNemar's test (DistilBERT vs. Linear SVM): χ² = 144.0, p ≈ 3.55 × 10⁻³³.
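The relation between the reported χ² and p-value can be checked with the standard library alone. The sketch below assumes the continuity-corrected form of McNemar's statistic; the card does not say which variant was used, and the discordant counts in the usage line are hypothetical, since the real counts are not given here.

```python
import math

def mcnemar_chi2(b, c, continuity=True):
    """McNemar statistic from discordant pairs.

    b: samples only model A classified correctly
    c: samples only model B classified correctly
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # assumed continuity correction
    return diff * diff / (b + c)

def chi2_pvalue_1df(chi2):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# p-value implied by the reported statistic.
p_reported = chi2_pvalue_1df(144.0)
print(f"{p_reported:.3e}")

# Hypothetical discordant counts, for illustration only.
chi2_example = mcnemar_chi2(b=10, c=30)
print(chi2_example, chi2_pvalue_1df(chi2_example))
```

With one degree of freedom, χ² = 144.0 corresponds to p ≈ 3.55 × 10⁻³³, matching the value quoted above.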
Limitations and bias
This model has documented limitations. Read this section before using it for anything that affects people.
Domain bias
Error rate ranges from 0% on medical text to 9.54% on technical Wikipedia-style content. This pattern is consistent across every model architecture tested (classical, recurrent, transformer), suggesting the difficulty is intrinsic to the domain — not solvable by switching models. Technical writers face disproportionate misclassification risk.
Length bias
Short texts (under 50 words) have systematically higher error rates than longer documents. Relevant because student short-answer questions, social media posts, and email replies are short by nature.
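One way to act on this is to attach a length warning to every prediction. The sketch below uses the 50-word cutoff mentioned above and the 20 to 1000 word range kept by the training-data filter; counting words by whitespace splitting, the flag names, and the function `length_flags` are assumptions for illustration.

```python
def length_flags(text, short_cutoff=50, train_min=20, train_max=1000):
    """Return the word count and any length-related reliability warnings."""
    n_words = len(text.split())  # assumption: whitespace-delimited word count
    flags = []
    if n_words < train_min or n_words > train_max:
        # Texts of this length were filtered out of the training data.
        flags.append("outside_training_length_range")
    if n_words < short_cutoff:
        # Short texts showed systematically higher error rates.
        flags.append("short_text_higher_error")
    return n_words, flags

tweet_like = "word " * 10
short_answer = "word " * 30
essay = "word " * 100
for sample in (tweet_like, short_answer, essay):
    print(length_flags(sample))
```

A caller could then display the probability together with these flags, so that a score on a ten-word reply is not read with the same confidence as a score on a full essay.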
Non-native English speaker bias (inherited from the literature)
Liang et al. (2023) showed that GPT detectors trained on similar data misclassify >50% of TOEFL essays by non-native English speakers as AI-generated, vs. <5% for native speakers. The mechanism — lower lexical diversity, more formal phrasing, more uniform syntax — exactly matches the features this model relies on. We have strong reason to believe this model exhibits the same bias. It was not directly tested in our evaluation, but the mechanism is structurally identical.
Generator coverage
The training data only contains ChatGPT (gpt-3.5-turbo) outputs. Performance on text from GPT-4, Claude, Gemini, Llama, Mistral, or other generators is untested and likely degraded. Detectors trained on one generation system generalize poorly to others.
Other limitations
- English only
- 5 domains is a narrow slice of real-world writing — news, fiction, academic papers, code, and informal chat are not represented
- No adversarial evaluation against paraphrasing attacks or human editing
- Static evaluation — performance degrades as language models evolve. Models like this need re-evaluation every 3–6 months.
Responsible deployment recommendations
If you do deploy this model (or any AI-text detector):
- Never use as sole evidence. Detection should be an informational signal; final high-stakes decisions require human review.
- Report calibrated probabilities, not binary labels. "P(AI) = 0.83 ± 0.12" is honest. "This is AI" is not.
- Calibrate on the deployment distribution before going live. A model trained on HC3 will not have the right thresholds for student essays, journalism, or business writing.
- Audit for fairness across demographic and linguistic groups, including non-native English writers, before any decision-making use.
- Avoid punitive use cases. The harms of false positives in academic discipline or hiring outweigh the benefits.
- Re-evaluate every 3–6 months. Generators evolve; detectors decay.
- Be transparent. People being analyzed should know it's happening and have a way to challenge results.
- Consider not deploying. For some use cases, the most ethical choice is no detector at all.
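The calibration recommendation above can be made concrete with a small threshold-selection sketch. It assumes access to P(AI) scores for texts from the deployment population that are known to be human-written; the function name `threshold_for_fpr`, the target rate, and the scores are hypothetical.

```python
import math

def threshold_for_fpr(human_scores, max_fpr=0.01):
    """Pick a threshold t so that flagging P(AI) > t misfires on at most
    max_fpr of the known human-written texts."""
    if not human_scores:
        raise ValueError("need at least one human-written score")
    ranked = sorted(human_scores, reverse=True)
    # Number of false positives we are willing to tolerate.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return 0.0
    # Only scores strictly above the (allowed + 1)-th highest get flagged.
    return ranked[allowed]

# Hypothetical P(AI) scores on ten human-written texts from the target domain.
human_scores = [0.02, 0.10, 0.35, 0.05, 0.91, 0.20, 0.01, 0.60, 0.03, 0.15]
threshold = threshold_for_fpr(human_scores, max_fpr=0.10)
false_positive_rate = sum(s > threshold for s in human_scores) / len(human_scores)
print(threshold, false_positive_rate)
```

A real calibration set would need to be far larger than ten texts and should include the groups named in the fairness recommendation, so that the false-positive rate can be checked per group and not only in aggregate.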
Citation
```bibtex
@misc{khater2026aitextdetection,
  author       = {Elia Khater},
  title        = {DistilBERT for AI-Generated Text Detection: A Comparative Study with Cross-Domain Fairness Analysis},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Elia43/distilbert-ai-text-detector}},
  note         = {Final project, Introduction to Natural Language Processing, Université Saint-Joseph (USJ), Spring 2026.
                  Companion repository: \url{https://github.com/Elia43/ai-text-detection}}
}
```
License
MIT License. The underlying HC3 dataset is licensed by its original authors (Guo et al., 2023) — please respect their terms when reusing this model or extending this work.
Author
Elia Khater, Mathematics & Data Science, Université Saint-Joseph (USJ), Beirut
GitHub · LinkedIn · eliakhater7@gmail.com