---
language:
- en
tags:
- ai-detection
- text-classification
- roberta
- distilroberta
- worm
- generated-text-detection
license: mit
datasets:
- noumenon-labs/Mega-WORM-Cleaned
metrics:
- accuracy
- f1
model-index:
- name: Earlybird
  results:
  - task:
      type: text-classification
      name: AI Detection
    dataset:
      name: WORM (Wait, Original or Machine)
      type: noumenon-labs/Mega-WORM-Cleaned
    metrics:
    - type: accuracy
      value: 98.2
    - type: f1
      value: 0.982
base_model:
- distilbert/distilroberta-base
pipeline_tag: text-classification
---
# 🐦 Earlybird: Fast & Accurate AI Text Detection
Earlybird is a lightweight, high-speed AI text detection model designed to classify text as either Human-Written or AI-Generated.
Built on the efficient DistilRoBERTa architecture, it was fine-tuned on the W.O.R.M. (Wait, Original or Machine) dataset.
## ⚡ Model Stats
- Architecture: DistilRoBERTa (82M parameters)
- Primary Task: Binary Classification (Human vs. AI)
- Context Window: 512 Tokens
- Inference Speed: <50ms (CPU) / <5ms (GPU)
## Overview
Earlybird is designed for rapid, real-time detection. Unlike generative Large Language Models (LLMs) that are slow and resource-heavy, Earlybird uses a distilled encoder architecture. This allows it to process text in milliseconds, making it ideal for high-volume applications like content moderation, academic integrity checks, and spam filtering.
The model analyzes stylistic patterns, perplexity, and token transitions to determine if a text was written by a human or generated by models like GPT-4, Claude, Llama, or Mistral.
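A minimal inference sketch is shown below. The repo id (`noumenon-labs/earlybird`) and the exact label strings returned by the pipeline are assumptions; substitute the actual values from this repository. The `interpret` helper is purely illustrative.

```python
def load_detector(model_id: str = "noumenon-labs/earlybird"):
    """Build a text-classification pipeline for Earlybird.

    Imported lazily so the pure helper below has no heavy dependencies.
    The model_id is a placeholder: use the actual repo id of this model.
    Truncation caps input at the model's 512-token context window.
    """
    from transformers import pipeline
    return pipeline(
        "text-classification",
        model=model_id,
        truncation=True,
        max_length=512,
    )

def interpret(label: str, score: float, threshold: float = 0.5) -> str:
    """Map a raw pipeline prediction to a verdict.

    The label-string check is an assumption about how this model names
    its classes; adjust it to the labels the pipeline actually returns.
    """
    if score < threshold:
        return "uncertain"
    return "AI-Generated" if label.upper().startswith("AI") else "Human-Written"

if __name__ == "__main__":
    detector = load_detector()  # downloads weights on first call
    result = detector("Some paragraph of text to check...")[0]
    print(interpret(result["label"], result["score"]), result["score"])
```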
## Training Data
Earlybird was trained on Mega-WORM, a unified dataset curated from four major open-source collections. The training data was rigorously filtered to ensure high-quality prose, focusing on texts with sufficient context (essays, blog posts, articles).
## Performance Benchmarks
The model excels at identifying AI-generated content in medium- and long-form text (over 100 words). However, users should be aware of its limitations on very short texts.
### Detailed Length Breakdown

| Text Category | Word Count | Accuracy | Performance |
|---|---|---|---|
| Short Text | <100 words | 76.31% | ⚠️ Weak |
| Medium Text | 100 - 300 words | 96.48% | ✅ Excellent |
| Long Text | 300+ words | 95.01% | ✅ Excellent |
### Overall Metrics
| Metric | Score |
|---|---|
| Overall Accuracy | 89.43% |
## ⚠️ Important Limitations
- Short Text Instability: As shown in the benchmarks, the model's accuracy drops significantly (to ~76%) on texts under 100 words (e.g., short tweets, single sentences). It is not recommended for use on short social media comments without human review.
- Context Requirement: The model relies on analyzing sentence structure and paragraph flow. Without enough words, it lacks the context needed to make a high-confidence prediction.
- False Positives: Highly formal, academic human writing can occasionally be flagged as AI due to its rigid structure.
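The short-text caveat above can be enforced with a simple pre-filter that routes texts under the 100-word reliability threshold to human review instead of auto-classifying them. This is an illustrative sketch, not part of the model itself; the cutoff mirrors the benchmark table.

```python
def route_text(text: str, min_words: int = 100) -> str:
    """Decide whether a text is long enough for reliable classification.

    Returns "classify" when the text meets the word-count threshold where
    Earlybird's accuracy is strong (96%+), and "human_review" for short
    texts where accuracy drops to ~76%.
    """
    word_count = len(text.split())
    return "classify" if word_count >= min_words else "human_review"

print(route_text("Is this tweet AI?"))   # → human_review (only 4 words)
print(route_text("word " * 150))         # → classify (150 words)
```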