---
language: en
tags:
- adaptive-classifier
- text-classification
- ai-detection
- ai-generated-text
- continuous-learning
license: apache-2.0
datasets:
- pangram/editlens_iclr
base_model: TrustSafeAI/RADAR-Vicuna-7B
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: adaptive-classifier/ai-detector
  results:
  - task:
      type: text-classification
      name: AI Text Detection (Binary)
    dataset:
      name: EditLens ICLR 2026
      type: pangram/editlens_iclr
      split: test
    metrics:
    - type: accuracy
      value: 73.5
      name: Accuracy
    - type: f1
      value: 72.1
      name: Macro F1
---
# AI Text Detector (adaptive-classifier)
A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.
## How It Works
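The detector uses [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) as a frozen feature extractor and classifies the resulting embeddings with adaptive-classifier's prototype memory combined with a neural head. Despite the "Vicuna-7B" in its name, the backbone is a RoBERTa-large model (355M parameters) adversarially trained for AI detection; the name refers to the Vicuna-7B model used during that adversarial training.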
```
Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
```
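
To make the "prototype memory + neural head" step more concrete, here is a minimal pure-Python sketch of how two scorers could be blended. It assumes that prototype scores come from a softmax over cosine similarities to per-class prototype embeddings, and that the two heads' probabilities are combined linearly using the weights listed under Training Details (0.3 prototype, 0.7 neural). The function names are hypothetical, this is not adaptive-classifier's actual implementation, and the toy vectors are 3-dimensional rather than 1024.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Convert a dict of raw scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def prototype_probs(embedding, prototypes):
    """Hypothetical prototype head: softmax over similarity to each class prototype."""
    return softmax({label: cosine(embedding, proto) for label, proto in prototypes.items()})


def blend(proto_probs, neural_probs, proto_weight=0.3, neural_weight=0.7):
    """Assumed linear blend of the two heads' probabilities."""
    combined = {
        label: proto_weight * proto_probs[label] + neural_weight * neural_probs[label]
        for label in proto_probs
    }
    # Sort by score, highest first, mirroring the [(label, score), ...] example output
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy 3-dim embeddings standing in for 1024-dim RADAR vectors
prototypes = {"human": [1.0, 0.0, 0.0], "ai": [0.0, 1.0, 0.0]}
embedding = [0.2, 0.9, 0.1]
neural = {"human": 0.1, "ai": 0.9}  # assumed output of the neural head
print(blend(prototype_probs(embedding, prototypes), neural))
```

Because both inputs are probability distributions and the weights sum to 1, the blended scores also sum to 1.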
## Installation
```bash
pip install adaptive-classifier
```
## Usage
```python
from adaptive_classifier import AdaptiveClassifier

# Load the trained classifier (and its RADAR backbone) from the Hugging Face Hub
classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")

# Single prediction: returns a list of (label, score) pairs
predictions = classifier.predict("Your text here")
# e.g. [('ai', 0.85), ('human', 0.15)]

# Batch prediction: one list of (label, score) pairs per input text
results = classifier.predict_batch(["text 1", "text 2"], k=2)

# Continuous learning: add new labeled examples without retraining
classifier.add_examples(
    ["new human text example", "new ai text example"],
    ["human", "ai"]
)
```
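
`predict` returns a list of `(label, score)` pairs. If you need a yes/no decision at a threshold other than "whichever label scores highest", a small helper can read the `ai` score out of that list. The helper name `is_ai_generated` and the default threshold of 0.5 are illustrative assumptions, not part of adaptive-classifier; a threshold for production use would need tuning on held-out data from your own domain.

```python
from typing import List, Tuple


def is_ai_generated(predictions: List[Tuple[str, float]], threshold: float = 0.5) -> bool:
    """Return True if the 'ai' score in a predict() result meets the threshold.

    Hypothetical helper: `predictions` is a list of (label, score) pairs.
    A missing 'ai' entry (e.g. when k=1 and 'human' ranks first) counts as 0.0.
    """
    scores = dict(predictions)
    return scores.get("ai", 0.0) >= threshold


# Runs on the example output shown above, no model download needed
example = [("ai", 0.85), ("human", 0.15)]
print(is_ai_generated(example))                 # True
print(is_ai_generated(example, threshold=0.9))  # False
```

Assuming `predict_batch` returns one such list per input, the helper applies element-wise: `[is_ai_generated(p) for p in results]`.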
## Results
Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.
### Binary Classification (Human vs AI)
| Model | Method | Test F1 |
|-------|--------|---------|
| EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
| Pangram v2 | Proprietary | 83.7 |
| Binoculars | Perplexity ratio | 81.4 |
| FastDetectGPT | Log-prob based | 80.5 |
| **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |
### Per-Split Results
| Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
|-------|----------|----------|-------|----------|
| test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
| test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
| test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |
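
The model generalizes well to text from an unseen AI model (Llama 3.3-70B): macro-F1 on `test_llama` (74.7) is higher than in-distribution (72.1). The out-of-domain Enron split is harder. Accuracy holds at 73.5%, but human F1 drops to 45.7, pulling macro-F1 down to 64.1.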
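
Macro-F1 is the unweighted mean of the per-class F1 scores, which can be checked directly against the Per-Split Results table. The short script below recomputes it from the AI F1 and Human F1 columns; because those per-class values are already rounded to one decimal, agreement is only expected to within about 0.1.

```python
# (AI F1, Human F1, reported Macro-F1), copied from the Per-Split Results table
splits = {
    "test": (78.3, 65.9, 72.1),
    "test_enron": (82.5, 45.7, 64.1),
    "test_llama": (80.7, 68.8, 74.7),
}


def macro_f1(per_class_f1):
    """Macro-F1: unweighted mean of per-class F1, regardless of class size."""
    return sum(per_class_f1) / len(per_class_f1)


for name, (ai_f1, human_f1, reported) in splits.items():
    recomputed = macro_f1([ai_f1, human_f1])
    # Inputs are rounded to 0.1, so allow a small tolerance
    assert abs(recomputed - reported) <= 0.1, name
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported}")
```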
## Training Details
- **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
- **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
- **Examples**: 1,000 per class (2,000 total), stratified sample
- **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
- **Embedding dim**: 1024
- **Prototype weight**: 0.3
- **Neural weight**: 0.7
- **Training time**: ~6 minutes on CPU
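
The class setup above (two source labels merged into `ai`, then 1,000 examples per class) can be sketched in plain Python. This assumes each training row is a dict with `text` and `label` fields whose values are `human_written`, `ai_edited`, or `ai_generated`; the actual EditLens column names and the exact sampling procedure are not documented here, so the field names, the function name `build_training_set`, and the seed are hypothetical. The sketch samples uniformly within each class and does not reproduce any additional stratification (for example by domain) that the original sample may have used.

```python
import random
from collections import defaultdict

# Source labels collapsed into the two classes listed above
LABEL_MAP = {"human_written": "human", "ai_edited": "ai", "ai_generated": "ai"}


def build_training_set(rows, per_class=1000, seed=0):
    """Hypothetical sketch: map labels to binary classes, then sample per_class rows per class."""
    by_class = defaultdict(list)
    for row in rows:
        target = LABEL_MAP.get(row["label"])
        if target is not None:  # skip any label outside the mapping
            by_class[target].append(row["text"])
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    texts, labels = [], []
    for cls in sorted(by_class):
        pool = by_class[cls]
        chosen = rng.sample(pool, min(per_class, len(pool)))
        texts.extend(chosen)
        labels.extend([cls] * len(chosen))
    return texts, labels


# Toy rows standing in for the EditLens train split
rows = (
    [{"text": f"h{i}", "label": "human_written"} for i in range(5)]
    + [{"text": f"e{i}", "label": "ai_edited"} for i in range(3)]
    + [{"text": f"g{i}", "label": "ai_generated"} for i in range(3)]
)
texts, labels = build_training_set(rows, per_class=4)
print(labels.count("human"), labels.count("ai"))  # 4 4
```

The resulting `texts` and `labels` lists have the same shape as the two arguments passed to `add_examples` in the Usage example.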
## Limitations
- Binary only (human vs AI) — does not distinguish AI-edited from AI-generated
- Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
- Minimum ~50 words of text recommended for reliable detection
- Trained on English text from specific domains (reviews, news, creative writing, academic)
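
Since short inputs are unreliable, a caller may prefer to skip texts below the recommended length rather than report a low-confidence label. A minimal sketch follows. The wrapper name `detect_if_long_enough`, the whitespace-based word count, and the 50-word cutoff (taken from the guideline above) are assumptions; `classifier` can be any object with a `predict(text)` method, such as the `AdaptiveClassifier` loaded in Usage.

```python
from typing import List, Optional, Tuple

MIN_WORDS = 50  # recommended minimum from the Limitations section


def detect_if_long_enough(
    classifier, text: str, min_words: int = MIN_WORDS
) -> Optional[List[Tuple[str, float]]]:
    """Return classifier.predict(text), or None if the text is too short to judge.

    Words are counted by whitespace splitting, a rough proxy for the
    backbone's actual tokenization.
    """
    if len(text.split()) < min_words:
        return None
    return classifier.predict(text)


class _StubClassifier:
    """Stand-in with the same predict() shape, so the sketch runs without a download."""

    def predict(self, text):
        return [("ai", 0.85), ("human", 0.15)]


stub = _StubClassifier()
print(detect_if_long_enough(stub, "too short"))   # None
print(detect_if_long_enough(stub, "word " * 60))  # the stub's prediction
```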
## Citation
```bibtex
@software{adaptive_classifier,
title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
author = {Sharma, Asankhaya},
year = {2025},
publisher = {GitHub},
url = {https://github.com/codelion/adaptive-classifier}
}
```