---
language: en
tags:
- adaptive-classifier
- text-classification
- ai-detection
- ai-generated-text
- continuous-learning
license: apache-2.0
datasets:
- pangram/editlens_iclr
base_model: TrustSafeAI/RADAR-Vicuna-7B
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: adaptive-classifier/ai-detector
  results:
  - task:
      type: text-classification
      name: AI Text Detection (Binary)
    dataset:
      name: EditLens ICLR 2026
      type: pangram/editlens_iclr
      split: test
    metrics:
    - type: accuracy
      value: 73.5
      name: Accuracy
    - type: f1
      value: 72.1
      name: Macro F1
---

# AI Text Detector (adaptive-classifier)

A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.

## How It Works

The detector uses [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (a RoBERTa-large model adversarially trained for AI detection) as a frozen feature extractor. Its embeddings are classified by adaptive-classifier's prototype memory combined with a neural head.

```
Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
```
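
To illustrate how two scoring paths can be combined, here is a minimal sketch that blends prototype-based and neural-head scores using the 0.3 / 0.7 weights listed under Training Details. The function name `blend_scores`, the dict-based score format, and the renormalization step are assumptions made for illustration; the library's internal combination logic may differ.

```python
def blend_scores(proto_scores, neural_scores, proto_weight=0.3, neural_weight=0.7):
    """Weighted average of two per-label score dicts, renormalized to sum to 1.

    Hypothetical helper: not part of the adaptive-classifier API.
    """
    labels = sorted(set(proto_scores) | set(neural_scores))
    if not labels:
        return {}
    blended = {
        label: proto_weight * proto_scores.get(label, 0.0)
        + neural_weight * neural_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(blended.values())
    if total == 0:
        # No evidence from either path: fall back to a uniform distribution.
        return {label: 1.0 / len(labels) for label in labels}
    return {label: score / total for label, score in blended.items()}


# Made-up scores for a single input text.
proto = {"human": 0.40, "ai": 0.60}
neural = {"human": 0.10, "ai": 0.90}
blended = blend_scores(proto, neural)
print(blended)
```

With these made-up inputs, the neural head dominates the result because it carries the larger weight.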

## Installation

```bash
pip install adaptive-classifier
```

## Usage

```python
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")

predictions = classifier.predict("Your text here")
# Returns a list of (label, score) pairs, e.g. [('ai', 0.85), ('human', 0.15)]

# Batch prediction
results = classifier.predict_batch(["text 1", "text 2"], k=2)

# Continuous learning: add new examples without retraining
classifier.add_examples(
    ["new human text example", "new ai text example"],
    ["human", "ai"]
)
```
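
Downstream code often needs a single decision rather than a ranked list. The sketch below assumes predictions arrive as a list of `(label, score)` pairs, as in the usage comment above. The helper name `top_label`, the `min_confidence` threshold, and the `"uncertain"` fallback are hypothetical and not part of the library.

```python
def top_label(predictions, min_confidence=0.5):
    """Return the highest-scoring label, or "uncertain" below the threshold.

    Hypothetical helper operating on a list of (label, score) pairs.
    """
    if not predictions:
        return None
    label, score = max(predictions, key=lambda pair: pair[1])
    return label if score >= min_confidence else "uncertain"


# Illustrative inputs in the same shape as the usage example.
confident = top_label([("ai", 0.85), ("human", 0.15)])
borderline = top_label([("ai", 0.55), ("human", 0.45)], min_confidence=0.7)
print(confident, borderline)
```

A stricter threshold trades coverage for fewer confident mistakes, which may matter given the per-class F1 gap reported in the results.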

## Results

Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.

### Binary Classification (Human vs AI)

| Model | Method | Test F1 |
|-------|--------|---------|
| EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
| Pangram v2 | Proprietary | 83.7 |
| Binoculars | Perplexity ratio | 81.4 |
| FastDetectGPT | Log-prob based | 80.5 |
| **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |

### Per-Split Results

| Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
|-------|----------|----------|-------|----------|
| test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
| test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
| test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |

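The model generalizes well to an unseen generator (Llama 3.3-70B): macro-F1 on `test_llama` (74.7) is higher than on the in-distribution test split (72.1). It transfers less well to a new domain: on `test_enron`, macro-F1 drops to 64.1, driven by a low human F1 (45.7).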
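
As a sanity check on the table, macro-F1 for a two-class problem is the unweighted mean of the per-class F1 scores. The short sketch below recomputes it from the AI F1 and Human F1 columns. Because the table values are already rounded, the recomputed mean can differ from the reported figure in the last digit, so the comparison uses a tolerance of 0.1.

```python
# Per-class F1 and reported macro-F1, copied from the per-split table.
per_split = {
    "test": {"ai_f1": 78.3, "human_f1": 65.9, "reported": 72.1},
    "test_enron": {"ai_f1": 82.5, "human_f1": 45.7, "reported": 64.1},
    "test_llama": {"ai_f1": 80.7, "human_f1": 68.8, "reported": 74.7},
}


def macro_f1(ai_f1, human_f1):
    """Unweighted mean of the two per-class F1 scores."""
    return (ai_f1 + human_f1) / 2


recomputed = {
    split: macro_f1(row["ai_f1"], row["human_f1"]) for split, row in per_split.items()
}
for split, value in recomputed.items():
    gap = abs(value - per_split[split]["reported"])
    print(f"{split}: recomputed {value:.2f}, within tolerance: {gap <= 0.1}")
```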

## Training Details

- **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
- **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
- **Examples**: 1,000 per class (2,000 total), stratified sample
- **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
- **Embedding dim**: 1024
- **Prototype weight**: 0.3
- **Neural weight**: 0.7
- **Training time**: ~6 minutes on CPU
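
The class mapping and balanced sampling described above can be sketched roughly as follows. This is an illustration, not the actual training script: the `(text, source_label)` row format, the `balanced_sample` helper, and the choice to sample per binary class (rather than per source label) are assumptions.

```python
import random
from collections import defaultdict

# Three source labels collapsed into the two classes listed above.
LABEL_MAP = {
    "human_written": "human",
    "ai_edited": "ai",
    "ai_generated": "ai",
}


def balanced_sample(rows, per_class, seed=0):
    """Sample up to `per_class` texts for each binary class.

    `rows` is an iterable of (text, source_label) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, source_label in rows:
        by_class[LABEL_MAP[source_label]].append(text)

    texts, labels = [], []
    for label in sorted(by_class):
        pool = by_class[label]
        chosen = rng.sample(pool, min(per_class, len(pool)))
        texts.extend(chosen)
        labels.extend([label] * len(chosen))
    return texts, labels


# Toy rows standing in for the real training split.
rows = [
    ("h1", "human_written"), ("h2", "human_written"), ("h3", "human_written"),
    ("e1", "ai_edited"), ("g1", "ai_generated"), ("g2", "ai_generated"),
]
texts, labels = balanced_sample(rows, per_class=2)
print(list(zip(texts, labels)))
```

The resulting `texts` and `labels` lists have the same shape as the arguments passed to `add_examples` in the usage section.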

## Limitations

- Binary only (human vs AI); does not distinguish AI-edited from AI-generated text
- Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
- At least ~50 words of text are recommended for reliable detection
- Trained on English text from specific domains (reviews, news, creative writing, academic)
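
One way to respect the length recommendation is to check inputs before classifying them. The sketch below counts whitespace-separated words, which is only a rough proxy. The helper name `long_enough` and the hard cutoff at 50 are assumptions, since the recommendation above is approximate.

```python
MIN_WORDS = 50  # approximate threshold taken from the recommendation above


def long_enough(text, min_words=MIN_WORDS):
    """Return True if the text has at least `min_words` whitespace-separated words."""
    return len(text.split()) >= min_words


short_ok = long_enough("too short to judge")
long_ok = long_enough("word " * 50)
print(short_ok, long_ok)
```

Texts that fail the check could be skipped or flagged as low-confidence rather than passed to the classifier.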

## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```