---
language: en
tags:
- adaptive-classifier
- text-classification
- ai-detection
- ai-generated-text
- continuous-learning
license: apache-2.0
datasets:
- pangram/editlens_iclr
base_model: TrustSafeAI/RADAR-Vicuna-7B
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: adaptive-classifier/ai-detector
  results:
  - task:
      type: text-classification
      name: AI Text Detection (Binary)
    dataset:
      name: EditLens ICLR 2026
      type: pangram/editlens_iclr
      split: test
    metrics:
    - type: accuracy
      value: 73.5
      name: Accuracy
    - type: f1
      value: 72.1
      name: Macro F1
---

# AI Text Detector (adaptive-classifier)

A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.

## How It Works

The detector uses frozen embeddings from [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B), a RoBERTa-large model adversarially trained for AI detection, as a feature extractor. Classification is handled by adaptive-classifier's prototype memory combined with a neural head.

```
Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
```

## Installation

```bash
pip install adaptive-classifier
```

## Usage

```python
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")

predictions = classifier.predict("Your text here")
# Returns a list of (label, score) pairs, e.g. [('ai', 0.85), ('human', 0.15)]

# Batch prediction
results = classifier.predict_batch(["text 1", "text 2"], k=2)

# Continuous learning — add new examples without retraining
classifier.add_examples(
    ["new human text example", "new ai text example"],
    ["human", "ai"]
)
```
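
The `predict` call in the usage example returns a list of `(label, score)` pairs. A small helper like the one below can turn that list into a single decision with an option to abstain. This is only a sketch: `decide` and its `min_confidence` default of 0.6 are hypothetical, not part of adaptive-classifier, and the threshold would need tuning on your own data.

```python
from typing import List, Tuple


def decide(predictions: List[Tuple[str, float]], min_confidence: float = 0.6) -> str:
    """Return the top label, or 'uncertain' if its score is below the threshold.

    `predictions` is assumed to be a list of (label, score) pairs, as in the
    usage example. The 0.6 default is an illustrative value, not a tuned one.
    """
    if not predictions:
        return "uncertain"
    label, score = max(predictions, key=lambda pair: pair[1])
    return label if score >= min_confidence else "uncertain"


# Hand-written inputs in the same shape as the classifier.predict(...) output
print(decide([("ai", 0.85), ("human", 0.15)]))  # ai
print(decide([("ai", 0.55), ("human", 0.45)]))  # uncertain
```

Abstaining on low-confidence predictions trades coverage for precision, which may be preferable when a false "ai" label is costly.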

## Results

Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.

### Binary Classification (Human vs AI)

| Model | Method | Test F1 |
|-------|--------|---------|
| EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
| Pangram v2 | Proprietary | 83.7 |
| Binoculars | Perplexity ratio | 81.4 |
| FastDetectGPT | Log-prob based | 80.5 |
| **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |

### Per-Split Results

| Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
|-------|----------|----------|-------|----------|
| test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
| test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
| test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |

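On text from an unseen generator (Llama 3.3-70B, `test_llama`), the model reaches a Macro-F1 of 74.7, slightly higher than the 72.1 on the in-distribution test split. It generalizes less well to an unseen domain: on `test_enron`, Macro-F1 drops to 64.1, driven by a Human F1 of 45.7.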
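
The Macro-F1 column in the per-split table can be cross-checked against the per-class columns. The sketch below assumes Macro-F1 is the unweighted mean of the AI and Human F1 scores (the usual two-class definition); the numbers are copied from the table, and `macro_f1` is just an illustrative helper.

```python
# Per-class F1 scores copied from the Per-Split Results table
per_split = {
    "test": {"ai": 78.3, "human": 65.9, "reported_macro": 72.1},
    "test_enron": {"ai": 82.5, "human": 45.7, "reported_macro": 64.1},
    "test_llama": {"ai": 80.7, "human": 68.8, "reported_macro": 74.7},
}


def macro_f1(ai_f1: float, human_f1: float) -> float:
    """Macro-F1 for two classes: the unweighted mean of the per-class F1 scores."""
    return (ai_f1 + human_f1) / 2


for split, row in per_split.items():
    computed = macro_f1(row["ai"], row["human"])
    # Table values are rounded to one decimal, so allow a small tolerance
    assert abs(computed - row["reported_macro"]) <= 0.1
    print(f"{split}: computed {computed:.2f}, reported {row['reported_macro']}")
```

Because Macro-F1 weights both classes equally, the low Human F1 on `test_enron` pulls its Macro-F1 down even though accuracy matches the in-distribution split.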

## Training Details

- **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
- **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
- **Examples**: 1,000 per class (2,000 total), stratified sample
- **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
- **Embedding dim**: 1024
- **Prototype weight**: 0.3
- **Neural weight**: 0.7
- **Training time**: ~6 minutes on CPU
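
The prototype and neural weights listed above suggest that the final score blends two per-class scores. The sketch below assumes a plain weighted sum; the exact combination rule inside adaptive-classifier is not described in this card and may differ. `blend_scores` and the input scores are hypothetical, for illustration only.

```python
from typing import Dict, List, Tuple


def blend_scores(
    prototype_scores: Dict[str, float],
    neural_scores: Dict[str, float],
    prototype_weight: float = 0.3,
    neural_weight: float = 0.7,
) -> List[Tuple[str, float]]:
    """Weighted sum of two per-class score dicts, sorted by descending score."""
    # Only blend labels that both scorers know about
    labels = prototype_scores.keys() & neural_scores.keys()
    blended = {
        label: prototype_weight * prototype_scores[label]
        + neural_weight * neural_scores[label]
        for label in labels
    }
    return sorted(blended.items(), key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration only
ranked = blend_scores({"ai": 0.6, "human": 0.4}, {"ai": 0.9, "human": 0.1})
print(ranked)
```

With a 0.7 neural weight, the neural head dominates the ranking unless the prototype scores disagree strongly.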

## Limitations

- Binary only (human vs AI) — does not distinguish AI-edited from AI-generated
- Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
- Minimum ~50 words of text recommended for reliable detection
- Trained on English text from specific domains (reviews, news, creative writing, academic)
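
Given the recommended minimum of roughly 50 words, it may help to screen inputs before classifying them. The following is a minimal sketch; `has_enough_words` is a hypothetical helper, and splitting on whitespace is only a rough proxy for word count.

```python
def has_enough_words(text: str, min_words: int = 50) -> bool:
    """True if the text has at least `min_words` whitespace-separated tokens."""
    return len(text.split()) >= min_words


short_text = "Too short to judge."
long_text = " ".join(["word"] * 60)

print(has_enough_words(short_text))  # False
print(has_enough_words(long_text))   # True
```

Texts that fail the check can be skipped or flagged rather than passed to the classifier.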

## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```