---
language: en
tags:
- adaptive-classifier
- text-classification
- ai-detection
- ai-generated-text
- continuous-learning
license: apache-2.0
datasets:
- pangram/editlens_iclr
base_model: TrustSafeAI/RADAR-Vicuna-7B
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: adaptive-classifier/ai-detector
  results:
  - task:
      type: text-classification
      name: AI Text Detection (Binary)
    dataset:
      name: EditLens ICLR 2026
      type: pangram/editlens_iclr
      split: test
    metrics:
    - type: accuracy
      value: 73.5
      name: Accuracy
    - type: f1
      value: 72.1
      name: Macro F1
---

# AI Text Detector (adaptive-classifier)

A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.

## How It Works

The model uses frozen embeddings from [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (a RoBERTa-large model adversarially trained for AI detection) as a feature extractor, with adaptive-classifier's prototype memory and neural head on top for classification.

```
Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
```

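The prototype memory and neural head each score the embedding, and their outputs are blended with fixed weights (0.3 prototype, 0.7 neural; see Training Details below). A minimal sketch of that weighted fusion, with made-up scores standing in for the real components:

```python
# Toy sketch of the score fusion step: blend prototype-similarity scores with
# neural-head scores using fixed weights. The score values below are invented;
# the real library computes them from the 1024-dim RADAR embedding.

PROTOTYPE_WEIGHT = 0.3
NEURAL_WEIGHT = 0.7

def fuse_scores(prototype_scores, neural_scores):
    """Weighted average of two per-class score dicts."""
    return {
        label: PROTOTYPE_WEIGHT * prototype_scores[label]
               + NEURAL_WEIGHT * neural_scores[label]
        for label in prototype_scores
    }

prototype_scores = {"ai": 0.60, "human": 0.40}  # similarity to class prototypes (toy)
neural_scores = {"ai": 0.90, "human": 0.10}     # neural-head probabilities (toy)

fused = fuse_scores(prototype_scores, neural_scores)
# ai: 0.3*0.60 + 0.7*0.90 = 0.81; human: 0.3*0.40 + 0.7*0.10 = 0.19
print({label: round(score, 2) for label, score in fused.items()})
# {'ai': 0.81, 'human': 0.19}
```

The helper name `fuse_scores` is illustrative; the library performs this combination internally when you call `predict`.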
## Installation

```bash
pip install adaptive-classifier
```

## Usage

```python
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")

predictions = classifier.predict("Your text here")
# Returns: [('ai', 0.85), ('human', 0.15)]

# Batch prediction
results = classifier.predict_batch(["text 1", "text 2"], k=2)

# Continuous learning: add new examples without retraining
classifier.add_examples(
    ["new human text example", "new ai text example"],
    ["human", "ai"]
)
```

## Results

Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.

### Binary Classification (Human vs AI)

| Model | Method | Test F1 |
|-------|--------|---------|
| EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
| Pangram v2 | Proprietary | 83.7 |
| Binoculars | Perplexity ratio | 81.4 |
| FastDetectGPT | Log-prob based | 80.5 |
| **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |

### Per-Split Results

| Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
|-------|----------|----------|-------|----------|
| test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
| test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
| test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |

The model generalizes to an unseen generator (Llama 3.3-70B), scoring a slightly higher macro-F1 on `test_llama` (74.7) than in-distribution (72.1). The OOD-domain `test_enron` split is harder: macro-F1 drops to 64.1, driven mainly by weak human-class F1 (45.7).

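Macro-F1 in the table above is the unweighted mean of the two per-class F1 scores, which can be checked directly from the AI F1 and Human F1 columns (small mismatches come from the per-class scores themselves being rounded):

```python
# Macro-F1 for a binary task: unweighted mean of the per-class F1 scores.
def macro_f1(ai_f1, human_f1):
    return (ai_f1 + human_f1) / 2

print(round(macro_f1(78.3, 65.9), 2))  # test: 72.1
print(round(macro_f1(82.5, 45.7), 2))  # test_enron: 64.1
print(round(macro_f1(80.7, 68.8), 2))  # test_llama: 74.75 (reported as 74.7)
```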
## Training Details

- **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
- **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
- **Examples**: 1,000 per class (2,000 total), stratified sample
- **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
- **Embedding dim**: 1024
- **Prototype weight**: 0.3, neural weight: 0.7
- **Training time**: ~6 minutes on CPU

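A 1,000-per-class stratified sample like the one above can be drawn with nothing but the standard library. A sketch assuming each record exposes a `label` field (the real dataset's column names may differ):

```python
import random
from collections import defaultdict

def stratified_sample(records, per_class, seed=42):
    """Draw the same number of examples from each label, without replacement."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)
    sample = []
    for _, recs in sorted(by_label.items()):
        sample.extend(rng.sample(recs, per_class))
    return sample

# Toy corpus standing in for the EditLens train split.
corpus = [{"text": f"doc {i}", "label": "human"} for i in range(5000)]
corpus += [{"text": f"doc {i}", "label": "ai"} for i in range(5000)]

train = stratified_sample(corpus, per_class=1000)
print(len(train))  # 2000
```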
## Limitations

- Binary only (human vs AI): does not distinguish AI-edited from AI-generated text
- Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
- A minimum of ~50 words is recommended for reliable detection
- Trained on English text from specific domains (reviews, news, creative writing, academic)

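Given the ~50-word floor noted above, a trivial pre-check before calling `predict` avoids scoring texts that are too short (the helper name and exact threshold are illustrative, not part of the library):

```python
MIN_WORDS = 50  # below this, detector output tends to be unreliable

def is_long_enough(text, min_words=MIN_WORDS):
    """Rough whitespace word count; reject texts too short to score reliably."""
    return len(text.split()) >= min_words

short_text = "Just a single sentence."
long_text = " ".join(["word"] * 60)
print(is_long_enough(short_text))  # False
print(is_long_enough(long_text))   # True
```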
## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```