---
language: en
tags:
- adaptive-classifier
- text-classification
- ai-detection
- ai-generated-text
- continuous-learning
license: apache-2.0
datasets:
- pangram/editlens_iclr
base_model: TrustSafeAI/RADAR-Vicuna-7B
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: adaptive-classifier/ai-detector
  results:
  - task:
      type: text-classification
      name: AI Text Detection (Binary)
    dataset:
      name: EditLens ICLR 2026
      type: pangram/editlens_iclr
      split: test
    metrics:
    - type: accuracy
      value: 73.5
      name: Accuracy
    - type: f1
      value: 72.1
      name: Macro F1
---

# AI Text Detector (adaptive-classifier)

A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.

## How It Works

Uses frozen embeddings from [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (a RoBERTa-large model adversarially trained for AI detection) as a feature extractor, with adaptive-classifier's prototype memory + neural head for classification.

```
Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
```

## Installation

```bash
pip install adaptive-classifier
```

## Usage

```python
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")

predictions = classifier.predict("Your text here")
# Returns: [('ai', 0.85), ('human', 0.15)]

# Batch prediction
results = classifier.predict_batch(["text 1", "text 2"], k=2)

# Continuous learning: add new examples without retraining
classifier.add_examples(
    ["new human text example", "new ai text example"],
    ["human", "ai"]
)
```

## Results

Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.
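The Macro F1 reported here is the unweighted mean of the per-class F1 scores, so the weaker `human` class counts as much as the stronger `ai` class. A minimal sketch of the computation (these helper functions are illustrative, not part of the adaptive-classifier API):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall for a single class."""
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_f1: list[float]) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)

# In-distribution test split: AI F1 = 78.3, Human F1 = 65.9
# -> Macro F1 = (78.3 + 65.9) / 2 = 72.1
score = macro_f1([78.3, 65.9])
print(round(score, 1))  # 72.1
```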
### Binary Classification (Human vs AI)

| Model | Method | Test F1 |
|-------|--------|---------|
| EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
| Pangram v2 | Proprietary | 83.7 |
| Binoculars | Perplexity ratio | 81.4 |
| FastDetectGPT | Log-prob based | 80.5 |
| **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |

### Per-Split Results

| Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
|-------|----------|----------|-------|----------|
| test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
| test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
| test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |

The model generalizes well to an unseen AI model (Llama 3.3-70B), achieving a higher macro-F1 on that OOD split (74.7) than in-distribution (72.1). Generalization to an unseen domain (Enron emails) is weaker, driven by a drop in human F1 (45.7).

## Training Details

- **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
- **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
- **Examples**: 1,000 per class (2,000 total), stratified sample
- **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
- **Embedding dim**: 1024
- **Prototype weight**: 0.3, Neural weight: 0.7
- **Training time**: ~6 minutes on CPU

## Limitations

- Binary only (human vs AI): does not distinguish AI-edited from AI-generated
- Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
- Minimum ~50 words of text recommended for reliable detection
- Trained on English text from specific domains (reviews, news, creative writing, academic)

## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```
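## Appendix: Prototype/Neural Fusion Sketch

The prototype weight (0.3) and neural weight (0.7) listed under Training Details suggest a weighted blend of prototype-similarity scores and neural-head probabilities. The sketch below is a simplified illustration of that idea under assumed behavior; it is **not** adaptive-classifier's actual implementation, and all names in it are hypothetical:

```python
import math

# Assumed fusion weights, taken from the Training Details above.
PROTOTYPE_WEIGHT = 0.3
NEURAL_WEIGHT = 0.7

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fuse(embedding, prototypes, neural_probs):
    """Blend prototype similarities with neural-head probabilities per class."""
    sims = {label: cosine(embedding, proto) for label, proto in prototypes.items()}
    total = sum(sims.values())  # normalize similarities into a distribution
    scores = {
        label: PROTOTYPE_WEIGHT * (sims[label] / total)
               + NEURAL_WEIGHT * neural_probs[label]
        for label in prototypes
    }
    return max(scores, key=scores.get), scores

# Toy 3-dim "embeddings" stand in for the real 1024-dim RADAR vectors.
prototypes = {"human": [1.0, 0.2, 0.1], "ai": [0.1, 0.9, 0.8]}
label, scores = fuse([0.2, 0.8, 0.9], prototypes, {"human": 0.2, "ai": 0.8})
print(label)  # ai
```

Because each component is itself a distribution over classes and the weights sum to 1, the fused scores also sum to 1 and can be read as class probabilities.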