---
language: en
tags:
- adaptive-classifier
- text-classification
- ai-detection
- ai-generated-text
- continuous-learning
license: apache-2.0
datasets:
- pangram/editlens_iclr
base_model: TrustSafeAI/RADAR-Vicuna-7B
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: adaptive-classifier/ai-detector
  results:
  - task:
      type: text-classification
      name: AI Text Detection (Binary)
    dataset:
      name: EditLens ICLR 2026
      type: pangram/editlens_iclr
      split: test
    metrics:
    - type: accuracy
      value: 73.5
      name: Accuracy
    - type: f1
      value: 72.1
      name: Macro F1
---

# AI Text Detector (adaptive-classifier)

A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.

## How It Works

The model uses frozen embeddings from [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (a RoBERTa-large model adversarially trained for AI detection) as a feature extractor, with adaptive-classifier's prototype memory and neural head on top for classification.

```
Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
```

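The prototype memory and neural head each score the embedding, and their outputs are blended with fixed weights (0.3 prototype, 0.7 neural; see Training Details below). A minimal sketch of that weighted fusion, with made-up scores standing in for the real components:

```python
# Toy sketch of the score fusion step: blend prototype-similarity scores with
# neural-head scores using fixed weights. The score values below are invented;
# the real library computes them from the 1024-dim RADAR embedding.

PROTOTYPE_WEIGHT = 0.3
NEURAL_WEIGHT = 0.7

def fuse_scores(prototype_scores, neural_scores):
    """Weighted average of two per-class score dicts."""
    return {
        label: PROTOTYPE_WEIGHT * prototype_scores[label]
               + NEURAL_WEIGHT * neural_scores[label]
        for label in prototype_scores
    }

prototype_scores = {"ai": 0.60, "human": 0.40}  # similarity to class prototypes (toy)
neural_scores = {"ai": 0.90, "human": 0.10}     # neural-head probabilities (toy)

fused = fuse_scores(prototype_scores, neural_scores)
# ai: 0.3*0.60 + 0.7*0.90 = 0.81; human: 0.3*0.40 + 0.7*0.10 = 0.19
print({label: round(score, 2) for label, score in fused.items()})
# {'ai': 0.81, 'human': 0.19}
```

The helper name `fuse_scores` is illustrative; the library performs this combination internally when you call `predict`.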
## Installation

```bash
pip install adaptive-classifier
```

## Usage

```python
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")

predictions = classifier.predict("Your text here")
# Returns: [('ai', 0.85), ('human', 0.15)]

# Batch prediction
results = classifier.predict_batch(["text 1", "text 2"], k=2)

# Continuous learning: add new examples without retraining
classifier.add_examples(
    ["new human text example", "new ai text example"],
    ["human", "ai"]
)
```

## Results

Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.

### Binary Classification (Human vs AI)

| Model | Method | Test F1 |
|-------|--------|---------|
| EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
| Pangram v2 | Proprietary | 83.7 |
| Binoculars | Perplexity ratio | 81.4 |
| FastDetectGPT | Log-prob based | 80.5 |
| **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |

### Per-Split Results

| Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
|-------|----------|----------|-------|----------|
| test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
| test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
| test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |

The model generalizes to an unseen generator (Llama 3.3-70B), scoring a slightly higher macro-F1 on `test_llama` (74.7) than in-distribution (72.1). The OOD-domain `test_enron` split is harder: macro-F1 drops to 64.1, driven mainly by weak human-class F1 (45.7).

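Macro-F1 in the table above is the unweighted mean of the two per-class F1 scores, which can be checked directly from the AI F1 and Human F1 columns (small mismatches come from the per-class scores themselves being rounded):

```python
# Macro-F1 for a binary task: unweighted mean of the per-class F1 scores.
def macro_f1(ai_f1, human_f1):
    return (ai_f1 + human_f1) / 2

print(round(macro_f1(78.3, 65.9), 2))  # test: 72.1
print(round(macro_f1(82.5, 45.7), 2))  # test_enron: 64.1
print(round(macro_f1(80.7, 68.8), 2))  # test_llama: 74.75 (reported as 74.7)
```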
## Training Details

- **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
- **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
- **Examples**: 1,000 per class (2,000 total), stratified sample
- **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
- **Embedding dim**: 1024
- **Prototype weight**: 0.3, neural weight: 0.7
- **Training time**: ~6 minutes on CPU

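A 1,000-per-class stratified sample like the one above can be drawn with nothing but the standard library. A sketch assuming each record exposes a `label` field (the real dataset's column names may differ):

```python
import random
from collections import defaultdict

def stratified_sample(records, per_class, seed=42):
    """Draw the same number of examples from each label, without replacement."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)
    sample = []
    for _, recs in sorted(by_label.items()):
        sample.extend(rng.sample(recs, per_class))
    return sample

# Toy corpus standing in for the EditLens train split.
corpus = [{"text": f"doc {i}", "label": "human"} for i in range(5000)]
corpus += [{"text": f"doc {i}", "label": "ai"} for i in range(5000)]

train = stratified_sample(corpus, per_class=1000)
print(len(train))  # 2000
```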
## Limitations

- Binary only (human vs AI): does not distinguish AI-edited from AI-generated text
- Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
- A minimum of ~50 words is recommended for reliable detection
- Trained on English text from specific domains (reviews, news, creative writing, academic)

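Given the ~50-word floor noted above, a trivial pre-check before calling `predict` avoids scoring texts that are too short (the helper name and exact threshold are illustrative, not part of the library):

```python
MIN_WORDS = 50  # below this, detector output tends to be unreliable

def is_long_enough(text, min_words=MIN_WORDS):
    """Rough whitespace word count; reject texts too short to score reliably."""
    return len(text.split()) >= min_words

short_text = "Just a single sentence."
long_text = " ".join(["word"] * 60)
print(is_long_enough(short_text))  # False
print(is_long_enough(long_text))   # True
```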
## Citation

```bibtex
@software{adaptive_classifier,
  title = {Adaptive Classifier: Dynamic Text Classification with Continuous Learning},
  author = {Sharma, Asankhaya},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/adaptive-classifier}
}
```