Commit ea13f99 (verified), committed by codelion
1 Parent(s): 84f7397

Update model card with results, dataset link, and proper metadata

Files changed (1)
  1. README.md +94 -38
README.md CHANGED
@@ -1,74 +1,116 @@
1
  ---
2
- language: multilingual
3
  tags:
4
  - adaptive-classifier
5
  - text-classification
6
  - continuous-learning
7
  license: apache-2.0
8
  ---
9
 
10
- # Adaptive Classifier
11
 
12
- This model is an instance of an [adaptive-classifier](https://github.com/codelion/adaptive-classifier) that allows for continuous learning and dynamic class addition.
13
 
14
- ## Installation
15
 
16
- **IMPORTANT:** To use this model, you must first install the `adaptive-classifier` library. You do **NOT** need `trust_remote_code=True`.
17
 
18
  ```bash
19
  pip install adaptive-classifier
20
  ```
21
 
22
- ## Model Details
23
 
24
- - Base Model: TrustSafeAI/RADAR-Vicuna-7B
25
- - Number of Classes: 2
26
- - Total Examples: 10
27
- - Embedding Dimension: 1024
28
 
29
- ## Class Distribution
30
 
31
- ```
32
- ai: 5 examples (50.0%)
33
- human: 5 examples (50.0%)
34
  ```
35
 
36
- ## Usage
37
 
38
- After installing the `adaptive-classifier` library, you can load and use this model:
39
 
40
- ```python
41
- from adaptive_classifier import AdaptiveClassifier
42
 
43
- # Load the model (no trust_remote_code needed!)
44
- classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/model-name")
45
 
46
- # Make predictions
47
- text = "Your text here"
48
- predictions = classifier.predict(text)
49
- print(predictions) # List of (label, confidence) tuples
50
 
51
- # Add new examples for continuous learning
52
- texts = ["Example 1", "Example 2"]
53
- labels = ["class1", "class2"]
54
- classifier.add_examples(texts, labels)
55
- ```
56
 
57
- **Note:** This model uses the `adaptive-classifier` library distributed via PyPI. You do **NOT** need to set `trust_remote_code=True` - just install the library first.
58
 
59
  ## Training Details
60
 
61
- - Training Steps: 4
62
- - Examples per Class: See distribution above
63
- - Prototype Memory: Active
64
- - Neural Adaptation: Active
65
 
66
  ## Limitations
67
 
68
- This model:
69
- - Requires at least 3 examples per class
70
- - Has a maximum of 1000 examples per class
71
- - Updates prototypes every 100 examples
72
 
73
  ## Citation
74
 
@@ -80,4 +122,18 @@ This model:
80
  publisher = {GitHub},
81
  url = {https://github.com/codelion/adaptive-classifier}
82
  }
83
  ```
 
1
  ---
2
+ language: en
3
  tags:
4
  - adaptive-classifier
5
  - text-classification
6
+ - ai-detection
7
+ - ai-generated-text
8
  - continuous-learning
9
  license: apache-2.0
10
+ datasets:
11
+ - pangram/editlens_iclr
12
+ base_model: TrustSafeAI/RADAR-Vicuna-7B
13
+ metrics:
14
+ - accuracy
15
+ - f1
16
+ pipeline_tag: text-classification
17
+ model-index:
18
+ - name: adaptive-classifier/ai-detector
19
+ results:
20
+ - task:
21
+ type: text-classification
22
+ name: AI Text Detection (Binary)
23
+ dataset:
24
+ name: EditLens ICLR 2026
25
+ type: pangram/editlens_iclr
26
+ split: test
27
+ metrics:
28
+ - type: accuracy
29
+ value: 73.5
30
+ name: Accuracy
31
+ - type: f1
32
+ value: 72.1
33
+ name: Macro F1
34
  ---
35
 
36
+ # AI Text Detector (adaptive-classifier)
37
 
38
+ A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.
39
 
40
+ ## How It Works
41
+
42
+ The detector uses frozen embeddings from [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (a RoBERTa-large model adversarially trained for AI detection) as a feature extractor. adaptive-classifier's prototype memory and neural head then perform the classification on top of those embeddings.
43
+
44
+ ```
45
+ Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
46
+ ```
47
 
48
+ ## Installation
49
 
50
  ```bash
51
  pip install adaptive-classifier
52
  ```
53
 
54
+ ## Usage
55
 
56
+ ```python
57
+ from adaptive_classifier import AdaptiveClassifier
58
 
59
+ classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")
60
 
61
+ predictions = classifier.predict("Your text here")
62
+ # Returns a list of (label, confidence) tuples, e.g. [('ai', 0.85), ('human', 0.15)]
63
+
64
+ # Batch prediction
65
+ results = classifier.predict_batch(["text 1", "text 2"], k=2)
66
+
67
+ # Continuous learning — add new examples without retraining
68
+ classifier.add_examples(
69
+ ["new human text example", "new ai text example"],
70
+ ["human", "ai"]
71
+ )
72
  ```
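
A minimal sketch of consuming the prediction list, assuming `predict` returns `(label, confidence)` tuples as shown in the usage snippet above. The helper name `top_label` and the 0.6 threshold are hypothetical and are not part of adaptive-classifier; the sketch works on plain tuples so it runs without downloading the model.

```python
from typing import List, Optional, Tuple


def top_label(
    predictions: List[Tuple[str, float]],
    min_confidence: float = 0.6,  # hypothetical threshold, not from the model card
) -> Optional[str]:
    """Return the highest-confidence label, or None if it falls below the threshold."""
    if not predictions:
        return None
    # Pick the (label, confidence) pair with the largest confidence.
    label, confidence = max(predictions, key=lambda pair: pair[1])
    return label if confidence >= min_confidence else None


# Example input in the same shape as the usage snippet's comment.
example = [("ai", 0.85), ("human", 0.15)]
print(top_label(example))
```

Returning `None` for low-confidence inputs is one way to route borderline texts to manual review instead of forcing a label.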
73
 
74
+ ## Results
75
 
76
+ Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.
77
 
78
+ ### Binary Classification (Human vs AI)
79
 
80
+ | Model | Method | Test F1 |
81
+ |-------|--------|---------|
82
+ | EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
83
+ | Pangram v2 | Proprietary | 83.7 |
84
+ | Binoculars | Perplexity ratio | 81.4 |
85
+ | FastDetectGPT | Log-prob based | 80.5 |
86
+ | **This model** | **Frozen RADAR + adaptive-classifier** | **72.1** |
87
 
88
+ ### Per-Split Results
89
 
90
+ | Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
91
+ |-------|----------|----------|-------|----------|
92
+ | test (in-distribution) | 73.5% | 72.1 | 78.3 | 65.9 |
93
+ | test_enron (OOD domain) | 73.5% | 64.1 | 82.5 | 45.7 |
94
+ | test_llama (OOD model) | 76.1% | 74.7 | 80.7 | 68.8 |
95
 
96
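+ On `test_llama` (text from an unseen generator, Llama 3.3-70B), the model scores a higher Macro-F1 (74.7) than on the in-distribution test split (72.1). On the out-of-domain `test_enron` split, Macro-F1 drops to 64.1, driven mainly by a Human F1 of 45.7.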
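
The Macro-F1 column in the per-split table can be sanity-checked from the two per-class columns. The sketch below assumes Macro-F1 is the unweighted mean of the AI and Human F1 scores (the usual definition; the card does not state it explicitly). Small differences from the reported values are expected because the table entries are already rounded.

```python
# Per-class F1 scores and reported Macro-F1, copied from the per-split table.
per_split = {
    "test": {"ai": 78.3, "human": 65.9, "reported_macro": 72.1},
    "test_enron": {"ai": 82.5, "human": 45.7, "reported_macro": 64.1},
    "test_llama": {"ai": 80.7, "human": 68.8, "reported_macro": 74.7},
}


def macro_f1(ai_f1: float, human_f1: float) -> float:
    """Unweighted mean of the two per-class F1 scores (assumed definition)."""
    return (ai_f1 + human_f1) / 2.0


for split, row in per_split.items():
    computed = macro_f1(row["ai"], row["human"])
    print(f"{split}: computed {computed:.2f}, reported {row['reported_macro']}")
```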
97
 
98
  ## Training Details
99
 
100
+ - **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
101
+ - **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
102
+ - **Examples**: 1,000 per class (2,000 total), stratified sample
103
+ - **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
104
+ - **Embedding dim**: 1024
105
+ - **Prototype weight**: 0.3; **Neural weight**: 0.7
106
+ - **Training time**: ~6 minutes on CPU
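
The card lists a prototype weight of 0.3 and a neural weight of 0.7 but does not say how they are applied. As an illustration only, the sketch below assumes a linear blend of per-class scores from the two components; `blend_scores` is a hypothetical helper, not adaptive-classifier's actual implementation.

```python
from typing import Dict

PROTOTYPE_WEIGHT = 0.3  # from the training details above
NEURAL_WEIGHT = 0.7     # from the training details above


def blend_scores(
    prototype_scores: Dict[str, float],
    neural_scores: Dict[str, float],
    prototype_weight: float = PROTOTYPE_WEIGHT,
    neural_weight: float = NEURAL_WEIGHT,
) -> Dict[str, float]:
    """Linearly combine two per-class score dicts and renormalize to sum to 1.

    Assumption: a weighted average; the library may combine scores differently.
    """
    labels = set(prototype_scores) | set(neural_scores)
    blended = {
        label: prototype_weight * prototype_scores.get(label, 0.0)
        + neural_weight * neural_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(blended.values())
    if total > 0:
        blended = {label: score / total for label, score in blended.items()}
    return blended


# Made-up component scores, for illustration only.
print(blend_scores({"ai": 0.6, "human": 0.4}, {"ai": 0.9, "human": 0.1}))
```

Under this assumption the neural head dominates the final score, while the prototype memory still shifts it when newly added examples move the class prototypes.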
107
 
108
  ## Limitations
109
 
110
+ - Binary only (human vs AI) — does not distinguish AI-edited from AI-generated
111
+ - Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
112
+ - Minimum ~50 words of text recommended for reliable detection
113
+ - Trained on English text from specific domains (reviews, news, creative writing, academic)
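
The length recommendation above can be enforced with a small guard before calling the classifier. This is a sketch under stated assumptions: `has_enough_words` is a hypothetical helper, and whitespace splitting is only an approximation of a word count; the card does not specify how words are counted.

```python
MIN_WORDS = 50  # the "~50 words" recommendation from the limitations list


def has_enough_words(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the text has at least `min_words` whitespace-separated tokens."""
    return len(text.split()) >= min_words


sample = "This sentence is far too short for reliable detection."
if not has_enough_words(sample):
    print("Text is shorter than the recommended minimum; treat the prediction with caution.")
```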
114
 
115
  ## Citation
116
 
 
122
  publisher = {GitHub},
123
  url = {https://github.com/codelion/adaptive-classifier}
124
  }
125
+
126
+ @inproceedings{thai2026editlens,
127
+ title = {EditLens: Quantifying the Extent of AI Editing in Text},
128
+ author = {Thai, Katherine and Emi, Bradley and Masrour, Elyas and Iyyer, Mohit},
129
+ booktitle = {ICLR},
130
+ year = {2026}
131
+ }
132
+
133
+ @article{hu2023radar,
134
+ title = {RADAR: Robust AI-Text Detection via Adversarial Learning},
135
+ author = {Hu, Xiaomeng and Chen, Pin-Yu and Ho, Tsung-Yi},
136
+ journal = {arXiv preprint arXiv:2307.03838},
137
+ year = {2023}
138
+ }
139
  ```