codelion committed on
Commit ff9f726 · verified · 1 Parent(s): 7e4fb88

Update model card with feedback_only retrain metrics

Files changed (1)
  1. README.md +85 -38
README.md CHANGED
@@ -1,74 +1,121 @@
1
  ---
2
- language: multilingual
3
  tags:
4
  - adaptive-classifier
5
  - text-classification
 
 
6
  - continuous-learning
7
  license: apache-2.0
8
  ---
9
 
10
- # Adaptive Classifier
11
 
12
- This model is an instance of an [adaptive-classifier](https://github.com/codelion/adaptive-classifier) that allows for continuous learning and dynamic class addition.
13
 
14
- ## Installation
15
 
16
- **IMPORTANT:** To use this model, you must first install the `adaptive-classifier` library. You do **NOT** need `trust_remote_code=True`.
17
 
18
  ```bash
19
  pip install adaptive-classifier
20
  ```
21
 
22
- ## Model Details
23
 
24
- - Base Model: TrustSafeAI/RADAR-Vicuna-7B
25
- - Number of Classes: 2
26
- - Total Examples: 400
27
- - Embedding Dimension: 1024
28
 
29
- ## Class Distribution
 
30
 
31
- ```
32
- ai: 200 examples (50.0%)
33
- human: 200 examples (50.0%)
 
 
34
  ```
35
 
36
- ## Usage
37
 
38
- After installing the `adaptive-classifier` library, you can load and use this model:
39
 
40
- ```python
41
- from adaptive_classifier import AdaptiveClassifier
42
 
43
- # Load the model (no trust_remote_code needed!)
44
- classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/model-name")
45
 
46
- # Make predictions
47
- text = "Your text here"
48
- predictions = classifier.predict(text)
49
- print(predictions) # List of (label, confidence) tuples
50
 
51
- # Add new examples for continuous learning
52
- texts = ["Example 1", "Example 2"]
53
- labels = ["class1", "class2"]
54
- classifier.add_examples(texts, labels)
55
- ```
56
 
57
- **Note:** This model uses the `adaptive-classifier` library distributed via PyPI. You do **NOT** need to set `trust_remote_code=True` - just install the library first.
58
 
59
  ## Training Details
60
 
61
- - Training Steps: 6
62
- - Examples per Class: See distribution above
63
- - Prototype Memory: Active
64
- - Neural Adaptation: Active
65
 
66
  ## Limitations
67
 
68
- This model:
69
- - Requires at least 3 examples per class
70
- - Has a maximum of 1000 examples per class
71
- - Updates prototypes every 100 examples
72
 
73
  ## Citation
74
 
 
1
  ---
2
+ language: en
3
  tags:
4
  - adaptive-classifier
5
  - text-classification
6
+ - ai-detection
7
+ - ai-generated-text
8
  - continuous-learning
9
  license: apache-2.0
10
+ datasets:
11
+ - pangram/editlens_iclr
12
+ - adaptive-classifier/ai-detector-data
13
+ base_model: TrustSafeAI/RADAR-Vicuna-7B
14
+ metrics:
15
+ - accuracy
16
+ - f1
17
+ pipeline_tag: text-classification
18
+ model-index:
19
+ - name: adaptive-classifier/ai-detector
20
+   results:
21
+   - task:
22
+       type: text-classification
23
+       name: AI Text Detection (Binary)
24
+     dataset:
25
+       name: EditLens ICLR 2026
26
+       type: pangram/editlens_iclr
27
+       split: test
28
+     metrics:
29
+     - type: accuracy
30
+       value: 74.2
31
+       name: Accuracy
32
+     - type: f1
33
+       value: 73.7
34
+       name: Macro F1
35
  ---
36
 
37
+ # AI Text Detector (adaptive-classifier)
38
 
39
+ A binary AI text detector that classifies text as **human-written** or **AI-generated/edited**, built with [adaptive-classifier](https://github.com/codelion/adaptive-classifier) on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) benchmark.
40
 
41
+ ## How It Works
42
 
43
+ The model uses frozen embeddings from [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (a RoBERTa-large model adversarially trained for AI detection) as a feature extractor. adaptive-classifier's prototype memory and neural head then perform the classification.
44
+
45
+ ```
46
+ Text → RADAR backbone (frozen, 355M) → 1024-dim embedding → adaptive-classifier head → human / ai
47
+ ```
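The card describes the head as a prototype memory combined with a neural head, and the Training Details list reports a prototype weight of 0.3 and a neural weight of 0.7. A minimal sketch of how such a weighted blend could work, assuming cosine similarity to one prototype vector per class and a softmax over both score sets. The function names and the blending rule are illustrative assumptions, not the adaptive-classifier library's actual implementation:

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def blend_scores(embedding, prototypes, neural_logits,
                 proto_weight=0.3, neural_weight=0.7):
    """Blend prototype-similarity and neural-head probabilities (hypothetical rule)."""
    labels = sorted(prototypes)
    proto_probs = softmax([cosine(embedding, prototypes[l]) for l in labels])
    neural_probs = softmax([neural_logits[l] for l in labels])
    blended = {
        label: proto_weight * p + neural_weight * n
        for label, p, n in zip(labels, proto_probs, neural_probs)
    }
    # Sort by confidence, mirroring the (label, score) list shape shown in Usage.
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)

# Toy 3-dim vectors stand in for the 1024-dim RADAR embeddings.
prototypes = {"ai": [1.0, 0.0, 0.0], "human": [0.0, 1.0, 0.0]}
ranked = blend_scores([0.9, 0.1, 0.0], prototypes, {"ai": 2.0, "human": 0.5})
print(ranked)
```

Because both inputs are probability distributions and the weights sum to 1, the blended scores also sum to 1.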
48
+
49
+ ## Installation
50
 
51
  ```bash
52
  pip install adaptive-classifier
53
  ```
54
 
55
+ ## Usage
56
+
57
+ ```python
58
+ from adaptive_classifier import AdaptiveClassifier
59
+
60
+ classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/ai-detector")
61
 
62
+ predictions = classifier.predict("Your text here")
63
+ # Returns (label, confidence) pairs, e.g. [('ai', 0.85), ('human', 0.15)]
 
 
64
 
65
+ # Batch prediction
66
+ results = classifier.predict_batch(["text 1", "text 2"], k=2)
67
 
68
+ # Continuous learning — add new examples without retraining
69
+ classifier.add_examples(
70
+ ["new human text example", "new ai text example"],
71
+ ["human", "ai"]
72
+ )
73
  ```
74
 
75
+ ## Results
76
 
77
+ Evaluated on the [EditLens ICLR 2026](https://huggingface.co/datasets/pangram/editlens_iclr) test splits.
78
 
79
+ ### Binary Classification (Human vs AI)
 
80
 
81
+ | Model | Method | Test F1 |
82
+ |-------|--------|---------|
83
+ | EditLens Mistral-Small 24B | QLoRA fine-tuned | 95.6 |
84
+ | Pangram v2 | Proprietary | 83.7 |
85
+ | Binoculars | Perplexity ratio | 81.4 |
86
+ | FastDetectGPT | Log-prob based | 80.5 |
87
+ | **This model** | **Frozen RADAR + adaptive-classifier** | **73.7** |
88
 
89
+ ### Per-Split Results
 
 
 
90
 
91
+ | Split | Accuracy | Macro-F1 | AI F1 | Human F1 |
92
+ |-------|----------|----------|-------|----------|
93
+ | test (in-distribution) | 74.2% | 73.7 | 77.5 | 69.9 |
94
+ | test_enron (OOD domain) | 79.1% | 75.2 | 85.0 | 65.3 |
95
+ | test_llama (OOD model) | 74.3% | 73.8 | 77.2 | 70.4 |
96
 
97
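+ Accuracy holds up on the OOD splits: emails (test_enron, 79.1%) and an unseen AI model (Llama 3.3-70B / test_llama, 74.3%) score on par with or above the in-distribution test set (74.2%). Per-class results are less even: on test_enron, AI F1 rises to 85.0 while Human F1 drops to 65.3 (from 69.9 in-distribution).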
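Macro-F1 is the unweighted mean of the per-class F1 scores, so the per-split table can be cross-checked from its own columns. A small sketch using the reported AI F1 and Human F1 values. Since the inputs are already rounded to one decimal, agreement within about 0.05 is the most one can expect:

```python
# Per-split (AI F1, Human F1, reported Macro-F1) copied from the table above.
reported = {
    "test": (77.5, 69.9, 73.7),
    "test_enron": (85.0, 65.3, 75.2),
    "test_llama": (77.2, 70.4, 73.8),
}

def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)

for split, (ai_f1, human_f1, macro) in reported.items():
    recomputed = macro_f1([ai_f1, human_f1])
    gap = human_f1 - ai_f1
    print(f"{split}: recomputed={recomputed:.2f} reported={macro} human-ai gap={gap:+.1f}")
```

The gap column makes the class imbalance in performance visible: Human F1 trails AI F1 on every split, most of all on test_enron.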
98
 
99
  ## Training Details
100
 
101
+ - **Backbone**: [TrustSafeAI/RADAR-Vicuna-7B](https://huggingface.co/TrustSafeAI/RADAR-Vicuna-7B) (frozen, 355M params)
102
+ - **Dataset**: [pangram/editlens_iclr](https://huggingface.co/datasets/pangram/editlens_iclr) train split
103
+ - **Examples**: 1,000 per class (2,000 total), stratified sample
104
+ - **Classes**: `human` (human_written), `ai` (ai_edited + ai_generated)
105
+ - **Embedding dim**: 1024
106
+ - **Prototype weight**: 0.3; **Neural weight**: 0.7
107
+ - **Training time**: ~6 minutes on CPU
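The class mapping and per-class sampling listed above can be sketched as follows. The three source label names come from the list. The record layout (`text` and `label` keys), the helper name `sample_per_class`, and the fixed seed are assumptions made for illustration, not the script used to train this model:

```python
import random
from collections import defaultdict

# Three-way source labels collapsed to the two classes named in the card.
LABEL_MAP = {
    "human_written": "human",
    "ai_edited": "ai",
    "ai_generated": "ai",
}

def sample_per_class(records, per_class=1000, seed=0):
    """Map labels to binary classes, then draw up to `per_class` examples of each."""
    buckets = defaultdict(list)
    for record in records:
        buckets[LABEL_MAP[record["label"]]].append(record["text"])
    rng = random.Random(seed)
    texts, labels = [], []
    for label in sorted(buckets):
        chosen = rng.sample(buckets[label], min(per_class, len(buckets[label])))
        texts.extend(chosen)
        labels.extend([label] * len(chosen))
    return texts, labels

# Tiny synthetic records stand in for the real train split.
demo = [
    {"text": f"sample {i}", "label": source}
    for i, source in enumerate(["human_written", "ai_edited", "ai_generated"] * 4)
]
texts, labels = sample_per_class(demo, per_class=3)
print(labels)
```

The resulting parallel `texts` and `labels` lists have the same shape as the arguments passed to `classifier.add_examples` in the Usage section.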
108
+
109
+ ## Live Predictions Dataset
110
+
111
+ Predictions made through the [hosted Space](https://huggingface.co/spaces/adaptive-classifier/ai-detector) are continuously logged to [adaptive-classifier/ai-detector-data](https://huggingface.co/datasets/adaptive-classifier/ai-detector-data) — a public dataset of real-world predictions with optional user feedback (Correct / Incorrect). This dataset grows over time and can be used to track model performance, find failure cases, and drive future retraining.
112
 
113
  ## Limitations
114
 
115
+ - Binary only (human vs AI) — does not distinguish AI-edited from AI-generated
116
+ - Relies on frozen RADAR embeddings; cannot learn new text patterns beyond what RADAR captures
117
+ - Minimum ~50 words of text recommended for reliable detection
118
+ - Trained on English text from specific domains (reviews, news, creative writing, academic)
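Given the recommended minimum of roughly 50 words, a caller may want to flag short inputs before trusting a prediction. A minimal sketch, assuming whitespace-separated word counting and a `predict` callable that returns a list of (label, score) pairs as shown in Usage. `guarded_predict`, `MIN_WORDS`, and the stub classifier are hypothetical names, not part of the library:

```python
MIN_WORDS = 50  # threshold taken from the limitation above; tune as needed

def guarded_predict(text, predict, min_words=MIN_WORDS):
    """Run `predict` and flag whether the input met the minimum length."""
    word_count = len(text.split())
    return {
        "predictions": predict(text),
        "word_count": word_count,
        "reliable_length": word_count >= min_words,
    }

def stub_predict(text):
    """Stand-in for classifier.predict so the sketch runs offline."""
    return [("ai", 0.85), ("human", 0.15)]

short = guarded_predict("Too short to judge.", stub_predict)
long_enough = guarded_predict("word " * 60, stub_predict)
print(short["reliable_length"], long_enough["reliable_length"])
```

With the real model, `classifier.predict` would be passed in place of `stub_predict`.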
119
 
120
  ## Citation
121