SamanthaStorm committed on
Commit
219d96c
·
verified ·
1 Parent(s): 0bc2822

Upload IntentAnalyzer v1.0 - Multi-Label Communication Intent Detection Model

README.md ADDED
@@ -0,0 +1,216 @@
---
language: en
license: mit
tags:
- text-classification
- intent-detection
- communication-analysis
- multi-label-classification
- psychology
- nlp
- transformers
datasets:
- custom
metrics:
- f1: 0.77
- precision: 0.95
- recall: 0.92
model-index:
- name: IntentAnalyzer
  results:
  - task:
      type: text-classification
      name: Multi-Label Intent Detection
    dataset:
      type: custom
      name: Communication Intent Dataset
    metrics:
    - type: f1_macro
      value: 0.77
    - type: f1_trolling
      value: 0.94
    - type: f1_constructive
      value: 0.99
widget:
- text: "You're just a stupid liberal, so your opinion doesn't matter"
  example_title: "Manipulative + Dismissive"
- text: "I understand your concerns, but here's why I disagree"
  example_title: "Constructive Communication"
- text: "Whatever, I don't care about this anymore"
  example_title: "Dismissive Behavior"
- text: "I CAN'T BELIEVE you would say that to me!!!"
  example_title: "Emotionally Reactive"
- text: "If you really loved me, you would support this"
  example_title: "Manipulative Behavior"
---

# IntentAnalyzer: Multi-Label Communication Intent Detection

## Model Description

IntentAnalyzer is a multi-label text classification model that detects underlying intentions in human communication. Built on the DistilBERT architecture, it can identify multiple intent categories simultaneously with high precision, helping surface the psychological and communicative patterns behind text.

## Supported Intent Categories

The model detects 6 different intent categories (multi-label):

1. **🧌 Trolling** - Deliberately provocative or disruptive communication
2. **🚫 Dismissive** - Shutting down conversation or avoiding engagement
3. **🎭 Manipulative** - Using emotional coercion, guilt, or pressure tactics
4. **🌋 Emotionally Reactive** - Overwhelmed by emotion, not thinking clearly
5. **✅ Constructive** - Good faith engagement and dialogue
6. **❓ Unclear** - Ambiguous intent that's difficult to determine

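For multi-label training and inference, each example's intents are naturally represented as a multi-hot vector over these six categories. A minimal sketch (the `encode_multi_hot` helper below is illustrative, not part of the released code):

```python
INTENT_CATEGORIES = [
    "trolling", "dismissive", "manipulative",
    "emotionally_reactive", "constructive", "unclear",
]

def encode_multi_hot(intents):
    """Encode a set of intent names as a 6-dim multi-hot target vector."""
    return [1.0 if cat in intents else 0.0 for cat in INTENT_CATEGORIES]

# An utterance can carry several intents at once:
print(encode_multi_hot({"manipulative", "dismissive"}))
# -> [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

The category ordering matches the `id2label` mapping in this repository's `config.json`.
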
## Performance Metrics

### Overall Performance
- **F1 Score (Macro)**: 0.77
- **Multi-label Classification**: Supports simultaneous detection of multiple intents

### Per-Category Performance
- **Trolling**: F1=0.943 (P=0.976, R=0.911)
- **Dismissive**: F1=0.850 (P=0.964, R=0.761)
- **Manipulative**: F1=0.907 (P=0.867, R=0.951)
- **Emotionally Reactive**: F1=0.939 (P=0.931, R=0.947)
- **Constructive**: F1=0.989 (P=0.978, R=1.000)
- **Unclear**: F1=0.000 (Expected - ambiguous by design)

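Macro F1 is the unweighted mean of the per-label F1 scores, which is why the hard "unclear" label (F1=0.0) pulls the overall number well below the other categories. A pure-Python sketch of the computation on toy arrays (not the actual evaluation data):

```python
def f1_per_label(y_true, y_pred):
    """Per-label F1 for multi-hot targets/predictions (lists of 0/1 rows)."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and (not p[j]) for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(y_true, y_pred):
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)

# Toy example with 3 labels:
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(macro_f1(y_true, y_pred))  # (1.0 + 2/3 + 0.0) / 3 ≈ 0.556
```

An easy label contributes 1.0 while an impossible one contributes 0.0, mirroring how the "unclear" category drags the reported macro score down.
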
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Define the model architecture
class MultiLabelIntentClassifier(nn.Module):
    def __init__(self, model_name, num_labels):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.last_hidden_state[:, 0]  # [CLS] token
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Build the custom model and load the fine-tuned weights
# (download pytorch_model.bin from SamanthaStorm/intentanalyzer first)
model = MultiLabelIntentClassifier("distilbert-base-uncased", 6)
# model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()

# Intent categories (order matches the training labels)
intent_categories = ["trolling", "dismissive", "manipulative",
                     "emotionally_reactive", "constructive", "unclear"]

def predict_intent(text, threshold=0.5):
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       padding=True, max_length=128)

    with torch.no_grad():
        logits = model(inputs["input_ids"], inputs["attention_mask"])
        probabilities = torch.sigmoid(logits).numpy()[0]

    # Return predictions above the threshold
    return {category: float(prob)
            for category, prob in zip(intent_categories, probabilities)
            if prob > threshold}

# Example usage
text = "You're just being emotional and can't think rationally"
intents = predict_intent(text)
print("Detected intents:", intents)
```

## Training Data

The model was trained on a carefully curated dataset of 1,226 examples (858 train / 184 validation / 184 test) with:
- **Single-label examples**: Clear instances of each intent type
- **Multi-label examples**: Realistic scenarios with multiple simultaneous intents
- **Balanced distribution**: Proper representation across all categories
- **Diverse contexts**: Personal, professional, online, and social interactions

## Model Architecture

- **Base Model**: DistilBERT (distilbert-base-uncased)
- **Task**: Multi-label text classification
- **Classes**: 6 intent categories
- **Loss Function**: BCEWithLogitsLoss (binary cross-entropy on logits, one binary decision per label)
- **Max Sequence Length**: 128 tokens
- **Training Examples**: 1,226 curated examples

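BCEWithLogitsLoss applies an independent sigmoid + binary cross-entropy to each of the six logits, which is what makes multiple labels per example possible. A numerically stable pure-Python sketch of the per-logit loss (the same stable formulation PyTorch documents for this loss):

```python
import math

def bce_with_logits(z, y):
    """Binary cross-entropy on a raw logit z with target y in {0, 1}.

    Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|)),
    equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))].
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# A confident, correct positive logit gives a small loss...
print(round(bce_with_logits(2.0, 1.0), 4))  # 0.1269
# ...while the same logit against a negative target is penalized heavily.
print(round(bce_with_logits(2.0, 0.0), 4))  # 2.1269
```

Summing this quantity over the six labels (and averaging over the batch) is what the training loss does for this model's 6-dimensional output.
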
## Applications

### Communication Analysis
- **Customer Service**: Identify frustrated or manipulative customers
- **Social Media Monitoring**: Detect trolling and constructive engagement
- **Relationship Counseling**: Understand communication patterns
- **Content Moderation**: Flag problematic intent patterns

### Research Applications
- **Psychology**: Study communication patterns and intentions
- **Linguistics**: Analyze pragmatic aspects of language
- **Social Sciences**: Understand online discourse patterns
- **Education**: Teach healthy communication skills

## Limitations and Considerations

- Trained primarily on English text
- Performance may vary on highly context-dependent cases
- Best suited for interpersonal communication analysis
- Cultural and contextual nuances may affect accuracy
- Multi-label predictions require threshold tuning for optimal results

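Per-label thresholds can be tuned on held-out data rather than fixed at 0.5. A minimal sketch that sweeps candidate thresholds for one label and keeps the one maximizing F1 (toy probabilities shown; in practice use sigmoid probabilities from the validation split):

```python
def best_threshold(y_true, probs, candidates=None):
    """Pick the decision threshold that maximizes F1 for a single label."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95

    def f1_at(threshold):
        preds = [p >= threshold for p in probs]
        tp = sum(t and p for t, p in zip(y_true, preds))
        fp = sum((not t) and p for t, p in zip(y_true, preds))
        fn = sum(t and (not p) for t, p in zip(y_true, preds))
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    return max(candidates, key=f1_at)

# Toy validation data for one label:
y_true = [1, 1, 0, 0, 1, 0]
probs  = [0.9, 0.6, 0.55, 0.2, 0.7, 0.4]
print(best_threshold(y_true, probs))  # 0.6 separates positives cleanly here
```

Repeating this per label yields a 6-element threshold vector to use in place of the single default in `predict_intent`.
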
## Model Card Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.

## Ethical Considerations

This model is designed to help understand communication patterns for constructive purposes such as:
- Improving dialogue quality
- Identifying harmful communication patterns
- Supporting mental health and relationship counseling
- Educational applications

**Important**: This model should not be used for:
- Surveillance without consent
- Discriminatory decision-making
- Automated content removal without human review
- Any application that could harm individuals or communities

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{intentanalyzer2024,
  author = {SamanthaStorm},
  title = {IntentAnalyzer: Multi-Label Communication Intent Detection},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SamanthaStorm/intentanalyzer}
}
```

## License

This model is released under the MIT License.

## Companion Models

This model works well in combination with:
- **FallacyFinder** ([SamanthaStorm/fallacyfinder](https://huggingface.co/SamanthaStorm/fallacyfinder)) - Logical fallacy detection

Together they cover both logical reasoning and psychological intent, for a more complete communication analysis.

---

**IntentAnalyzer** - Understanding the psychology behind human communication 🎭

config.json ADDED
@@ -0,0 +1,48 @@
{
  "architectures": [
    "MultiLabelIntentClassifier"
  ],
  "model_type": "distilbert",
  "num_labels": 6,
  "id2label": {
    "0": "trolling",
    "1": "dismissive",
    "2": "manipulative",
    "3": "emotionally_reactive",
    "4": "constructive",
    "5": "unclear"
  },
  "label2id": {
    "trolling": 0,
    "dismissive": 1,
    "manipulative": 2,
    "emotionally_reactive": 3,
    "constructive": 4,
    "unclear": 5
  },
  "base_model": "distilbert-base-uncased",
  "task_specific_params": {
    "text-classification": {
      "problem_type": "multi_label_classification",
      "num_labels": 6
    }
  },
  "intent_categories": [
    "trolling",
    "dismissive",
    "manipulative",
    "emotionally_reactive",
    "constructive",
    "unclear"
  ],
  "model_architecture": "MultiLabelIntentClassifier",
  "training_metrics": {
    "f1_macro": 0.77,
    "f1_trolling": 0.943,
    "f1_dismissive": 0.85,
    "f1_manipulative": 0.907,
    "f1_emotionally_reactive": 0.939,
    "f1_constructive": 0.989,
    "f1_unclear": 0.0
  }
}
modeling_intent.py ADDED
@@ -0,0 +1,18 @@
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiLabelIntentClassifier(nn.Module):
    def __init__(self, model_name, num_labels):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.last_hidden_state[:, 0]  # Use [CLS] token
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2d06c315ccc3129b4860283640d34e976d89dea0b1d026118358e53854e43e32
size 265508834
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
training_info.json ADDED
@@ -0,0 +1,25 @@
{
  "model_name": "IntentAnalyzer",
  "base_model": "distilbert-base-uncased",
  "task": "multi_label_text_classification",
  "training_examples": 858,
  "validation_examples": 184,
  "test_examples": 184,
  "total_examples": 1226,
  "intent_categories": [
    "trolling",
    "dismissive",
    "manipulative",
    "emotionally_reactive",
    "constructive",
    "unclear"
  ],
  "num_labels": 6,
  "f1_macro": 0.77,
  "training_epochs": 4,
  "batch_size": 16,
  "learning_rate": 2e-05,
  "max_length": 128,
  "architecture": "DistilBERT + Linear Classifier",
  "loss_function": "BCEWithLogitsLoss"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff