EnJiZ committed
Commit 2fa8178 (verified)
1 Parent(s): 1672980

Upload folder using huggingface_hub
README.md ADDED

---
language: en
tags:
- emotion-classification
- multilabel-classification
- text-classification
- pytorch
- transformers
datasets:
- emotion
metrics:
- f1
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# Multilabel Emotion Classification Model - FirstTimeUp

This model is a multilabel emotion classifier over 14 emotions, fine-tuned from distilbert-base-uncased.

## Model Details
- **Model Name**: FirstTimeUp
- **Base Model**: distilbert-base-uncased
- **Task**: Multilabel Emotion Classification
- **Emotions**: amusement, anger, annoyance, caring, confusion, disappointment, disgust, embarrassment, excitement, fear, gratitude, joy, love, sadness
- **Total Parameters**: 66,373,646
- **Trainable Parameters**: 66,373,646

## Quick Start

### Installation
```bash
pip install torch transformers huggingface_hub
```

### Usage

```python
# Download the model repository
from huggingface_hub import snapshot_download
import sys

# Download model files
model_path = snapshot_download(repo_id="EnJiZ/FirstTimeUp")

# Add to path and import the bundled model.py
sys.path.append(model_path)
from model import predict_emotions

# Predict emotions
text = "I am so happy and excited!"
emotions = predict_emotions(text, model_path)
print(emotions)
```

### Advanced Usage

```python
import sys
import torch
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download

# Download the repository and make the bundled model.py importable
model_path = snapshot_download(repo_id="EnJiZ/FirstTimeUp")
sys.path.append(model_path)
from model import MultiLabelEmotionClassifier, load_model

# Load model manually
model, config = load_model(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Custom prediction with a different decision threshold
def custom_predict(text, threshold=0.3):
    encoding = tokenizer(
        text,
        truncation=True,
        padding='max_length',
        max_length=128,
        return_tensors='pt'
    )

    model.eval()
    with torch.no_grad():
        logits = model(encoding['input_ids'], encoding['attention_mask'])
        probabilities = torch.sigmoid(logits)
        predictions = (probabilities > threshold).int()

    emotion_labels = ['amusement', 'anger', 'annoyance', 'caring', 'confusion', 'disappointment', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'joy', 'love', 'sadness']
    result = {emotion: {
        'predicted': bool(pred),
        'probability': float(prob)
    } for emotion, pred, prob in zip(emotion_labels, predictions[0], probabilities[0])}
    return result

# Example with probabilities
result = custom_predict("I feel great today!", threshold=0.3)
print(result)
```

## Model Architecture
- **Base**: distilbert-base-uncased
- **Classification Head**: Linear layer with dropout (dropout_rate=0.3)
- **Loss Function**: BCEWithLogitsLoss
- **Activation**: Sigmoid (for multilabel classification)
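
Because `BCEWithLogitsLoss` folds the sigmoid into the loss, the model returns raw logits and the sigmoid is applied explicitly only at inference time, as in the snippets above. A minimal sketch with toy tensors (not the actual training code) illustrating the per-label binary formulation:

```python
import torch
import torch.nn as nn

# Toy batch: 2 samples x 14 emotion logits. Each label is an independent
# binary decision, so the loss is applied element-wise rather than as a
# softmax over the 14 classes.
criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(2, 14)    # raw model outputs
targets = torch.zeros(2, 14)   # multi-hot ground truth
targets[0, 11] = 1.0           # e.g. "joy" active for sample 0
loss = criterion(logits, targets)
print(loss.item())
```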

## Training Details
- **Epochs**: 3
- **Batch Size**: 32
- **Learning Rate**: 2e-05
- **Max Sequence Length**: 128
- **Optimizer**: AdamW with weight decay (0.01)
- **Scheduler**: Linear warmup + decay
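
The training script itself is not included in this repository; below is a hedged sketch of how the hyperparameters above would map onto an AdamW + linear-warmup setup. The steps-per-epoch value and the warmup fraction are assumptions, and `model` is the classifier loaded in the Advanced Usage snippet:

```python
import torch
from transformers import get_linear_schedule_with_warmup

steps_per_epoch = 100                      # placeholder; depends on dataset size at batch size 32
num_training_steps = steps_per_epoch * 3   # 3 epochs

# AdamW with lr 2e-5 and weight decay 0.01, as listed above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

# Linear warmup followed by linear decay; the 10% warmup fraction is assumed
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),
    num_training_steps=num_training_steps,
)
```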

## Files in this Repository
- `config.json`: Model configuration
- `pytorch_model.bin`, `model.safetensors`: Model weights
- `tokenizer.json`, `tokenizer_config.json`: Tokenizer files
- `model.py`: Custom model class and utility functions
- `README.md`: This file
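
To sanity-check a download, you can list the snapshot contents (reusing `model_path` from the Quick Start):

```python
import os

# The files listed above should all appear in the downloaded snapshot
print(sorted(os.listdir(model_path)))
```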

## Performance
- **Task**: Multilabel Emotion Classification
- **Metrics**: F1-Score (Micro & Macro), Accuracy
- **Validation Strategy**: 80/20 train-validation split
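
For reference, a sketch of how micro/macro F1 and accuracy are typically computed for multilabel predictions (requires scikit-learn, which is not in the install list above; the arrays are toy values, not this model's actual results):

```python
import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# Binary indicator matrices: samples x labels (toy 3-label example)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print(f1_score(y_true, y_pred, average='micro'))  # pooled TP/FP/FN across labels
print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean of per-label F1
print(accuracy_score(y_true, y_pred))             # exact-match (subset) accuracy
```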

## Supported Emotions
amusement, anger, annoyance, caring, confusion, disappointment, disgust, embarrassment, excitement, fear, gratitude, joy, love, sadness

## License
This model is released under the Apache 2.0 license.

## Citation
```bibtex
@misc{firsttimeup2024,
  title={FirstTimeUp: Multilabel Emotion Classification Model},
  author={EnJiZ},
  year={2024},
  url={https://huggingface.co/EnJiZ/FirstTimeUp}
}
```

## Contact
For questions or issues, please open an issue in the repository.
config.json ADDED

{
  "architectures": [
    "MultiLabelEmotionClassifier"
  ],
  "model_type": "custom_multilabel_emotion",
  "base_model": "distilbert-base-uncased",
  "num_labels": 14,
  "emotion_labels": [
    "amusement",
    "anger",
    "annoyance",
    "caring",
    "confusion",
    "disappointment",
    "disgust",
    "embarrassment",
    "excitement",
    "fear",
    "gratitude",
    "joy",
    "love",
    "sadness"
  ],
  "max_position_embeddings": 128,
  "dropout_rate": 0.3,
  "torch_dtype": "float32",
  "transformers_version": "4.21.0"
}
model.py ADDED

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, AutoConfig


class MultiLabelEmotionClassifier(nn.Module):
    def __init__(self, model_name, num_labels, dropout_rate=0.3):
        super().__init__()

        # Load pre-trained model
        self.config = AutoConfig.from_pretrained(model_name)
        self.transformer = AutoModel.from_pretrained(model_name)

        # Classifier head
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

        # Initialize weights
        self._init_weights()

    def _init_weights(self):
        """Initialize classifier weights"""
        nn.init.normal_(self.classifier.weight, std=0.02)
        nn.init.zeros_(self.classifier.bias)

    def forward(self, input_ids, attention_mask):
        # Get transformer outputs
        outputs = self.transformer(
            input_ids=input_ids,
            attention_mask=attention_mask
        )

        # Use [CLS] token representation
        pooled_output = outputs.last_hidden_state[:, 0]  # [CLS] token

        # Apply dropout and classifier
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        return logits


def load_model(model_path="."):
    """Load the custom model"""
    import json
    import os

    # Load config
    with open(os.path.join(model_path, "config.json"), "r") as f:
        config = json.load(f)

    # Initialize model
    model = MultiLabelEmotionClassifier(
        model_name=config["base_model"],
        num_labels=config["num_labels"],
        dropout_rate=config["dropout_rate"]
    )

    # Load weights (stored as a checkpoint dict under "model_state_dict")
    checkpoint = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])

    return model, config


def predict_emotions(text, model_path=".", threshold=0.5):
    """Predict emotions for given text"""
    # Load model and tokenizer
    model, config = load_model(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Tokenize input
    encoding = tokenizer(
        text,
        truncation=True,
        padding='max_length',
        max_length=config["max_position_embeddings"],
        return_tensors='pt'
    )

    # Predict
    model.eval()
    with torch.no_grad():
        logits = model(encoding['input_ids'], encoding['attention_mask'])
        probabilities = torch.sigmoid(logits)
        predictions = (probabilities > threshold).int()

    # Format results
    emotion_labels = config["emotion_labels"]
    result = {emotion: bool(pred) for emotion, pred in zip(emotion_labels, predictions[0])}
    return result


# Example usage:
# emotions = predict_emotions("I am so happy and excited!")
# print(emotions)
model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:0541799e9cfe7ba672cff94a74980d64db9081560c1f9842311038ad753fc5ba
size 265462608
pytorch_model.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:6d68b6f0217225e55b1a08e8d458499abd02a16a2ffeb12ba90269e41c78c126
size 265536674
special_tokens_map.json ADDED

{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED

{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff