Commit 6647635 (verified) by Thi144 · 1 Parent(s): f1c5f54

Add model documentation

Files changed (1)
  1. README.md +160 -0
README.md ADDED
@@ -0,0 +1,160 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - sentiment-analysis
+ - text-classification
+ - distilbert
+ - pytorch
+ - transformers
+ datasets:
+ - imdb
+ metrics:
+ - accuracy
+ - f1
+ widget:
+ - text: "This movie was absolutely amazing! Best film I've seen all year!"
+   example_title: "Very Positive"
+ - text: "Pretty good movie, enjoyed it overall."
+   example_title: "Slightly Positive"
+ - text: "It was okay, nothing special but not bad either."
+   example_title: "Neutral"
+ - text: "Not a great movie, pretty disappointing."
+   example_title: "Slightly Negative"
+ - text: "Terrible film, complete waste of time and money!"
+   example_title: "Very Negative"
+ ---
+
+ # DistilBERT 7-Class Sentiment Analysis Model
+
+ A fine-tuned DistilBERT model for nuanced sentiment analysis with 7 sentiment classes on a scale from -3 (Very Negative) to +3 (Very Positive).
+
+ ## Model Description
+
+ This model performs fine-grained sentiment classification, providing more nuanced predictions than traditional binary positive/negative models. It's particularly useful for business applications where understanding the intensity of sentiment matters (e.g., identifying "at-risk" customers vs. extremely dissatisfied ones).
+
+ **Architecture:** DistilBERT (distilbert-base-uncased)
+ **Parameters:** 66 million
+ **Training Data:** 6,000 IMDB movie reviews
+ **Accuracy:** 73.7%
+
+ ## Sentiment Classes
+
+ | Class | Scale | Label | Description |
+ |-------|-------|-------|-------------|
+ | 0 | -3 | Very Negative | Extremely dissatisfied, angry |
+ | 1 | -2 | Negative | Clearly unhappy, disappointed |
+ | 2 | -1 | Slightly Negative | Somewhat disappointed |
+ | 3 | 0 | Neutral | Balanced, neither positive nor negative |
+ | 4 | +1 | Slightly Positive | Somewhat satisfied |
+ | 5 | +2 | Positive | Clearly satisfied, happy |
+ | 6 | +3 | Very Positive | Extremely satisfied, delighted |
+
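The table implies a fixed offset between the class index and the scale value (scale = class - 3). A minimal sketch of that mapping follows; the helper names `class_to_scale` and `scale_to_class` are illustrative assumptions and are not part of the model repository.

```python
# Label names in class-index order, copied from the Sentiment Classes table.
LABELS = [
    "Very Negative", "Negative", "Slightly Negative", "Neutral",
    "Slightly Positive", "Positive", "Very Positive",
]

# Offset between class index (0..6) and scale value (-3..+3).
SCALE_OFFSET = 3


def class_to_scale(class_id: int) -> int:
    """Map a class index (0..6) to the sentiment scale (-3..+3)."""
    if not 0 <= class_id < len(LABELS):
        raise ValueError(f"class_id must be in 0..{len(LABELS) - 1}, got {class_id}")
    return class_id - SCALE_OFFSET


def scale_to_class(scale: int) -> int:
    """Map a scale value (-3..+3) back to its class index (0..6)."""
    class_id = scale + SCALE_OFFSET
    if not 0 <= class_id < len(LABELS):
        raise ValueError(f"scale must be in -3..3, got {scale}")
    return class_id


if __name__ == "__main__":
    for class_id, name in enumerate(LABELS):
        print(f"class {class_id} -> scale {class_to_scale(class_id):+d} ({name})")
```

Keeping the offset in one place avoids hand-maintaining a separate dictionary entry per class.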
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_id = "Thi144/sentiment-distilbert-7class"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ # Class mapping
+ CLASS_LABELS = {
+     0: {"scale": -3, "label": "negative", "name": "Very Negative"},
+     1: {"scale": -2, "label": "negative", "name": "Negative"},
+     2: {"scale": -1, "label": "negative", "name": "Slightly Negative"},
+     3: {"scale": 0, "label": "neutral", "name": "Neutral"},
+     4: {"scale": 1, "label": "positive", "name": "Slightly Positive"},
+     5: {"scale": 2, "label": "positive", "name": "Positive"},
+     6: {"scale": 3, "label": "positive", "name": "Very Positive"}
+ }
+
+ # Predict sentiment
+ def predict_sentiment(text):
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+     with torch.no_grad():
+         outputs = model(**inputs)
+         predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+         class_id = predictions.argmax().item()
+         confidence = predictions[0][class_id].item()
+
+     result = CLASS_LABELS[class_id]
+     return {
+         "class": class_id,
+         "scale": result["scale"],
+         "label": result["label"],
+         "name": result["name"],
+         "confidence": confidence
+     }
+
+ # Example
+ result = predict_sentiment("This movie was absolutely amazing!")
+ print(f"Sentiment: {result['name']} (Scale: {result['scale']}, Confidence: {result['confidence']:.2%})")
+ ```
+
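Because the seven classes are ordered, the softmax output can also be reduced to a single continuous score rather than only the top class. The sketch below is one possible post-processing step, not something the README prescribes. `expected_scale` and `coarse_label` are hypothetical helper names, and the input is assumed to be a list of 7 probabilities in class order, such as `predictions[0].tolist()` from the usage snippet.

```python
from typing import Sequence

NUM_CLASSES = 7   # classes 0..6 as in the README
SCALE_OFFSET = 3  # scale = class index - 3


def expected_scale(probs: Sequence[float]) -> float:
    """Probability-weighted mean of the scale values (-3..+3)."""
    if len(probs) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} probabilities, got {len(probs)}")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Dividing by the total tolerates inputs that are not exactly normalized.
    return sum(p * (i - SCALE_OFFSET) for i, p in enumerate(probs)) / total


def coarse_label(probs: Sequence[float]) -> str:
    """Collapse 7 classes into negative/neutral/positive by summed probability mass."""
    if len(probs) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} probabilities, got {len(probs)}")
    mass = {
        "negative": sum(probs[:3]),   # classes 0, 1, 2
        "neutral": probs[3],          # class 3
        "positive": sum(probs[4:]),   # classes 4, 5, 6
    }
    return max(mass, key=mass.get)


if __name__ == "__main__":
    example = [0.05, 0.10, 0.40, 0.30, 0.10, 0.03, 0.02]  # made-up distribution
    print(f"expected scale: {expected_scale(example):+.2f}")
    print(f"coarse label:   {coarse_label(example)}")
```

An expected score can be more stable than the argmax when probability is split between neighbouring classes, which is where the README reports the weakest per-class results.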
+ ## Performance Metrics
+
+ **Overall Accuracy:** 73.7%
+
+ **Class-Specific Performance:**
+ - **Very Negative (-3):** 81% precision, 88% recall
+ - **Negative (-2):** 83% precision, 77% recall
+ - **Slightly Negative (-1):** 54% precision, 58% recall
+ - **Neutral (0):** 86% precision, 64% recall
+ - **Slightly Positive (+1):** 58% precision, 54% recall
+ - **Positive (+2):** 79% precision, 83% recall
+ - **Very Positive (+3):** 88% precision, 81% recall
+
+ The model performs best at identifying strong sentiments (Very Negative/Positive) and struggles most with subtle distinctions (Slightly Negative/Positive).
+
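The metadata lists `f1` as a metric, but this section reports only precision and recall. Per-class F1 can be derived from those figures as the harmonic mean 2PR / (P + R). The sketch below does that arithmetic on the rounded percentages above, so the results are approximations and not values reported by the author. The variable names are illustrative.

```python
# (precision, recall) per class, copied from the list above as fractions.
PRECISION_RECALL = {
    "Very Negative": (0.81, 0.88),
    "Negative": (0.83, 0.77),
    "Slightly Negative": (0.54, 0.58),
    "Neutral": (0.86, 0.64),
    "Slightly Positive": (0.58, 0.54),
    "Positive": (0.79, 0.83),
    "Very Positive": (0.88, 0.81),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1_score(p, r) for name, (p, r) in PRECISION_RECALL.items()}

# Macro F1: unweighted mean over classes (class support is not given in the README).
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

if __name__ == "__main__":
    for name, score in per_class_f1.items():
        print(f"{name:<18} F1 = {score:.3f}")
    print(f"{'Macro average':<18} F1 = {macro_f1:.3f}")
```

Only the macro average is computed, because a support-weighted average would need per-class test counts that the README does not give.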
114
+ ## Training Details
115
+
116
+ - **Base Model:** distilbert-base-uncased
117
+ - **Dataset:** 6,000 IMDB reviews (4,800 train, 1,200 test)
118
+ - **Label Conversion:** Binary labels converted to 7-class using text intensity analysis
119
+ - **Epochs:** 4
120
+ - **Batch Size:** 16
121
+ - **Optimizer:** AdamW (lr=2e-5)
122
+ - **Training Time:** ~15-20 minutes on CPU
123
+
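The figures above fix the size of the training run: 4,800 examples at batch size 16 give 300 optimizer steps per epoch, or 1,200 steps over 4 epochs. The sketch below reproduces that arithmetic. It assumes a single device and no gradient accumulation, neither of which the README states, and `optimizer_steps` is a hypothetical helper name.

```python
import math


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps_per_epoch, total_steps).

    Assumes one device, no gradient accumulation, and that a final
    partial batch (if any) is still used for an update.
    """
    if num_examples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("all arguments must be positive")
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Values taken from the Training Details list.
TRAIN_EXAMPLES = 4_800
BATCH_SIZE = 16
EPOCHS = 4

steps_per_epoch, total_steps = optimizer_steps(TRAIN_EXAMPLES, BATCH_SIZE, EPOCHS)

if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"total steps:     {total_steps}")
```

The total step count is the number a learning-rate schedule with warmup or decay would need.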
+ ## Limitations
+
+ - Trained on movie reviews; may not generalize well to other domains
+ - The Slightly Negative and Slightly Positive classes have noticeably lower precision and recall (54-58%)
+ - Performance depends on text clarity and length
+ - May struggle with sarcasm or complex sentiment
+
132
+
133
+ **Primary Use Cases:**
134
+ - Customer feedback analysis with nuanced sentiment scoring
135
+ - Product review sentiment classification
136
+ - Social media monitoring with intensity detection
137
+ - Business intelligence dashboards requiring granular sentiment
138
+
139
+ **Not Recommended For:**
140
+ - Safety-critical applications
141
+ - Legal decision-making
142
+ - Medical diagnosis
143
+
+ ## License
+
+ Apache 2.0
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```
+ @misc{thi144-sentiment-distilbert-7class,
+   author = {Thi144},
+   title = {DistilBERT 7-Class Sentiment Analysis},
+   year = {2025},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/Thi144/sentiment-distilbert-7class}
+ }
+ ```