nhull committed b2e7dd7 (verified; parent: cdedae6)

Update README.md

Files changed (1): README.md (+143 -0)

# DistilBERT Sentiment Analysis Model

## Overview

This repository contains a **DistilBERT** model fine-tuned for sentiment analysis of TripAdvisor reviews. Given the text of a review, the model predicts a sentiment label on a scale of 1 to 5.

- **Base Model**: `distilbert-base-uncased`
- **Training Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)
- **Use Case**: Sentiment classification of customer reviews to derive insights into customer satisfaction.
- **Output**: One of five sentiment labels (1-5).

---

## Model Details

- **Learning Rate**: `3e-05`
- **Batch Size**: `64`
- **Epochs**: `10` (maximum, with early stopping)
- **Patience**: `5` (epochs without improvement before training stops)
- **Tokenizer**: `distilbert-base-uncased`
- **Framework**: PyTorch + Hugging Face Transformers

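With a 10-epoch cap and a patience of 5, training ends after the validation score has gone 5 consecutive epochs without improving, or after epoch 10, whichever comes first. The plain-Python sketch below shows that rule. It is not the training code behind this model. The function name `early_stopping_epoch` and the choice of a higher-is-better score are assumptions. In Transformers, the same behaviour is available through `EarlyStoppingCallback(early_stopping_patience=5)` when evaluation runs once per epoch.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(
    val_scores: Sequence[float], max_epochs: int = 10, patience: int = 5
) -> Tuple[int, int]:
    """Return (epoch at which training stops, epoch with the best score).

    Epochs are 1-based. A score counts as an improvement only if it is
    strictly greater than the best score so far (assumed higher-is-better).
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # patience exhausted: stop early
    return min(len(val_scores), max_epochs), best_epoch  # epoch limit reached


# Hypothetical validation scores that peak at epoch 3: training stops at epoch 8.
print(early_stopping_epoch([0.55, 0.60, 0.63, 0.62, 0.62, 0.61, 0.60, 0.61, 0.60, 0.59]))
```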
---

## Training and Validation

### Dataset

The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:

- **Training Set**: 30,400 reviews
- **Validation Set**: 1,600 reviews
- **Test Set**: 8,000 reviews

All splits are balanced across the five sentiment labels.

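Every split is balanced over the five labels, so the per-label counts and each split's share of the data follow directly from the split sizes. The sketch below works them out. `SPLIT_SIZES` and `NUM_LABELS` are illustrative names only.

```python
# Split sizes from the dataset description; each split is balanced over 5 labels.
SPLIT_SIZES = {"train": 30_400, "validation": 1_600, "test": 8_000}
NUM_LABELS = 5

total = sum(SPLIT_SIZES.values())
per_label = {split: size // NUM_LABELS for split, size in SPLIT_SIZES.items()}
share = {split: size / total for split, size in SPLIT_SIZES.items()}

for split, size in SPLIT_SIZES.items():
    print(f"{split:<10} {size:>6} reviews, {per_label[split]:>5} per label, {share[split]:.0%} of all data")
```

The validation and test figures (320 and 1,600 reviews per label) match the Support columns of the classification reports below.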
### Validation Performance

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 0.6294 |
| Precision | 0.6313 |
| Recall    | 0.6294 |
| F1-Score  | 0.6297 |

#### Classification Report (Validation Set)

| Label | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 1     | 0.7612    | 0.6875 | 0.7225   | 320     |
| 2     | 0.5255    | 0.5469 | 0.5360   | 320     |
| 3     | 0.5859    | 0.5969 | 0.5913   | 320     |
| 4     | 0.5696    | 0.5500 | 0.5596   | 320     |
| 5     | 0.7143    | 0.7656 | 0.7391   | 320     |

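The summary Precision, Recall and F1-Score values appear to be unweighted (macro) means of the per-label rows. Every label has the same support, so support-weighted averages would give the same numbers. For the same reason, macro recall equals accuracy, which is why both rows read 0.6294. The check below recomputes the summary from the validation report, rounded to four decimals. It is only a consistency check, not the evaluation code used for the model.

```python
from statistics import fmean

# Per-label rows of the validation classification report: (precision, recall, f1).
VALIDATION_REPORT = {
    1: (0.7612, 0.6875, 0.7225),
    2: (0.5255, 0.5469, 0.5360),
    3: (0.5859, 0.5969, 0.5913),
    4: (0.5696, 0.5500, 0.5596),
    5: (0.7143, 0.7656, 0.7391),
}

# Unweighted mean of each column across the five labels.
macro = {
    name: round(fmean(row[i] for row in VALIDATION_REPORT.values()), 4)
    for i, name in enumerate(("precision", "recall", "f1"))
}
print(macro)  # compare with the Validation Performance table
```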
#### Confusion Matrix (Validation Set)

| Actual \ Predicted | 1   | 2   | 3   | 4   | 5   |
|--------------------|-----|-----|-----|-----|-----|
| **1**              | 220 | 87  | 10  | 1   | 2   |
| **2**              | 63  | 175 | 75  | 6   | 1   |
| **3**              | 4   | 61  | 191 | 60  | 4   |
| **4**              | 1   | 6   | 46  | 176 | 91  |
| **5**              | 1   | 4   | 4   | 66  | 245 |

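Each row of the validation confusion matrix sums to 320, the support of every label, so the rows are the actual labels and the columns are the predicted ones. Read that way, recall for a label is its diagonal entry divided by its row sum, and precision is the diagonal entry divided by its column sum. Both reproduce the classification report above. Here is a plain-Python sketch of that calculation. `per_label_metrics` is an illustrative helper, not part of any library.

```python
from typing import Dict, List

# Validation confusion matrix: rows = actual label 1..5, columns = predicted label 1..5.
CONFUSION = [
    [220, 87, 10, 1, 2],
    [63, 175, 75, 6, 1],
    [4, 61, 191, 60, 4],
    [1, 6, 46, 176, 91],
    [1, 4, 4, 66, 245],
]


def per_label_metrics(matrix: List[List[int]]) -> List[Dict[str, float]]:
    """Precision, recall and F1 for each label of a square confusion matrix."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        actual = sum(matrix[k])                          # row sum: true label is k + 1
        predicted = sum(matrix[r][k] for r in range(n))  # column sum: predicted as k + 1
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append({"label": k + 1, "precision": precision, "recall": recall, "f1": f1})
    return results


accuracy = sum(CONFUSION[k][k] for k in range(5)) / sum(map(sum, CONFUSION))
for row in per_label_metrics(CONFUSION):
    print(row["label"], f"{row['precision']:.4f} {row['recall']:.4f} {row['f1']:.4f}")
print(f"accuracy {accuracy:.4f}")
```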
### Test Performance

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 0.6391 |
| Precision | 0.6416 |
| Recall    | 0.6391 |
| F1-Score  | 0.6400 |

#### Classification Report (Test Set)

| Label | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 1     | 0.7483    | 0.6856 | 0.7156   | 1600    |
| 2     | 0.5445    | 0.5544 | 0.5494   | 1600    |
| 3     | 0.6000    | 0.6281 | 0.6137   | 1600    |
| 4     | 0.5828    | 0.5894 | 0.5861   | 1600    |
| 5     | 0.7326    | 0.7381 | 0.7354   | 1600    |

#### Confusion Matrix (Test Set)

| Actual \ Predicted | 1    | 2    | 3    | 4    | 5    |
|--------------------|------|------|------|------|------|
| **1**              | 1097 | 437  | 60   | 3    | 3    |
| **2**              | 327  | 887  | 344  | 34   | 8    |
| **3**              | 37   | 278  | 1005 | 254  | 26   |
| **4**              | 3    | 21   | 239  | 943  | 394  |
| **5**              | 2    | 6    | 27   | 384  | 1181 |

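Most errors in the test confusion matrix sit next to the diagonal: the predicted label is one step away from the true label. By the counts in the table, that covers about 92% of the 2,887 misclassified test reviews. The sketch below measures this, again reading rows as actual labels and columns as predicted labels, since every row sums to 1,600. `error_distance_summary` is an illustrative helper name.

```python
from typing import Dict, List

# Test confusion matrix: rows = actual label 1..5, columns = predicted label 1..5.
TEST_CONFUSION = [
    [1097, 437, 60, 3, 3],
    [327, 887, 344, 34, 8],
    [37, 278, 1005, 254, 26],
    [3, 21, 239, 943, 394],
    [2, 6, 27, 384, 1181],
]


def error_distance_summary(matrix: List[List[int]]) -> Dict[str, object]:
    """Count predictions by |predicted - actual| and derive simple error statistics."""
    counts: Dict[int, int] = {}
    for actual, row in enumerate(matrix):
        for predicted, n in enumerate(row):
            distance = abs(predicted - actual)
            counts[distance] = counts.get(distance, 0) + n
    total = sum(counts.values())
    errors = total - counts.get(0, 0)
    return {
        "distance_counts": dict(sorted(counts.items())),
        "adjacent_share_of_errors": counts.get(1, 0) / errors if errors else 0.0,
        "mean_absolute_error": sum(d * n for d, n in counts.items()) / total,
    }


summary = error_distance_summary(TEST_CONFUSION)
print(summary["distance_counts"])
print(f"{summary['adjacent_share_of_errors']:.1%} of errors are off by one label")
print(f"mean absolute error: {summary['mean_absolute_error']:.3f} labels")
```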
---

## How to Use

### Load the Model

```python
from transformers import pipeline

# Path to the saved fine-tuned model directory
model_name = "models/distilbert/best_trained_model"
classifier = pipeline("text-classification", model=model_name, tokenizer=model_name)

text = "The hotel was great, but the staff was rude."
result = classifier(text)
print(result)  # e.g. [{'label': '3', 'score': 0.82}]; label strings depend on id2label in the model config
```

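The string the pipeline returns in `label` depends on the `id2label` mapping saved in the model's `config.json`. Suppose the config still uses the Transformers default names `LABEL_0` through `LABEL_4` (an assumption; check the config). In that case the class index is zero-based and has to be shifted to get the 1-5 label, just as the `+ 1` in the custom-text example does. The helper `label_to_rating` below is a hypothetical sketch that handles both that case and labels that are already the ratings `"1"` to `"5"`.

```python
def label_to_rating(label: str) -> int:
    """Map a pipeline label string to a 1-5 rating.

    Assumes either default Transformers names ("LABEL_0".."LABEL_4", zero-based)
    or labels that already are the ratings ("1".."5").
    """
    if label.startswith("LABEL_"):
        rating = int(label[len("LABEL_"):]) + 1  # zero-based class index -> rating
    else:
        rating = int(label)  # label already holds the rating
    if not 1 <= rating <= 5:
        raise ValueError(f"unexpected label {label!r}")
    return rating


# Usage with the pipeline above:
# ratings = [label_to_rating(r["label"]) for r in classifier(["Great stay!", "Never again."])]
```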
### Evaluate Custom Text

To evaluate custom text or datasets, load the tokenizer and model directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_path = "models/distilbert/best_trained_model"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()  # inference mode: dropout disabled

# Input text
text = "The room was clean and spacious, but the food was disappointing."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Model prediction (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
# Class indices are 0-4; add 1 to convert back to the 1-5 labels
predicted_label = torch.argmax(outputs.logits, dim=-1).item() + 1
print(f"Predicted Sentiment: {predicted_label}")
```

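`outputs.logits` holds one unnormalised score per class. A softmax turns these scores into class probabilities, and the arg-max index plus one gives the 1-5 label, as in the example above. The probability-weighted mean rating is an optional extra summary that is not part of the original evaluation. The sketch works on a plain list of five logits so it can be tried without loading the model. The example logit values are made up, and `logits_to_prediction` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def logits_to_prediction(logits: Sequence[float]) -> Tuple[int, List[float], float]:
    """Return (predicted 1-5 label, class probabilities, probability-weighted rating)."""
    shift = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best_index = max(range(len(probs)), key=probs.__getitem__)  # first index on ties
    predicted_label = best_index + 1  # class index 0..4 -> label 1..5
    expected_rating = sum((i + 1) * p for i, p in enumerate(probs))
    return predicted_label, probs, expected_rating


# Made-up logits for illustration; with the model, pass outputs.logits[0].tolist().
label, probs, expected = logits_to_prediction([-2.0, -0.5, 1.0, 2.5, 0.3])
print(label, [round(p, 3) for p in probs], round(expected, 2))
```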
---

## Files Included

- **`correct_predictions.csv`**: Correctly classified reviews with their true and predicted labels.
- **`misclassified_predictions.csv`**: Misclassified reviews with their true and predicted labels, along with the difference between the two.

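The misclassified file can be summarised by how far each prediction lands from the true label. The column names are not documented here, so the sketch below assumes hypothetical columns `real_label` and `predicted_label`; adjust them to the actual CSV header. It uses only the standard-library `csv` module and accepts an open file or any iterable of CSV lines.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Hypothetical column names: replace them with the real header of misclassified_predictions.csv.
REAL_COLUMN = "real_label"
PREDICTED_COLUMN = "predicted_label"


def error_distance_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Count misclassified reviews by |predicted - real| label distance."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        distance = abs(int(row[PREDICTED_COLUMN]) - int(row[REAL_COLUMN]))
        counts[distance] += 1
    return dict(sorted(counts.items()))


# Usage (assuming the file is in the working directory):
# with open("misclassified_predictions.csv", newline="", encoding="utf-8") as f:
#     print(error_distance_counts(f))
```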
## Limitations

1. Domain specificity: The model was trained on TripAdvisor reviews, so it may not generalize to other types of reviews or other domains without further fine-tuning.
2. Subjectivity: Sentiment annotations are subjective and may not fully reflect every reader's perception.
3. Performance: The intermediate labels (2, 3, and 4) have clearly lower precision and recall than the extreme labels (1 and 5). On the test set their F1-scores range from 0.55 to 0.61, compared with 0.72 to 0.74 for labels 1 and 5.