+# DistilBERT Sentiment Analysis Model
+## Overview
+This repository contains a fine-tuned **DistilBERT** model trained for sentiment analysis on TripAdvisor reviews. The model predicts sentiment scores on a scale of 1 to 5 based on review text.
+- **Base Model**: `distilbert-base-uncased`
+- **Trained Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)
+- **Use Case**: Sentiment classification for customer reviews to derive insights into customer satisfaction.
+- **Output**: Sentiment labels (1-5).
+---
+## Model Details
+- **Learning Rate**: `3e-05`
+- **Batch Size**: `64`
+- **Epochs**: up to `10` (with early stopping)
+- **Early-Stopping Patience**: `5` (epochs without improvement)
+- **Tokenizer**: `distilbert-base-uncased`
+- **Framework**: PyTorch + Hugging Face Transformers
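
How the epoch budget and the patience setting interact can be illustrated with a small, framework-free sketch. It assumes early stopping monitors validation loss (the card does not say which metric was monitored), and the loss values are made up for illustration. `early_stopping_epoch` is a hypothetical helper, not part of the original training code. Note that with a budget of 10 epochs and a patience of 5, early stopping can only trigger if the best epoch is one of the first five.

```python
def early_stopping_epoch(val_losses, max_epochs=10, patience=5):
    """Return (last_epoch_run, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return last_epoch, best_epoch


# Made-up validation losses: the best value is at epoch 3, then no improvement.
illustrative_losses = [0.98, 0.91, 0.89, 0.90, 0.92, 0.93, 0.95, 0.97, 0.99, 1.01]
stopped_at, best = early_stopping_epoch(illustrative_losses)
print(f"stopped after epoch {stopped_at}, best checkpoint from epoch {best}")
```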
+---
+## Training and Validation
+### Dataset
+The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:
+- **Training Set**: 30,400 reviews
+- **Validation Set**: 1,600 reviews
+- **Test Set**: 8,000 reviews
+All splits are balanced across five sentiment labels.
+### Validation Performance
+| Metric            | Value  |
+|-------------------|--------|
+| Accuracy          | 0.6294 |
+| Precision (macro) | 0.6313 |
+| Recall (macro)    | 0.6294 |
+| F1-Score (macro)  | 0.6297 |
+#### Classification Report (Validation Set)
+| Label | Precision | Recall | F1-Score | Support |
+|-------|-----------|--------|----------|---------|
+| 1     | 0.7612    | 0.6875 | 0.7225   | 320     |
+| 2     | 0.5255    | 0.5469 | 0.5360   | 320     |
+| 3     | 0.5859    | 0.5969 | 0.5913   | 320     |
+| 4     | 0.5696    | 0.5500 | 0.5596   | 320     |
+| 5     | 0.7143    | 0.7656 | 0.7391   | 320     |
+#### Confusion Matrix (Validation Set)
+| Actual\Predicted | 1   | 2   | 3   | 4   | 5   |
+|-------------------|-----|-----|-----|-----|-----|
+| **1**            | 220 | 87  | 10  | 1   | 2   |
+| **2**            | 63  | 175 | 75  | 6   | 1   |
+| **3**            | 4   | 61  | 191 | 60  | 4   |
+| **4**            | 1   | 6   | 46  | 176 | 91  |
+| **5**            | 1   | 4   | 4   | 66  | 245 |
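
The figures in the classification report can be re-derived from the confusion matrix. Below is a minimal sketch in plain Python, assuming rows are actual labels and columns are predicted labels (each row sums to the per-label support of 320). Variable names are illustrative.

```python
# Validation confusion matrix: rows = actual label, columns = predicted label.
val_cm = [
    [220, 87, 10, 1, 2],
    [63, 175, 75, 6, 1],
    [4, 61, 191, 60, 4],
    [1, 6, 46, 176, 91],
    [1, 4, 4, 66, 245],
]

n_labels = len(val_cm)
total = sum(sum(row) for row in val_cm)
accuracy = sum(val_cm[i][i] for i in range(n_labels)) / total

per_label = {}
for i in range(n_labels):
    true_pos = val_cm[i][i]
    actual_count = sum(val_cm[i])                                 # row sum
    predicted_count = sum(val_cm[r][i] for r in range(n_labels))  # column sum
    recall = true_pos / actual_count
    precision = true_pos / predicted_count
    f1 = 2 * precision * recall / (precision + recall)
    per_label[i + 1] = (precision, recall, f1)  # keys are the 1-5 labels

print(f"accuracy = {accuracy:.4f}")
for label, (p, r, f1) in per_label.items():
    print(f"label {label}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Averaging the five per-label values gives the summary precision, recall, and F1 scores. Because every label has the same support, macro and weighted averages coincide here.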
+### Test Performance
+| Metric            | Value  |
+|-------------------|--------|
+| Accuracy          | 0.6391 |
+| Precision (macro) | 0.6416 |
+| Recall (macro)    | 0.6391 |
+| F1-Score (macro)  | 0.6400 |
+#### Classification Report (Test Set)
+| Label | Precision | Recall | F1-Score | Support |
+|-------|-----------|--------|----------|---------|
+| 1     | 0.7483    | 0.6856 | 0.7156   | 1600    |
+| 2     | 0.5445    | 0.5544 | 0.5494   | 1600    |
+| 3     | 0.6000    | 0.6281 | 0.6137   | 1600    |
+| 4     | 0.5828    | 0.5894 | 0.5861   | 1600    |
+| 5     | 0.7326    | 0.7381 | 0.7354   | 1600    |
+#### Confusion Matrix (Test Set)
+| Actual\Predicted | 1    | 2    | 3    | 4    | 5    |
+|-------------------|------|------|------|------|------|
+| **1**            | 1097 | 437  | 60   | 3    | 3    |
+| **2**            | 327  | 887  | 344  | 34   | 8    |
+| **3**            | 37   | 278  | 1005 | 254  | 26   |
+| **4**            | 3    | 21   | 239  | 943  | 394  |
+| **5**            | 2    | 6    | 27   | 384  | 1181 |
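
Since the five labels are ordered, it may also be informative to look at how far wrong predictions land from the true label. The sketch below derives this from the test confusion matrix, assuming rows are actual labels and columns are predicted labels (each row sums to 1,600). These distance-based figures are not reported by the authors; they follow arithmetically from the table, which puts roughly 97% of test predictions within one label of the true rating.

```python
# Test confusion matrix: rows = actual label, columns = predicted label.
test_cm = [
    [1097, 437, 60, 3, 3],
    [327, 887, 344, 34, 8],
    [37, 278, 1005, 254, 26],
    [3, 21, 239, 943, 394],
    [2, 6, 27, 384, 1181],
]

total = sum(sum(row) for row in test_cm)

# Count predictions by absolute distance |actual - predicted|.
by_distance = {}
for actual, row in enumerate(test_cm):
    for predicted, count in enumerate(row):
        distance = abs(actual - predicted)
        by_distance[distance] = by_distance.get(distance, 0) + count

exact = by_distance[0] / total
within_one = (by_distance[0] + by_distance[1]) / total
mean_abs_error = sum(d * c for d, c in by_distance.items()) / total

print(f"exact match:         {exact:.3f}")
print(f"within one label:    {within_one:.3f}")
print(f"mean absolute error: {mean_abs_error:.3f}")
```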
+---
+## How to Use
+### Load the Model
+```python
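+from transformers import pipeline
+
+# Local path to the saved model directory; adjust to where the model is stored
+model_name = "models/distilbert/best_trained_model"
+classifier = pipeline("text-classification", model=model_name, tokenizer=model_name)
+
+text = "The hotel was great, but the staff was rude."
+result = classifier(text, truncation=True)  # truncate reviews that exceed the model's maximum input length
+print(result)  # e.g. [{'label': '3', 'score': 0.82}] (illustrative values; label names depend on the model config)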
+```
+### Evaluate Custom Text
+To evaluate custom text or datasets, load the tokenizer and model as follows:
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+# Load model and tokenizer from the local model directory
+model_path = "models/distilbert/best_trained_model"
+model = AutoModelForSequenceClassification.from_pretrained(model_path)
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model.eval()  # make sure the model is in inference mode (dropout disabled)
+
+# Input text
+text = "The room was clean and spacious, but the food was disappointing."
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+
+# Model prediction (no gradients needed at inference time)
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Class indices are 0-based; add 1 to convert back to the 1-5 label scale
+predicted_label = torch.argmax(outputs.logits, dim=-1).item() + 1
+print(f"Predicted Sentiment: {predicted_label}")
+```
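
The step from raw logits to a 1-to-5 label can be illustrated without loading the model. The sketch below uses plain Python and made-up logits, and assumes that class index `i` corresponds to label `i + 1`, as the `+ 1` in the snippet above implies. `logits_to_prediction` is a hypothetical helper name.

```python
import math


def logits_to_prediction(logits):
    """Map one row of raw logits to (label in 1..5, softmax probabilities)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    norm = sum(exps)
    probs = [e / norm for e in exps]
    class_index = max(range(len(probs)), key=probs.__getitem__)  # 0-based argmax
    return class_index + 1, probs


# Made-up logits for a single review (index 0 corresponds to label 1).
example_logits = [-1.2, 0.3, 2.1, 0.9, -0.5]
label, probs = logits_to_prediction(example_logits)
print(label, [round(p, 3) for p in probs])
```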
+---
+## Files Included
+- **`correct_predictions.csv`**: Contains correctly classified reviews with their real and predicted labels.
+- **`misclassified_predictions.csv`**: Contains misclassified reviews with their real and predicted labels, along with the difference between the two.
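
One way such files could be produced from a list of predictions is sketched below. The column names (`review`, `real_label`, `predicted_label`, `difference`), the sign convention of the difference (predicted minus real), and the sample rows are all assumptions for illustration; the actual CSV schema is not documented here.

```python
import csv
import io


def split_predictions(rows):
    """Split (review, real_label, predicted_label) tuples into two lists of dict rows."""
    correct, misclassified = [], []
    for review, real, predicted in rows:
        record = {"review": review, "real_label": real, "predicted_label": predicted}
        if real == predicted:
            correct.append(record)
        else:
            # Assumed convention: difference = predicted - real.
            misclassified.append({**record, "difference": predicted - real})
    return correct, misclassified


def to_csv(records):
    """Serialize dict rows to CSV text (write to a file instead if needed)."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample predictions.
sample = [
    ("Lovely stay, would return.", 5, 5),
    ("Room was fine, breakfast was poor.", 3, 2),
    ("Terrible service.", 1, 3),
]
correct, misclassified = split_predictions(sample)
print(to_csv(correct))
print(to_csv(misclassified))
```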
+---
+## Limitations
+1. Domain-Specific: The model was trained on TripAdvisor reviews, so it may not generalize to other types of reviews or domains without further fine-tuning.
+2. Subjectivity: Sentiment annotations are subjective and may not fully represent every user's perception.
+3. Performance: Mid-range sentiment labels (2, 3, and 4) have lower precision and recall (roughly 0.53 to 0.63) than the extreme labels 1 and 5 (roughly 0.69 to 0.77), and most errors fall on an adjacent label.