# DistilBERT Sentiment Analysis Model

## Overview

This repository contains a fine-tuned **DistilBERT** model trained for sentiment analysis on TripAdvisor reviews. The model predicts sentiment scores on a scale of 1 to 5 based on review text.

- **Base Model**: `distilbert-base-uncased`
- **Training Dataset**: [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2)
- **Use Case**: Sentiment classification for customer reviews to derive insights into customer satisfaction.
- **Output**: Sentiment labels (1-5).

---

## Model Details

- **Learning Rate**: `3e-05`
- **Batch Size**: `64`
- **Epochs**: `10` (with early stopping)
- **Patience**: `5` (epochs without improvement)
- **Tokenizer**: `distilbert-base-uncased`
- **Framework**: PyTorch + Hugging Face Transformers
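
The interaction between the epoch limit and the patience setting can be illustrated with a minimal sketch. It assumes validation loss is the monitored quantity, which this card does not state, and the function name `run_with_early_stopping` and the loss values are hypothetical, not taken from the actual training run.

```python
def run_with_early_stopping(val_losses, max_epochs=10, patience=5):
    """Return (epochs_run, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far, or after `max_epochs`, whichever is first.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0

    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch


# Hypothetical validation losses, for illustration only (not from the actual run).
losses = [1.10, 0.95, 0.90, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98]
epochs_run, best_epoch = run_with_early_stopping(losses)
print(epochs_run, best_epoch)  # 8 3
```

With these made-up losses the best epoch is 3, and training halts at epoch 8 after five epochs without improvement, before the 10-epoch limit is reached.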

---

## Training and Validation

### Dataset

The dataset used for training, validation, and testing is [nhull/tripadvisor-split-dataset-v2](https://huggingface.co/datasets/nhull/tripadvisor-split-dataset-v2). It consists of:

- **Training Set**: 30,400 reviews
- **Validation Set**: 1,600 reviews
- **Test Set**: 8,000 reviews

All splits are balanced across five sentiment labels.
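
As a quick sanity check on the split sizes, the short sketch below derives each split's share of the data and the per-label count implied by balanced classes. It uses only the numbers listed above; the variable names are illustrative.

```python
splits = {"train": 30_400, "validation": 1_600, "test": 8_000}
num_labels = 5

total = sum(splits.values())
for name, size in splits.items():
    share = size / total
    per_label = size // num_labels  # balanced splits: equal count per label
    print(f"{name:<10} {size:>6}  {share:.0%} of all reviews  {per_label} per label")
```

The per-label counts for the validation and test splits (320 and 1,600) match the Support columns in the classification reports.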

### Validation Performance

| Metric | Value |
|------------|--------|
| Accuracy | 0.6294 |
| Precision | 0.6313 |
| Recall | 0.6294 |
| F1-Score | 0.6297 |

#### Classification Report (Validation Set)

| Label | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 1 | 0.7612 | 0.6875 | 0.7225 | 320 |
| 2 | 0.5255 | 0.5469 | 0.5360 | 320 |
| 3 | 0.5859 | 0.5969 | 0.5913 | 320 |
| 4 | 0.5696 | 0.5500 | 0.5596 | 320 |
| 5 | 0.7143 | 0.7656 | 0.7391 | 320 |

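#### Confusion Matrix (Validation Set)

Rows are actual labels and columns are predicted labels; each row sums to the label's support of 320.

| Actual \ Predicted | 1 | 2 | 3 | 4 | 5 |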
|-------------------|-----|-----|-----|-----|-----|
| **1** | 220 | 87 | 10 | 1 | 2 |
| **2** | 63 | 175 | 75 | 6 | 1 |
| **3** | 4 | 61 | 191 | 60 | 4 |
| **4** | 1 | 6 | 46 | 176 | 91 |
| **5** | 1 | 4 | 4 | 66 | 245 |
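
The per-label figures in the classification report can be recomputed from this matrix. The sketch below reads each row as an actual label and each column as a predicted label, which is the reading under which every row sums to the 320 support per label. The helper name `per_class_metrics` is illustrative and not part of the repository.

```python
# Validation confusion matrix: rows = actual label, columns = predicted label.
confusion = [
    [220, 87, 10, 1, 2],
    [63, 175, 75, 6, 1],
    [4, 61, 191, 60, 4],
    [1, 6, 46, 176, 91],
    [1, 4, 4, 66, 245],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    metrics = []
    for k in range(n):
        true_positives = matrix[k][k]
        actual_total = sum(matrix[k])                          # row sum
        predicted_total = sum(matrix[r][k] for r in range(n))  # column sum
        precision = true_positives / predicted_total if predicted_total else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


metrics = per_class_metrics(confusion)
accuracy = sum(confusion[k][k] for k in range(len(confusion))) / sum(map(sum, confusion))

for label, (p, r, f1) in enumerate(metrics, start=1):
    print(f"label {label}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
print(f"accuracy={accuracy:.4f}")
```

Because every label has the same support, the macro and weighted averages of these per-label values coincide, and averaged recall equals accuracy.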

### Test Performance

| Metric | Value |
|------------|--------|
| Accuracy | 0.6391 |
| Precision | 0.6416 |
| Recall | 0.6391 |
| F1-Score | 0.6400 |

#### Classification Report (Test Set)

| Label | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 1 | 0.7483 | 0.6856 | 0.7156 | 1600 |
| 2 | 0.5445 | 0.5544 | 0.5494 | 1600 |
| 3 | 0.6000 | 0.6281 | 0.6137 | 1600 |
| 4 | 0.5828 | 0.5894 | 0.5861 | 1600 |
| 5 | 0.7326 | 0.7381 | 0.7354 | 1600 |

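#### Confusion Matrix (Test Set)

Rows are actual labels and columns are predicted labels; each row sums to the label's support of 1600.

| Actual \ Predicted | 1 | 2 | 3 | 4 | 5 |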
|-------------------|------|------|------|------|------|
| **1** | 1097 | 437 | 60 | 3 | 3 |
| **2** | 327 | 887 | 344 | 34 | 8 |
| **3** | 37 | 278 | 1005 | 254 | 26 |
| **4** | 3 | 21 | 239 | 943 | 394 |
| **5** | 2 | 6 | 27 | 384 | 1181 |
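
Since the labels are ordinal, it can help to look at how far wrong predictions land from the true rating. The sketch below derives a within-one-rating accuracy and a mean absolute error from the test matrix, reading rows as actual labels and columns as predicted labels. These two figures are derived here for illustration and were not reported by the model's author.

```python
# Test confusion matrix: rows = actual label, columns = predicted label.
test_confusion = [
    [1097, 437, 60, 3, 3],
    [327, 887, 344, 34, 8],
    [37, 278, 1005, 254, 26],
    [3, 21, 239, 943, 394],
    [2, 6, 27, 384, 1181],
]

total = sum(map(sum, test_confusion))

# Bucket every prediction by how far it lands from the true rating.
counts_by_distance = {}
for actual, row in enumerate(test_confusion):
    for predicted, count in enumerate(row):
        distance = abs(actual - predicted)
        counts_by_distance[distance] = counts_by_distance.get(distance, 0) + count

exact = counts_by_distance[0] / total
within_one = (counts_by_distance[0] + counts_by_distance[1]) / total
mean_abs_error = sum(d * c for d, c in counts_by_distance.items()) / total

print(f"exact={exact:.4f} within_one={within_one:.4f} mae={mean_abs_error:.4f}")
```

On this matrix, 7,770 of the 8,000 test predictions (about 97%) fall within one rating of the true label, so most errors are confusions between adjacent ratings.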

---

## How to Use

### Load the Model

```python
from transformers import pipeline

model_name = "models/distilbert/best_trained_model"
classifier = pipeline("text-classification", model=model_name, tokenizer=model_name)

text = "The hotel was great, but the staff was rude."
result = classifier(text)
print(result)  # Illustrative output format: [{'label': '3', 'score': 0.82}]
```

### Evaluate Custom Text

To evaluate custom text or datasets, load the tokenizer and model as follows:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("models/distilbert/best_trained_model")
tokenizer = AutoTokenizer.from_pretrained("models/distilbert/best_trained_model")

# Input text
text = "The room was clean and spacious, but the food was disappointing."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

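# Model prediction (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# argmax over the class dimension gives a 0-based class index; add 1 for the 1-5 rating
predicted_label = torch.argmax(outputs.logits, dim=-1).item() + 1
print(f"Predicted Sentiment: {predicted_label}")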
```
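
To report a confidence alongside the predicted label, the logits can be passed through a softmax. The following is a dependency-free sketch of that step. The logit values are hypothetical stand-ins for `outputs.logits[0].tolist()`, and the mapping of class index 0 to rating 1 is assumed from the `+ 1` conversion above.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one review (not real model output).
logits = [-1.2, 0.4, 2.1, 1.3, -0.8]

probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
predicted_label = predicted_index + 1  # assumes class index 0 corresponds to rating 1
confidence = probs[predicted_index]

print(f"Predicted Sentiment: {predicted_label} (confidence {confidence:.2f})")
```

With PyTorch tensors, `torch.softmax(outputs.logits, dim=-1)` performs the same computation.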

---

## Files Included

- **`correct_predictions.csv`**: Contains correctly classified reviews with their real and predicted labels.
- **`misclassified_predictions.csv`**: Contains misclassified reviews with their real and predicted labels, along with the difference.
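
One possible way to explore the misclassified file is to count errors by how far the prediction is from the real label. This is a rough sketch only: the column names `real_label` and `predicted_label` and the sample rows are hypothetical, since this card does not document the CSV schema, so adjust them to match the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; change these to match the real CSV header.
REAL_COL = "real_label"
PRED_COL = "predicted_label"

# Made-up rows standing in for misclassified_predictions.csv.
sample_csv = io.StringIO(
    "review,real_label,predicted_label\n"
    "Room was fine but noisy,3,2\n"
    "Loved the pool and staff,5,4\n"
    "Terrible service overall,1,3\n"
)


def error_distances(handle):
    """Count misclassified rows by absolute distance between real and predicted label."""
    reader = csv.DictReader(handle)
    return Counter(abs(int(row[REAL_COL]) - int(row[PRED_COL])) for row in reader)


distances = error_distances(sample_csv)
print(dict(distances))  # {1: 2, 2: 1}
```

For the real file, replace `sample_csv` with a handle from `open("misclassified_predictions.csv", newline="", encoding="utf-8")`.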

## Limitations

1. Domain-Specific: The model was trained on TripAdvisor reviews, so it may not generalize to other types of reviews or domains without further fine-tuning.
2. Subjectivity: Sentiment annotations are subjective and may not fully represent every user's perception.
3. Performance: Mid-range sentiment labels (2, 3, and 4) have lower precision and recall than the extreme labels (1 and 5), and most errors are confusions between adjacent ratings.
