Update README.md

README.md

@@ -35,9 +35,11 @@ This repository contains a fine-tuned **DistilBERT** model trained for sentiment
 - **Tokenizer**: `distilbert-base-uncased`
 - **Framework**: PyTorch + Hugging Face Transformers
 
-
+## Intended Use
+
+This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.
 
-
+---
 
 ### Dataset
 

@@ -49,34 +51,7 @@ The dataset used for training, validation, and testing is [nhull/tripadvisor-spl
 
 All splits are balanced across five sentiment labels.
 
-
-
-| Metric     | Value  |
-|------------|--------|
-| Accuracy   | 0.6294 |
-| Precision  | 0.6313 |
-| Recall     | 0.6294 |
-| F1-Score   | 0.6297 |
-
-#### Classification Report (Validation Set)
-
-| Label | Precision | Recall | F1-Score | Support |
-|-------|-----------|--------|----------|---------|
-| 1     | 0.7612    | 0.6875 | 0.7225   | 320     |
-| 2     | 0.5255    | 0.5469 | 0.5360   | 320     |
-| 3     | 0.5859    | 0.5969 | 0.5913   | 320     |
-| 4     | 0.5696    | 0.5500 | 0.5596   | 320     |
-| 5     | 0.7143    | 0.7656 | 0.7391   | 320     |
-
-### Confusion Matrix (Validation Set)
-
-| Predicted\Actual | 1   | 2   | 3   | 4   | 5   |
-|-------------------|-----|-----|-----|-----|-----|
-| **1**             | 220 | 87  | 10  | 1   | 2   |
-| **2**             | 63  | 175 | 75  | 6   | 1   |
-| **3**             | 4   | 61  | 191 | 60  | 4   |
-| **4**             | 1   | 6   | 46  | 176 | 91  |
-| **5**             | 1   | 4   | 4   | 66  | 245 |
+---
 
 ### Test Performance
 

@@ -111,43 +86,6 @@ Model predicts too high on average by `0.3934`.
 
 ---
 
-## How to Use
-
-### Load the Model
-
-```python
-from transformers import pipeline
-
-model_name = "models/distilbert/best_trained_model"
-classifier = pipeline("text-classification", model=model_name, tokenizer=model_name)
-
-text = "The hotel was great, but the staff was rude."
-result = classifier(text)
-print(result)  # [{'label': '3', 'score': 0.82}]
-```
-
-### Evaluate Custom Text
-To evaluate custom text or datasets, load the tokenizer and model as follows:
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-
-# Load model and tokenizer
-model = AutoModelForSequenceClassification.from_pretrained("models/distilbert/best_trained_model")
-tokenizer = AutoTokenizer.from_pretrained("models/distilbert/best_trained_model")
-
-# Input text
-text = "The room was clean and spacious, but the food was disappointing."
-inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
-
-# Model prediction
-outputs = model(**inputs)
-predicted_label = torch.argmax(outputs.logits) + 1  # Convert back to 1-based indexing
-print(f"Predicted Sentiment: {predicted_label}")
-```
-
----
-
 ## Files Included
 
 - **`validation_results_distilbert.csv`**: Contains correctly classified reviews with their real and predicted labels.