Logistic Regression Sentiment Analysis Model
This model is a Logistic Regression classifier trained on the TripAdvisor sentiment analysis dataset. It takes the text of a hotel review as input and predicts its sentiment as a rating from 1 to 5 stars.
Model Details
- Model Type: Logistic Regression
- Task: Sentiment Analysis
- Input: A hotel review (text)
- Output: Sentiment rating (1-5 stars)
- Trained Dataset: nhull/tripadvisor-split-dataset-v2
Intended Use
This model is designed to classify hotel reviews by the sentiment they express. It returns a star rating from 1 to 5, where:
- 1: Very bad
- 2: Bad
- 3: Neutral
- 4: Good
- 5: Very good
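For downstream code, the scale above can be kept as a small lookup table. The sketch below is illustrative only: `RATING_LABELS` and `describe_rating` are hypothetical names, not part of the model's files, and it assumes the prediction arrives as a number such as `4` or `4.0`.

```python
# Hypothetical helper: maps a predicted star rating to the description
# used in this model card. Not part of the released model files.
RATING_LABELS = {
    1: "Very bad",
    2: "Bad",
    3: "Neutral",
    4: "Good",
    5: "Very good",
}


def describe_rating(rating):
    """Return the textual description for a 1-5 star prediction."""
    stars = int(rating)  # accepts 4 as well as 4.0
    if stars != rating or stars not in RATING_LABELS:
        raise ValueError(f"expected a whole rating from 1 to 5, got {rating!r}")
    return RATING_LABELS[stars]


print(describe_rating(5.0))  # Very good
```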
Dataset
The dataset used for training, validation, and testing is nhull/tripadvisor-split-dataset-v2. It consists of:
- Training Set: 30,400 reviews
- Validation Set: 1,600 reviews
- Test Set: 8,000 reviews
All splits are balanced across five sentiment labels.
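As a quick sanity check on these figures, the per-label counts implied by a balanced split can be worked out directly. This is a sketch based only on the split sizes stated above; it assumes "balanced" means exactly equal counts per label, and the variable names are illustrative.

```python
# Split sizes as stated in this model card.
split_sizes = {"train": 30_400, "validation": 1_600, "test": 8_000}
num_labels = 5  # star ratings 1 to 5

total = sum(split_sizes.values())

# Assumption: "balanced" means every label has the same number of reviews.
summary = {
    name: {"share": size / total, "per_label": size // num_labels}
    for name, size in split_sizes.items()
}

for name, info in summary.items():
    print(f"{name}: {info['share']:.0%} of {total:,} reviews, "
          f"{info['per_label']:,} per label")
```

Under that assumption the test split holds 1,600 reviews per label, which matches the Support column of the test report.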
Test Performance
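On the test set, predictions deviate from the true rating by 0.44 stars on average (mean absolute error). Over- and under-predictions in the confusion matrix nearly cancel out, so there is no notable bias toward higher ratings.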
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 1.0 | 0.70 | 0.73 | 0.71 | 1600 |
| 2.0 | 0.52 | 0.50 | 0.51 | 1600 |
| 3.0 | 0.57 | 0.54 | 0.55 | 1600 |
| 4.0 | 0.55 | 0.54 | 0.55 | 1600 |
| 5.0 | 0.71 | 0.74 | 0.72 | 1600 |
| Accuracy | - | - | 0.61 | 8000 |
| Macro avg | 0.61 | 0.61 | 0.61 | 8000 |
| Weighted avg | 0.61 | 0.61 | 0.61 | 8000 |

| True \ Predicted | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1165 | 384 | 41 | 3 | 7 |
| 2 | 432 | 805 | 315 | 31 | 17 |
| 3 | 61 | 314 | 857 | 311 | 57 |
| 4 | 3 | 48 | 264 | 870 | 415 |
| 5 | 6 | 10 | 32 | 365 | 1187 |
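The report figures can be re-derived from the confusion matrix, which is a useful check when reusing these numbers. The sketch below uses only the counts shown above (rows are true labels, columns are predictions); the function and variable names are illustrative and not taken from the model's training code. It also computes both the mean absolute and the mean signed difference between predicted and true stars, to show which of the two the 0.44 figure corresponds to.

```python
# Confusion matrix from this model card: rows = true stars, columns = predicted stars.
CONFUSION = [
    [1165, 384, 41, 3, 7],
    [432, 805, 315, 31, 17],
    [61, 314, 857, 311, 57],
    [3, 48, 264, 870, 415],
    [6, 10, 32, 365, 1187],
]


def per_class_metrics(matrix):
    """Return {star: (precision, recall, f1)} from a square confusion matrix."""
    n = len(matrix)
    metrics = {}
    for k in range(n):
        true_positives = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total (support)
        precision = true_positives / predicted_k
        recall = true_positives / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        metrics[k + 1] = (precision, recall, f1)
    return metrics


def error_summary(matrix):
    """Return (accuracy, mean absolute error, mean signed error) in stars."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    abs_err = sum(abs(j - i) * matrix[i][j] for i in range(n) for j in range(n))
    signed_err = sum((j - i) * matrix[i][j] for i in range(n) for j in range(n))
    return correct / total, abs_err / total, signed_err / total


accuracy, mae, bias = error_summary(CONFUSION)
print(f"accuracy={accuracy:.2f}  MAE={mae:.2f}  mean signed error={bias:+.3f}")
for star, (p, r, f1) in per_class_metrics(CONFUSION).items():
    print(f"{star} stars: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Run as written, this reproduces the 0.61 accuracy and shows that 0.44 is the mean absolute error, while the mean signed error is below 0.01 stars.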
Files Included
validation_results_log_regression.csv: Contains correctly classified reviews with their real and predicted labels.
Limitations
- The model performs best on extreme ratings (F1 of 0.71 for 1 star and 0.72 for 5 stars) and struggles with intermediate ratings (F1 between 0.51 and 0.55 for 2, 3, and 4 stars).
- The model was trained on the TripAdvisor dataset and may not generalize well to reviews from other sources or domains.
- The model does not handle aspects like sarcasm or humor well, and shorter reviews may lead to less accurate predictions.
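The first limitation can be quantified from the test confusion matrix. The sketch below uses only the counts reported in this card, with illustrative names. It measures how often a prediction lands within one star of the true rating and compares recall on the extreme ratings with recall on the intermediate ones.

```python
# Test confusion matrix from this model card: rows = true stars, columns = predicted stars.
CONFUSION = [
    [1165, 384, 41, 3, 7],
    [432, 805, 315, 31, 17],
    [61, 314, 857, 311, 57],
    [3, 48, 264, 870, 415],
    [6, 10, 32, 365, 1187],
]


def within_k_stars(matrix, k):
    """Share of reviews whose prediction is at most k stars from the true rating."""
    total = sum(sum(row) for row in matrix)
    close = sum(
        count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if abs(i - j) <= k
    )
    return close / total


def mean_recall(matrix, stars):
    """Average recall over the given star ratings (1-based)."""
    recalls = [matrix[s - 1][s - 1] / sum(matrix[s - 1]) for s in stars]
    return sum(recalls) / len(recalls)


exact = within_k_stars(CONFUSION, 0)
near = within_k_stars(CONFUSION, 1)
extreme = mean_recall(CONFUSION, [1, 5])
middle = mean_recall(CONFUSION, [2, 3, 4])

print(f"exact match: {exact:.0%}, within one star: {near:.0%}")
print(f"mean recall on 1 and 5 stars: {extreme:.3f}, on 2 to 4 stars: {middle:.3f}")
```

On these counts, about 96% of test predictions fall within one star of the true rating, so most errors are confusions between adjacent ratings rather than large misses.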