Commit f2f8351 (parent: 89b046e): Update README.md

README.md CHANGED

@@ -7,4 +7,48 @@ tags:
- sentiment-analysis
- transformers
- Transformateurs
---

# Sentiment Analysis Model for Hotel Reviews

This model performs sentiment analysis on hotel reviews. The goal is to classify each review into one of three categories: Negative, Neutral, or Positive.

## Model Description

This model is based on BERT (Bidirectional Encoder Representations from Transformers), specifically the `bert-base-uncased` checkpoint.

## Training Procedure

The model was trained on the TripAdvisor hotel reviews dataset. Each review in the dataset is associated with a rating from 1 to 5. The ratings were converted to sentiment labels as follows:

- Ratings of 1 and 2 were labelled as 'Negative'
- A rating of 3 was labelled as 'Neutral'
- Ratings of 4 and 5 were labelled as 'Positive'
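
The rating-to-label rule above can be written as a small helper (a sketch for illustration; the function name is not taken from the actual training code):

```python
def rating_to_label(rating: int) -> str:
    """Map a 1-5 TripAdvisor rating to a sentiment label."""
    if rating <= 2:
        return 'Negative'
    if rating == 3:
        return 'Neutral'
    return 'Positive'  # ratings 4 and 5

print([rating_to_label(r) for r in [1, 2, 3, 4, 5]])
# → ['Negative', 'Negative', 'Neutral', 'Positive', 'Positive']
```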

The text of each review was preprocessed by lowercasing and removing punctuation, emojis, and stop words, then tokenized with the BERT tokenizer.
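
A minimal sketch of that preprocessing, assuming a toy stop-word set and a crude non-ASCII filter for emojis (the actual stop-word list and emoji handling used in training are not specified in this card):

```python
import string

# Toy stop-word set for illustration only; the list used in training is unspecified
STOP_WORDS = {'the', 'a', 'an', 'and', 'was', 'were', 'is', 'are', 'to', 'of'}

def preprocess(text: str) -> str:
    text = text.lower()
    # Strip ASCII punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Crude emoji strip: drop code points in the symbol/emoji ranges
    text = ''.join(ch for ch in text if ord(ch) < 0x2600 or ord(ch) > 0x1FAFF)
    # Drop stop words
    return ' '.join(w for w in text.split() if w not in STOP_WORDS)

print(preprocess("The hotel was GREAT, and the staff were friendly!"))
# → hotel great staff friendly
```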

The model was trained with a learning rate of 2e-5, an epsilon of 1e-8, and a batch size of 6 for 5 epochs.
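
For concreteness, those hyperparameters as a config sketch, together with the number of optimizer steps they imply (the example dataset size is hypothetical, not the actual size of the training set):

```python
import math

# Hyperparameters as stated in this card
config = {
    'learning_rate': 2e-5,
    'adam_epsilon': 1e-8,
    'batch_size': 6,
    'num_epochs': 5,
}

def total_optimizer_steps(num_examples: int) -> int:
    """Total update steps for a full training run at these settings."""
    steps_per_epoch = math.ceil(num_examples / config['batch_size'])
    return steps_per_epoch * config['num_epochs']

print(total_optimizer_steps(20000))  # hypothetical dataset size → 16670
```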

## Evaluation

The model was evaluated using a weighted F1 score. The specific performance metrics obtained during evaluation will be updated here.
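
Weighted F1 averages the per-class F1 scores, weighting each class by its support (its count in the true labels); a minimal pure-Python sketch of the metric:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """F1 per class, averaged with weights proportional to class support."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    total = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * support[c] / len(y_true)
    return total

y_true = ['Negative', 'Neutral', 'Positive', 'Positive']
y_pred = ['Negative', 'Positive', 'Positive', 'Positive']
print(round(weighted_f1(y_true, y_pred), 3))  # → 0.65
```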

## Usage

To use the model, load it and use it to classify a review. For example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier
tokenizer = AutoTokenizer.from_pretrained("<model-name>")
model = AutoModelForSequenceClassification.from_pretrained("<model-name>")

text = "The hotel was great and the staff were very friendly."

# Tokenize, run the model, and take the argmax over the class logits
encoded_input = tokenizer(text, truncation=True, padding=True, return_tensors='pt')
output = model(**encoded_input)
predictions = output.logits.argmax(dim=1)

print(predictions)
```

Replace `<model-name>` with the actual model name.
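
The prediction above is a class id. Assuming the label order `{0: Negative, 1: Neutral, 2: Positive}` — an assumption, so verify against the model's `config.json` (`model.config.id2label`) — it can be mapped back to a string:

```python
# Assumed label order; verify against the model's config (model.config.id2label)
id2label = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}

def to_label(class_id: int) -> str:
    """Turn a predicted class id into its sentiment label string."""
    return id2label[class_id]

print(to_label(2))  # → Positive
```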

## Limitations and Bias

The model is trained on English data, so it might not perform well on reviews in other languages. Furthermore, it might be biased towards certain phrases or words that are commonly used in the training dataset.

## Licensing

Please add licensing information here if applicable.