---
language:
- en
tags:
- sentiment
- bert
- sentiment-analysis
- transformers
widget:
- text: "Is this review positive or negative? Review: Best cast iron skillet you will ever buy."
  example_title: "Sentiment analysis"
---
# Sentiment Analysis Model for Hotel Reviews

This model performs sentiment analysis on hotel reviews, classifying each review into one of three categories: Negative, Neutral, or Positive.
## Model Description

This model is based on BERT (Bidirectional Encoder Representations from Transformers), specifically `bert-base-uncased`.
## Training Procedure

The model was trained on the TripAdvisor hotel reviews dataset, in which each review is associated with a rating from 1 to 5. The ratings were converted to sentiment labels as follows:

- Ratings of 1 and 2 were labelled 'Negative'
- A rating of 3 was labelled 'Neutral'
- Ratings of 4 and 5 were labelled 'Positive'
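The rating-to-label mapping above can be sketched as a small helper function (the function name here is illustrative, not part of the released model):

```python
def rating_to_label(rating: int) -> str:
    """Map a 1-5 TripAdvisor rating to a sentiment label."""
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"
```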
The text of each review was preprocessed by lowercasing and removing punctuation, emojis, and stop words, then tokenized with the BERT tokenizer. The model was trained with a learning rate of 2e-5, an epsilon of 1e-8, and a batch size of 6 for 5 epochs.
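The stated hyperparameters suggest an AdamW-style optimizer setup, which might look roughly like the following sketch (the linear layer is a stand-in for the actual BERT classifier; the exact training loop is not part of this card):

```python
import torch

# Stand-in module for the BERT sequence-classification model
model = torch.nn.Linear(768, 3)  # 3 classes: Negative, Neutral, Positive

# Optimizer settings stated in the card: lr=2e-5, eps=1e-8
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, eps=1e-8)

batch_size = 6
num_epochs = 5
```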
## Evaluation

The model was evaluated using a weighted F1 score.
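A weighted F1 score averages the per-class F1 scores weighted by class support, which keeps the metric meaningful when the three sentiment classes are imbalanced. A toy example with made-up labels (not the model's actual predictions):

```python
from sklearn.metrics import f1_score

# Illustrative labels only, to show the metric's behaviour
y_true = ["Positive", "Negative", "Neutral", "Positive", "Positive"]
y_pred = ["Positive", "Negative", "Positive", "Positive", "Neutral"]

# Per-class F1 scores are averaged, weighted by each class's support
score = f1_score(y_true, y_pred, average="weighted")
print(score)  # 0.6 for this toy example
```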
## Usage

Load the model and tokenizer, then use them to classify a review. For example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("<Group209>")
model = AutoModelForSequenceClassification.from_pretrained("<Group209>")

text = "The hotel was great and the staff were very friendly."

# Tokenize the review and run it through the model
encoded_input = tokenizer(text, truncation=True, padding=True, return_tensors='pt')
output = model(**encoded_input)

# The predicted class is the index of the highest logit
predictions = output.logits.argmax(dim=1)
print(predictions)
```
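The predicted index can be mapped back to a human-readable label. The label order below is an assumption based on the training procedure described above; verify it against the model's `config.id2label` before relying on it:

```python
import torch

# Assumed label order [Negative, Neutral, Positive] - check config.id2label
labels = ["Negative", "Neutral", "Positive"]

logits = torch.tensor([[-1.2, 0.3, 2.1]])  # dummy logits for illustration
pred = labels[logits.argmax(dim=1).item()]
print(pred)  # Positive
```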
## Limitations and Bias

The model is trained on English data, so it may not perform well on reviews in other languages. It may also be biased towards phrases or words that are common in the training dataset.