|
|
--- |
|
|
language: |
|
|
- en |
|
|
- it |
|
|
- es |
|
|
- fr |
|
|
- de |
|
|
license: apache-2.0 |
|
|
library_name: transformers |
|
|
tags: |
|
|
- sentiment-analysis |
|
|
- text-classification |
|
|
- multilingual |
|
|
- restaurants |
|
|
- 5-star |
|
|
base_model: jhu-clsp/mmBERT-base |
|
|
pipeline_tag: text-classification |
|
|
--- |
|
|
|
|
|
# Multilingual Restaurant Review Sentiment Model
|
|
|
|
|
Hey there! This isn't just _another_ sentiment model. This is a fine-tuned powerhouse specifically designed to understand the nuance of 1-to-5 star restaurant reviews across **5 different languages**. |
|
|
|
|
|
It was trained on a massive, perfectly balanced dataset of **400,000+ real, human-written reviews** and achieves state-of-the-art performance.
|
|
|
|
|
## ✨ Model Features
|
|
|
|
|
- **Multilingual:** Trained on **English**, **Italian**, **Spanish**, **French**, **German**. |
|
|
- **5-Star Specialist:** Predicts ratings on a 1-5 star scale. |
|
|
- **SOTA Performance:** Achieves an incredibly low **MAE of ~0.29**. (More on that below!) |
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Just How Good Is It? (Performance)
|
|
|
|
|
Forget accuracy. For star ratings, **Mean Absolute Error (MAE)** is what matters: it measures how far off the prediction is, on average, in stars.
|
|
|
|
|
This model's MAE is **0.293**. What does that mean? It means that, on average, the model's prediction is **only off by 0.29 stars**.
|
|
|
|
|
- It _knows_ a 5-star is close to a 4-star. |
|
|
- It _knows_ a 1-star is NOT a 5-star. |
|
|
- It **rarely** confuses a positive review for a negative one. |
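To make these numbers concrete, here is a quick sketch of how MAE, MSE, and rounded accuracy are computed from raw regression outputs. The toy predictions and labels below are made up for illustration, not taken from the real validation set:

```python
import numpy as np

# Toy raw regression outputs (0-4 scale) and true labels (0-4).
# Illustrative numbers only, not real validation data.
preds = np.array([3.8, 0.2, 2.1, 3.9, 1.7])
labels = np.array([4, 0, 2, 3, 2])

mae = np.abs(preds - labels).mean()        # average distance, in stars
mse = ((preds - labels) ** 2).mean()       # the training loss
rounded = np.clip(np.round(preds), 0, 4)   # snap to the nearest star
acc = (rounded == labels).mean()           # exact match after rounding

print(f"MAE: {mae:.3f}, MSE: {mse:.3f}, Accuracy: {acc:.0%}")
```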
|
|
|
|
|
Here are the full results from the validation set (500k real-world reviews!): |
|
|
|
|
|
| Metric | Score | Why it Matters | |
|
|
| :----------- | :-------- | :----------------------------------------------------------- | |
|
|
| **MAE** | **0.293** | **The model's main score.** |
|
|
| **Accuracy** | 78.2% | How often the model guesses the _exact_ star (after rounding). |
|
|
| **Macro F1** | 0.683 | Shows it's good at all classes, not just the majority class. | |
|
|
| **MSE** | 0.182 | The loss the model was trained on (Mean Squared Error). | |
|
|
|
|
|
--- |
|
|
|
|
|
### Confusion Matrix |
|
|
|
|
|
This shows where the model makes its errors. As you can see, almost all errors are "off-by-one" (like predicting a 4 for a 5-star), which is exactly what we want. |
|
|
|
|
|
| | **Predicted 1** | **Predicted 2** | **Predicted 3** | **Predicted 4** | **Predicted 5** | |
|
|
| :--------- | :-------------: | :-------------: | :-------------: | :-------------: | :-------------: | |
|
|
| **True 1** | 14683 | 8391 | 568 | 44 | 34 | |
|
|
| **True 2** | 2504 | 13699 | 4068 | 95 | 13 | |
|
|
| **True 3** | 290 | 6271 | 23824 | 5700 | 229 | |
|
|
| **True 4** | 18 | 267 | 6940 | 66361 | 25089 | |
|
|
| **True 5** | 44 | 143 | 553 | 47873 | 272298 | |
|
|
|
|
|
--- |
|
|
|
|
|
### Performance Per Language |
|
|
|
|
|
The model performs strongly across all five languages. Here is the final accuracy for each language on the test set: |
|
|
|
|
|
| Language | Accuracy |
|
|
| :-------- | :------- | |
|
|
| `English` | 0.827 | |
|
|
| `Italian` | 0.778 | |
|
|
| `Spanish` | 0.775 | |
|
|
| `French` | 0.763 | |
|
|
| `German` | 0.755 | |
|
|
|
|
|
--- |
|
|
|
|
|
## 🧠 The "Regression Trick" (Why it's so good)
|
|
|
|
|
Most models do "classification" (is it A, B, or C?). This is a bad fit for star ratings: classification treats every mistake as equally wrong, so predicting 4 stars for a 5-star review is penalized just as hard as predicting 1 star.
|
|
|
|
|
This model was trained as a **regression** task. It predicts a single number (like 4.7, 1.2, or 3.5) instead of just "5-star". This teaches the model that 4-stars are "closer" to 5-stars than 1-star is, which is how it gets such a low MAE. |
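For reference, here is roughly what that setup looks like in `transformers`. This is a minimal sketch of the standard regression configuration, not the exact training recipe used for this model:

```python
from transformers import AutoModelForSequenceClassification

# num_labels=1 + problem_type="regression" gives the model a single-float
# output head and makes training use MSELoss instead of CrossEntropyLoss.
model = AutoModelForSequenceClassification.from_pretrained(
    "jhu-clsp/mmBERT-base",   # the base model from this card's metadata
    num_labels=1,
    problem_type="regression",
)
# Labels would then be floats on the 0-4 scale (star rating minus 1).
```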
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 How to Use
|
|
|
|
|
Since this is a regression model, the output is a single float number. You'll want to round it to get a final "star" rating. |
|
|
|
|
|
### ⚠️ A Critical Note on Input Format
|
|
|
|
|
**This is very important for getting the best performance!** |
|
|
|
|
|
This model was not just trained on review text; it was trained using a specific format that includes **both the review title and the review text**, separated by the `[SEP]` token. |
|
|
|
|
|
The title often contains a powerful summary of the sentiment (e.g., "Best Pasta Ever!" or "Total Rip-off!"). Using this format ensures the model gets the same type of input it was trained on. |
|
|
|
|
|
**Correct Format:** |
|
|
`input_text = review_title + " [SEP] " + review_text` |
|
|
|
|
|
If you only have the review text, the model will still work well, but performance will be slightly lower. |
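A tiny helper (hypothetical, not shipped with the model) that builds this format and falls back to plain text when no title is available:

```python
def format_review(text: str, title: str | None = None) -> str:
    """Build the 'title [SEP] text' input the model was trained on."""
    if title:
        return f"{title} [SEP] {text}"
    return text  # text-only still works, just with slightly lower accuracy

print(format_review("The pasta was cold.", "Disappointing"))
# Disappointing [SEP] The pasta was cold.
```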
|
|
|
|
|
### Pipeline Usage Example |
|
|
|
|
|
Here is how you should format your inputs before passing them to the pipeline: |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline |
|
|
import numpy as np  # used below to round and clamp the regression output
|
|
|
|
|
model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
model = AutoModelForSequenceClassification.from_pretrained(model_name) |
|
|
|
|
|
# --- |
|
|
# IMPORTANT: This model predicts a single number (regression). |
|
|
# --- |
|
|
|
|
|
# Let's create a pipeline |
|
|
sentiment_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer) |
|
|
|
|
|
# Example reviews using the recommended format |
|
|
reviews = [ |
|
|
"Absolutely incredible [SEP] This was the best pasta I've ever had in my life.", # 5-star |
|
|
"Servicio terrible [SEP] El servicio fue terrible y la comida tardรณ una hora en llegar.", # 1-star |
|
|
"It was fine [SEP] It was... fine. Nothing special, but not bad either." # 3-star |
|
|
] |
|
|
|
|
|
# Get the raw predictions.
# function_to_apply="none" keeps the raw regression value; by default the
# pipeline would squash a single-logit model through a sigmoid.
raw_preds = sentiment_pipe(reviews, function_to_apply="none")
|
|
print(raw_preds)
# [{'label': 'LABEL_0', 'score': 4.81},
#  {'label': 'LABEL_0', 'score': 1.12},
#  {'label': 'LABEL_0', 'score': 2.95}]
|
|
|
|
|
# --- |
|
|
# How to get the actual "star rating" |
|
|
# (Remember our labels are 0-4, so we add 1) |
|
|
# --- |
|
|
for text, pred in zip(reviews, raw_preds): |
|
|
# 'score' is the raw regression value (our model predicts 0-4) |
|
|
raw_score = pred['score'] |
|
|
|
|
|
# Round and clamp to be safe (0-4) |
|
|
star_label_rounded = np.clip(round(raw_score), 0, 4) |
|
|
|
|
|
# Add 1 to get the 1-5 star rating |
|
|
final_star_rating = int(star_label_rounded + 1) |
|
|
|
|
|
print(f"Review: {text[:40]}...") |
|
|
print(f" Final Rating: {final_star_rating} stars\n") |
|
|
``` |
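If you would rather skip the pipeline, you can call the model directly. This sketch reuses the `tokenizer`, `model`, and `reviews` defined above and assumes PyTorch:

```python
import torch

# Tokenize the same formatted reviews as above
inputs = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # The regression head outputs one float per review; squeeze to shape (batch,)
    raw_scores = model(**inputs).logits.squeeze(-1)

# Round, clamp to the 0-4 label range, then shift to 1-5 stars
stars = (raw_scores.round().clamp(0, 4) + 1).int().tolist()
print(stars)  # e.g. [5, 1, 3]
```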
|
|
|
|
|
--- |
|
|
|
|
|
## 💡 Bonus: Convert to 3 Classes (Bad/Neutral/Good)
|
|
|
|
|
This 5-star model is flexible! If you don't need 5 classes, you can easily group the results. |
|
|
|
|
|
Here's a simple helper function to convert the 1-5 star rating into **Bad**, **Neutral**, or **Good**. |
|
|
|
|
|
```python |
|
|
def to_3_class(rating):
    """Converts a rounded 1-5 star rating into a 3-class sentiment."""
    if rating <= 2:
        return "👎 Bad"
    elif rating == 3:
        return "😐 Neutral"
    else:  # 4 or 5 stars
        return "👍 Good"


# Example using rounded 1-5 ratings like the ones from the code above:
for rating in (1, 3, 5):
    print(f"Rating {rating} is: {to_3_class(rating)}")

# Output:
# Rating 1 is: 👎 Bad
# Rating 3 is: 😐 Neutral
# Rating 5 is: 👍 Good
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🧪 Bonus: A Test of Specialization (Domain Shift)
|
|
|
|
|
This model is a SOTA-level _restaurant_ critic. But what happens if it's asked to review a car mechanic or a hair salon? |
|
|
|
|
|
To find out, the model was tested on the **`yelp_review_full`** dataset. This dataset is **not** just restaurants; it includes reviews for auto shops, plumbers, gyms, salons, and all other business types. |
|
|
|
|
|
The results are exactly what you'd expect from a specialist: strong on its home turf, noticeably weaker outside it:
|
|
|
|
|
| Metric | Score on Restaurant-Only Data | Score on `yelp_review_full` (All businesses) | |
|
|
| :----------- | :---------------------------: | :------------------------------------------: | |
|
|
| **MAE** | **0.2928** | 0.4648 | |
|
|
| **Accuracy** | **78.2%** | 62.7% | |
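If you want to run a similar check yourself, here is a minimal sketch using the `datasets` library and the `sentiment_pipe` from the usage example above. The `yelp_review_full` labels are already 0-4, matching the model's internal scale; the 500-review slice is an assumption here, just for speed:

```python
import numpy as np
from datasets import load_dataset

# Small slice for a quick sanity check; the full test split has 50k reviews
ds = load_dataset("yelp_review_full", split="test[:500]")

# function_to_apply="none" keeps the raw regression value
preds = sentiment_pipe(ds["text"], truncation=True, function_to_apply="none")
raw = np.array([p["score"] for p in preds])
labels = np.array(ds["label"])  # 0-4

print("MAE:     ", np.abs(raw - labels).mean())
print("Accuracy:", (np.clip(np.round(raw), 0, 4) == labels).mean())
```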
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research or app, please give it a shout-out! |
|
|
|
|
|
```bibtex |
|
|
@misc{adobati-2025-multilingual-restaurant, |
|
|
author = {Simone Adobati}, |
|
|
title = {A Multilingual 5-Class Restaurant Review Sentiment Model}, |
|
|
year = {2025}, |
|
|
publisher = {Hugging Face}, |
|
|
journal = {Hugging Face Model Hub}, |
|
|
  howpublished = {\url{https://huggingface.co/Festooned/Multilingual-Restaurant-Reviews-Sentiment}}
|
|
} |
|
|
``` |
|
|
|