---
language:
- en
- it
- es
- fr
- de
license: apache-2.0
library_name: transformers
tags:
- sentiment-analysis
- text-classification
- multilingual
- restaurants
- 5-star
base_model: jhu-clsp/mmBERT-base
pipeline_tag: text-classification
---
# 🍜 Multilingual Restaurant Review Sentiment Model 🌍
Hey there! This isn't just _another_ sentiment model. This is a fine-tuned powerhouse specifically designed to understand the nuance of 1-to-5 star restaurant reviews across **5 different languages**.
It was trained on a massive, perfectly balanced dataset of **400,000+ real, human-written reviews** and achieves state-of-the-art performance.
## ✨ Model Features
- **Multilingual:** Trained on **English**, **Italian**, **Spanish**, **French**, and **German**.
- **5-Star Specialist:** Predicts ratings on a 1-5 star scale.
- **SOTA Performance:** Achieves an incredibly low **MAE of ~0.29**. (More on that below!)
---
## 🎯 Just How Good Is It? (Performance)
Forget accuracy. For star ratings, **Mean Absolute Error (MAE)** is what matters. It measures how "off" the prediction is.
What does that mean? It means that, on average, the model's prediction is **only off by 0.29 stars**.
- It _knows_ a 5-star is close to a 4-star.
- It _knows_ a 1-star is NOT a 5-star.
- It **rarely** confuses a positive review for a negative one.
Here are the full results from the validation set (500k real-world reviews!):
| Metric | Score | Why it Matters |
| :----------- | :-------- | :----------------------------------------------------------- |
| **MAE**      | **0.293** | 🏆 **The model's main score.**                                 |
| **Accuracy** | 78.2%     | How often the model guesses the _exact_ star (after rounding). |
| **Macro F1** | 0.683 | Shows it's good at all classes, not just the majority class. |
| **MSE** | 0.182 | The loss the model was trained on (Mean Squared Error). |
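If the relationship between these metrics is unclear, here is a tiny illustrative sketch (made-up numbers, not the actual evaluation code) of how MAE, MSE, and rounded accuracy are computed from raw regression outputs on the 0-4 label scale:
```python
import numpy as np

# Illustrative only: how MAE, MSE, and rounded accuracy relate for a
# regression model that predicts on the 0-4 label scale (stars minus 1).
preds = np.array([3.8, 0.1, 2.6, 2.9])   # raw regression outputs
labels = np.array([5, 1, 3, 4]) - 1      # true stars, mapped to 0-4

mae = np.abs(preds - labels).mean()                      # average "off-by" in stars
mse = ((preds - labels) ** 2).mean()                     # the training loss
acc = (np.clip(np.round(preds), 0, 4) == labels).mean()  # exact-star hit rate
print(f"MAE={mae:.3f}  MSE={mse:.3f}  Accuracy={acc:.0%}")
```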
---
### Confusion Matrix
This shows where the model makes its errors. As you can see, almost all errors are "off-by-one" (like predicting a 4 for a 5-star), which is exactly what we want.
| | **Predicted 1** | **Predicted 2** | **Predicted 3** | **Predicted 4** | **Predicted 5** |
| :--------- | :-------------: | :-------------: | :-------------: | :-------------: | :-------------: |
| **True 1** | 14683 | 8391 | 568 | 44 | 34 |
| **True 2** | 2504 | 13699 | 4068 | 95 | 13 |
| **True 3** | 290 | 6271 | 23824 | 5700 | 229 |
| **True 4** | 18 | 267 | 6940 | 66361 | 25089 |
| **True 5** | 44 | 143 | 553 | 47873 | 272298 |
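As a quick sanity check, you can verify the "off-by-one" claim directly from the matrix (the numbers below are copied from the table; the script itself is just illustrative):
```python
import numpy as np

# Confusion matrix from the table above: rows = true stars 1-5, cols = predicted.
cm = np.array([
    [14683,  8391,   568,    44,     34],
    [ 2504, 13699,  4068,    95,     13],
    [  290,  6271, 23824,  5700,    229],
    [   18,   267,  6940, 66361,  25089],
    [   44,   143,   553, 47873, 272298],
])

total = cm.sum()
exact = np.trace(cm)  # diagonal = exact-star hits (~78.2%, matching the accuracy above)
off_by_one = sum(cm[i, j] for i in range(5) for j in range(5) if abs(i - j) == 1)
print(f"exact: {exact / total:.1%}, off-by-one: {off_by_one / total:.1%}")
# -> roughly 78.2% exact and 21.4% off-by-one; almost nothing lands further away.
```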
---
### Performance Per Language
The model performs strongly across all five languages. Here is the final accuracy for each language on the test set:
| Language  | Accuracy |
| :-------- | :------- |
| `English` | 0.827 |
| `Italian` | 0.778 |
| `Spanish` | 0.775 |
| `French` | 0.763 |
| `German` | 0.755 |
---
## 🧠 The "Regression Trick" (Why it's so good)
Most models do "classification" (is it A, B, or C?). This is a bad fit for star ratings, because a classifier penalizes predicting 4 stars for a true 5-star review exactly as harshly as predicting 1 star.
This model was trained as a **regression** task instead. It predicts a single number (like 4.7, 1.2, or 3.5) rather than just "5-star". This teaches the model that 4 stars are "closer" to 5 stars than 1 star is, which is how it gets such a low MAE.
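For readers who want to reproduce this setup, this is the standard transformers recipe for regression fine-tuning (a sketch of the usual configuration; the actual training script isn't published here):
```python
from transformers import AutoModelForSequenceClassification

# Standard transformers recipe for regression fine-tuning (a sketch, not the
# exact training code): num_labels=1 gives a single output neuron, and
# problem_type="regression" makes the model (and Trainer) use MSELoss
# instead of cross-entropy.
model = AutoModelForSequenceClassification.from_pretrained(
    "jhu-clsp/mmBERT-base",  # the base model listed in the card metadata
    num_labels=1,
    problem_type="regression",
)
```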
---
## 🚀 How to Use
Since this is a regression model, the output is a single float. You'll want to round it to get a final "star" rating.
### ⚠️ A Critical Note on Input Format
**This is very important for getting the best performance!**
This model was not just trained on review text; it was trained using a specific format that includes **both the review title and the review text**, separated by the `[SEP]` token.
The title often contains a powerful summary of the sentiment (e.g., "Best Pasta Ever!" or "Total Rip-off!"). Using this format ensures the model gets the same type of input it was trained on.
**Correct Format:**
`input_text = review_title + " [SEP] " + review_text`
If you only have the review text, the model will still work well, but performance will be slightly lower.
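In code, a small helper (hypothetical, just for convenience) keeps this format consistent:
```python
def format_review(review_text: str, review_title: str = "") -> str:
    """Hypothetical helper: build the model input in the training format."""
    if review_title:
        return f"{review_title} [SEP] {review_text}"
    return review_text  # text-only still works, just slightly less accurate

print(format_review("The pasta was outstanding.", "Best Pasta Ever!"))
# Best Pasta Ever! [SEP] The pasta was outstanding.
```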
### Pipeline Usage Example
Here is how you should format your inputs before passing them to the pipeline:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import numpy as np  # used below to round and clamp the regression output
model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# ---
# IMPORTANT: This model predicts a single number (regression).
# ---
# Let's create a pipeline
sentiment_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Example reviews using the recommended format
reviews = [
    "Absolutely incredible [SEP] This was the best pasta I've ever had in my life.",              # 5-star
    "Servicio terrible [SEP] El servicio fue terrible y la comida tardó una hora en llegar.",     # 1-star (Spanish)
    "It was fine [SEP] It was... fine. Nothing special, but not bad either.",                     # 3-star
]
# Get the raw predictions.
# function_to_apply="none" matters here: with a single-logit regression head,
# the pipeline would otherwise squash the score through a sigmoid.
raw_preds = sentiment_pipe(reviews, function_to_apply="none")
print(raw_preds)
# Example output (scores are illustrative):
# [{'label': 'LABEL_0', 'score': 3.81},
#  {'label': 'LABEL_0', 'score': 0.12},
#  {'label': 'LABEL_0', 'score': 1.95}]
# ---
# How to get the actual "star rating"
# (Remember our labels are 0-4, so we add 1)
# ---
for text, pred in zip(reviews, raw_preds):
    # 'score' is the raw regression value (the model predicts on the 0-4 scale)
    raw_score = pred['score']
    # Round and clamp to the valid label range (0-4) to be safe
    star_label_rounded = np.clip(round(raw_score), 0, 4)
    # Add 1 to map the 0-4 label back to a 1-5 star rating
    final_star_rating = int(star_label_rounded + 1)
    print(f"Review: {text[:40]}...")
    print(f"  Final Rating: {final_star_rating} stars\n")
```
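If you'd rather skip the pipeline abstraction, the equivalent raw forward pass looks roughly like this (a sketch continuing from the variables above; the model's single logit is the regression output):
```python
import torch

# Same computation without the pipeline: one raw 0-4 regression score per review.
inputs = tokenizer(reviews, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, 1)
raw_scores = logits.squeeze(-1)      # one regression value per review
stars = (raw_scores.round().clamp(0, 4) + 1).int().tolist()
print(stars)  # e.g. [5, 1, 3]
```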
---
## 💡 Bonus: Convert to 3 Classes (Bad/Neutral/Good)
This 5-star model is flexible! If you don't need 5 classes, you can easily group the results.
Here's a simple helper function to convert the 1-5 star rating into **Bad**, **Neutral**, or **Good**.
```python
def to_3_class(rating):
    """Converts a 1-5 star rating into a 3-class sentiment."""
    # The 'rating' is the rounded 1-5 star value
    if rating <= 2:
        return "😞 Bad"
    elif rating == 3:
        return "😐 Neutral"
    else:  # 4 or 5 stars
        return "😄 Good"

# Example using the rounded ratings from the code above:
rating_1 = 1
print(f"Rating {rating_1} is: {to_3_class(rating_1)}")

rating_3 = 3
print(f"Rating {rating_3} is: {to_3_class(rating_3)}")

rating_5 = 5
print(f"Rating {rating_5} is: {to_3_class(rating_5)}")

# Output:
# Rating 1 is: 😞 Bad
# Rating 3 is: 😐 Neutral
# Rating 5 is: 😄 Good
```
---
## 🧪 Bonus: A Test of Specialization (Domain Shift)
This model is a SOTA-level _restaurant_ critic. But what happens if it's asked to review a car mechanic or a hair salon?
To find out, the model was tested on the **`yelp_review_full`** dataset. This dataset is **not** just restaurants; it includes reviews for auto shops, plumbers, gyms, salons, and all other business types.
The results are exactly what would be expected from a highly trained specialist:
| Metric | Score on Restaurant-Only Data | Score on `yelp_review_full` (All businesses) |
| :----------- | :---------------------------: | :------------------------------------------: |
| **MAE** | **0.2928** | 0.4648 |
| **Accuracy** | **78.2%** | 62.7% |
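You can run a rough version of this check yourself with the `datasets` library (a sketch, not the exact evaluation script; it assumes the `sentiment_pipe` from the usage section above):
```python
from datasets import load_dataset
import numpy as np

# Rough re-run of the domain-shift check on a small sample.
# yelp_review_full labels are already on the 0-4 scale.
ds = load_dataset("yelp_review_full", split="test[:1000]")
preds = sentiment_pipe(ds["text"], function_to_apply="none", truncation=True)

raw = np.array([p["score"] for p in preds])
labels = np.array(ds["label"])
print(f"MAE on mixed-business Yelp reviews: {np.abs(raw - labels).mean():.3f}")
```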
---
## Citation
If you use this model in your research or app, please give it a shout-out!
```bibtex
@misc{adobati-2025-multilingual-restaurant,
author = {Simone Adobati},
title = {A Multilingual 5-Class Restaurant Review Sentiment Model},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Model Hub},
howpublished = {\url{https://huggingface.co/Festooned/Multilingual-Restaurant-Reviews-Sentiment}}
}
```