---
language:
- en
- it
- es
- fr
- de
license: apache-2.0
library_name: transformers
tags:
- sentiment-analysis
- text-classification
- multilingual
- restaurants
- 5-star
base_model: jhu-clsp/mmBERT-base
pipeline_tag: text-classification
---

# 🍜 Multilingual Restaurant Review Sentiment Model 🌍

Hey there! This isn't just _another_ sentiment model. This is a fine-tuned powerhouse specifically designed to understand the nuance of 1-to-5 star restaurant reviews across **5 different languages**.

It was trained on a massive, perfectly balanced dataset of **400,000+ real, human-written reviews** and achieves state-of-the-art performance.

## ✨ Model Features

- **Multilingual:** Trained on **English**, **Italian**, **Spanish**, **French**, and **German**.
- **5-Star Specialist:** Predicts ratings on a 1-5 star scale.
- **SOTA Performance:** Achieves an incredibly low **MAE of ~0.29**. (More on that below!)

---

## 🎯 Just How Good Is It? (Performance)

Forget accuracy. For star ratings, **Mean Absolute Error (MAE)** is what matters. It measures how "off" the prediction is.

What does that mean? It means that, on average, the model's prediction is **only off by 0.29 stars**.

- It _knows_ a 5-star is close to a 4-star.
- It _knows_ a 1-star is NOT a 5-star.
- It **rarely** confuses a positive review for a negative one.

Here are the full results from the validation set (500k real-world reviews!):

| Metric       | Score     | Why it Matters                                                 |
| :----------- | :-------- | :------------------------------------------------------------- |
| **MAE**      | **0.293** | 🏆 **The model's main score.**                                 |
| **Accuracy** | 78.2%     | How often the model guesses the _exact_ star (after rounding). |
| **Macro F1** | 0.683     | Shows it's good at all classes, not just the majority class.   |
| **MSE**      | 0.182     | The loss the model was trained on (Mean Squared Error).        |

---

### Confusion Matrix

This shows where the model makes its errors.
As you can see, almost all errors are "off-by-one" (like predicting a 4 for a 5-star), which is exactly what we want.

|            | **Predicted 1** | **Predicted 2** | **Predicted 3** | **Predicted 4** | **Predicted 5** |
| :--------- | :-------------: | :-------------: | :-------------: | :-------------: | :-------------: |
| **True 1** |      14683      |      8391       |       568       |       44        |       34        |
| **True 2** |      2504       |      13699      |      4068       |       95        |       13        |
| **True 3** |       290       |      6271       |      23824      |      5700       |       229       |
| **True 4** |       18        |       267       |      6940       |      66361      |      25089      |
| **True 5** |       44        |       143       |       553       |      47873      |     272298      |

---

### Performance Per Language

The model performs strongly across all five languages. Here is the final accuracy for each language on the test set:

| Language  | Accuracy |
| :-------- | :------- |
| `English` | 0.827    |
| `Italian` | 0.778    |
| `Spanish` | 0.775    |
| `French`  | 0.763    |
| `German`  | 0.755    |

---

## 🧠 The "Regression Trick" (Why it's so good)

Most models do "classification" (is it A, B, or C?). This is a bad fit for star ratings.

This model was trained as a **regression** task instead. It predicts a single number (like 4.7, 1.2, or 3.5) rather than just "5-star". This teaches the model that 4 stars are "closer" to 5 stars than 1 star is, which is how it achieves such a low MAE.

---

## 🚀 How to Use

Since this is a regression model, the output is a single float number. You'll want to round it to get a final "star" rating.

### ⚠️ A Critical Note on Input Format

**This is very important for getting the best performance!**

This model was not just trained on review text; it was trained on a specific format that includes **both the review title and the review text**, separated by the `[SEP]` token.

The title often contains a powerful summary of the sentiment (e.g., "Best Pasta Ever!" or "Total Rip-off!"). Using this format ensures the model gets the same type of input it was trained on.
**Correct Format:**
`input_text = review_title + " [SEP] " + review_text`

If you only have the review text, the model will still work well, but performance will be slightly lower.

### Pipeline Usage Example

Here is how to format your inputs before passing them to the pipeline:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import numpy as np

model_name = "Festooned/Multilingual-Restaurant-Reviews-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# IMPORTANT: This model predicts a single number (regression), so we
# disable the pipeline's output activation to get the raw value back.
sentiment_pipe = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    function_to_apply="none",
)

# Example reviews using the recommended "title [SEP] text" format
reviews = [
    "Absolutely incredible [SEP] This was the best pasta I've ever had in my life.",  # 5-star
    "Servicio terrible [SEP] El servicio fue terrible y la comida tardó una hora en llegar.",  # 1-star (Spanish)
    "It was fine [SEP] It was... fine. Nothing special, but not bad either.",  # 3-star
]

# Get the raw predictions
raw_preds = sentiment_pipe(reviews)
print(raw_preds)
# [{'label': 'LABEL_0', 'score': 4.81},
#  {'label': 'LABEL_0', 'score': 1.12},
#  {'label': 'LABEL_0', 'score': 2.95}]

# How to get the actual "star rating"
# (the model's internal scale is 0-4, so we add 1)
for text, pred in zip(reviews, raw_preds):
    # 'score' is the raw regression value (the model predicts on a 0-4 scale)
    raw_score = pred['score']

    # Round and clamp to the valid 0-4 range, just to be safe
    star_label_rounded = np.clip(round(raw_score), 0, 4)

    # Add 1 to get the 1-5 star rating
    final_star_rating = int(star_label_rounded + 1)

    print(f"Review: {text[:40]}...")
    print(f"  Final Rating: {final_star_rating} stars\n")
```

---

## 💡 Bonus: Convert to 3 Classes (Bad/Neutral/Good)

This 5-star model is flexible! If you don't need 5 classes, you can easily group the results.
Here's a simple helper function to convert the 1-5 star rating into **Bad**, **Neutral**, or **Good**.

```python
def to_3_class(rating):
    """Converts a 1-5 star rating into a 3-class sentiment."""
    # The 'rating' is the rounded 1-5 star value
    if rating <= 2:
        return "😞 Bad"
    elif rating == 3:
        return "😐 Neutral"
    else:  # 4 or 5 stars
        return "😄 Good"

# Example using rounded ratings from the code above:
print(f"Rating 1 is: {to_3_class(1)}")  # Rating 1 is: 😞 Bad
print(f"Rating 3 is: {to_3_class(3)}")  # Rating 3 is: 😐 Neutral
print(f"Rating 5 is: {to_3_class(5)}")  # Rating 5 is: 😄 Good
```

---

## 🧪 Bonus: A Test of Specialization (Domain Shift)

This model is a SOTA-level _restaurant_ critic. But what happens if it's asked to review a car mechanic or a hair salon?

To find out, the model was tested on the **`yelp_review_full`** dataset. This dataset is **not** just restaurants; it includes reviews for auto shops, plumbers, gyms, salons, and all other business types.

The results are exactly what would be expected from a highly trained specialist:

| Metric       | Score on Restaurant-Only Data | Score on `yelp_review_full` (All businesses) |
| :----------- | :---------------------------: | :------------------------------------------: |
| **MAE**      |          **0.2928**           |                    0.4648                    |
| **Accuracy** |           **78.2%**           |                    62.7%                     |

---

## Citation

If you use this model in your research or app, please give it a shout-out!
```bibtex
@misc{adobati-2025-multilingual-restaurant,
  author       = {Simone Adobati},
  title        = {A Multilingual 5-Class Restaurant Review Sentiment Model},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Festooned/Multilingual-Restaurant-Reviews-Sentiment}}
}
```
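
---

## Appendix: Checking the Numbers

As a quick sanity check, the headline accuracy and the "almost all errors are off-by-one" claim can be recovered directly from the confusion matrix in this card. This is a small NumPy sketch using only the published validation numbers (no model download needed):

```python
import numpy as np

# Confusion matrix from the validation set above
# (rows = true stars 1-5, columns = predicted stars 1-5)
cm = np.array([
    [14683,  8391,   568,    44,     34],
    [ 2504, 13699,  4068,    95,     13],
    [  290,  6271, 23824,  5700,    229],
    [   18,   267,  6940, 66361,  25089],
    [   44,   143,   553, 47873, 272298],
])

total = cm.sum()

# Exact-match accuracy: diagonal / total
accuracy = np.trace(cm) / total

# Fraction of predictions within one star of the truth
dist = np.abs(np.subtract.outer(np.arange(5), np.arange(5)))
within_one = cm[dist <= 1].sum() / total

print(f"Accuracy:  {accuracy:.3f}")    # ~0.782, matching the metrics table
print(f"Within ±1: {within_one:.3f}")  # ~0.995: almost all errors are off-by-one
```

Note that the reported MAE of 0.293 is computed on the raw (unrounded) regression outputs, so it cannot be reproduced from this rounded matrix alone.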