# XLM-RoBERTa-Base Multilingual Model for Sentiment Analysis on Amazon Reviews

This repository contains a multilingual sentiment analysis model fine-tuned on the [Amazon Reviews Multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset, starting from the `xlm-roberta-base` checkpoint in Hugging Face Transformers. XLM-RoBERTa was pretrained on text in 100 languages, so the model can be applied to product reviews in languages other than English. Fine-tuning, however, used only the English subset (see Limitations).

---
## Model Details

- **Model Architecture:** XLM-RoBERTa Base
- **Task:** Sentiment Classification (Binary: Positive / Negative)
- **Dataset:** Amazon Reviews Multi (`en` subset used for fine-tuning)
- **Languages Supported:** Fine-tuned on English only. Other languages rely on zero-shot cross-lingual transfer from XLM-R's multilingual pretraining, and performance on them is not reported here.
- **Fine-tuning Framework:** Hugging Face Transformers

---
## Usage

### Installation

```bash
pip install transformers torch
```

### Loading and Testing the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer (replace with the actual Hub repo ID or a local path)
model_path = "your-username/xlm-roberta-sentiment-amazon-reviews"
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()  # disable dropout for inference

# Label order used during fine-tuning: 0 = Negative, 1 = Positive
label_map = {0: "Negative", 1: "Positive"}

def predict_sentiment(texts):
    """Classify one string or a list of strings; return label and confidence per text."""
    if isinstance(texts, str):
        texts = [texts]  # avoid iterating over the characters of a single string
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)  # class probabilities per text
    preds = torch.argmax(probs, dim=-1)
    results = []
    for text, pred, prob in zip(texts, preds, probs):
        results.append({
            "text": text,
            "prediction": label_map[pred.item()],
            "confidence": round(prob[pred].item(), 4),
        })
    return results

# Example
examples = ["This product is amazing!", "Worst purchase ever."]
print(predict_sentiment(examples))
```
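
The `confidence` value returned by `predict_sentiment` is the softmax probability of the predicted class. That post-processing step can be reproduced without loading the model, for example to unit-test downstream code. This is a minimal plain-Python sketch: the `postprocess` helper and the logits are made up for illustration, and the label order assumes index 0 = Negative and 1 = Positive, as in the usage example.

```python
import math

# Assumed label order, matching label_map in the usage example
LABEL_MAP = {0: "Negative", 1: "Positive"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits_batch):
    """Hypothetical helper: turn raw per-example logits into a label and a confidence."""
    results = []
    for logits in logits_batch:
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)  # argmax (first max wins ties)
        results.append({
            "prediction": LABEL_MAP[pred],
            "confidence": round(probs[pred], 4),
        })
    return results

# Made-up logits for two reviews (not real model output)
print(postprocess([[-2.1, 2.3], [1.8, -1.5]]))
```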
---

## Performance Metrics

Metrics were recorded at the end of each training epoch. Accuracy and F1 (Macro) are computed on the validation split.

| Epoch | Training Loss | Validation Loss | Accuracy | F1 (Macro) |
|-------|---------------|-----------------|----------|------------|
| 1     | 0.1987        | 0.1842          | 93.22%   | 0.9321     |
| 2     | 0.1472        | 0.1987          | 93.46%   | 0.9346     |
| 3     | 0.0960        | 0.2491          | 93.42%   | 0.9341     |
---

## Fine-Tuning Details

### Dataset

- Source: [Amazon Reviews Multi](https://huggingface.co/datasets/amazon_reviews_multi)
- Labels: originally 5 classes (1–5 star ratings), remapped to binary sentiment (0 = Negative [1–2 stars], 1 = Positive [4–5 stars])
- Neutral reviews (3 stars) were excluded from training
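
The star-to-label remapping can be sketched as a small filter-and-map step. This assumes each record exposes `stars` and `review_body` fields, as in the Hugging Face version of the dataset; the helper names are hypothetical, and whether the original run also used `review_title` is not stated. With the `datasets` library the same logic would typically go through `Dataset.filter` and `Dataset.map`.

```python
from typing import Optional

def stars_to_label(stars: int) -> Optional[int]:
    """Map a 1-5 star rating to a binary label; None means the review is dropped."""
    if stars in (1, 2):
        return 0  # Negative
    if stars in (4, 5):
        return 1  # Positive
    if stars == 3:
        return None  # Neutral: excluded from training
    raise ValueError(f"unexpected star rating: {stars}")

def to_binary(examples):
    """Hypothetical helper: keep non-neutral reviews and attach a binary 'label'."""
    out = []
    for ex in examples:
        label = stars_to_label(ex["stars"])
        if label is not None:
            out.append({"text": ex["review_body"], "label": label})
    return out

# Toy rows shaped like the dataset's records (field names assumed)
rows = [
    {"stars": 1, "review_body": "Broke after a day."},
    {"stars": 3, "review_body": "It's okay."},
    {"stars": 5, "review_body": "Love it!"},
]
print(to_binary(rows))
```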
### Training Configuration

- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Optimizer: AdamW
- Evaluation strategy: Epoch-based
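
Epoch-based evaluation means metrics are computed once at the end of each epoch, which is why the Performance Metrics table has exactly three rows. The sketch below shows how the epoch count and batch size determine the number of optimizer steps and where the evaluations fall. It assumes one optimizer step per batch (no gradient accumulation, single device) and that the final partial batch is kept; the training-set size is a placeholder, not the actual count used.

```python
import math

# Values from the Training Configuration list
EPOCHS = 3
BATCH_SIZE = 16

def training_schedule(num_train_examples, epochs=EPOCHS, batch_size=BATCH_SIZE):
    """Return (steps_per_epoch, total_steps, eval_steps) for epoch-based evaluation."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)  # last partial batch counts
    total_steps = steps_per_epoch * epochs
    # Epoch-based evaluation runs once at the end of every epoch
    eval_steps = [steps_per_epoch * (e + 1) for e in range(epochs)]
    return steps_per_epoch, total_steps, eval_steps

# Placeholder training-set size, for illustration only
print(training_schedule(160_000))  # -> (10000, 30000, [10000, 20000, 30000])
```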
---

## Repository Structure

```
.
├── model/          # Fine-tuned model and config files
├── tokenizer/      # Tokenizer files
├── inference.py    # Inference and testing script
└── README.md       # Model documentation
```
---

## Limitations

- Trained only on the English subset of Amazon Reviews Multi; performance on other languages may vary.
- Neutral (3-star) reviews were excluded, so the model has no neutral class: mixed or lukewarm reviews are always forced into Positive or Negative.
- Fine-tuning used general product reviews rather than domain-specific data, so performance may degrade on highly specialized review categories.
---

## Contributing

Contributions are welcome! Feel free to open an issue or pull request for improvements or bug fixes.