	# XLM-RoBERTa-Base Multilingual Model for Sentiment Analysis on Amazon Reviews

	This repository contains a sentiment analysis model built by fine-tuning `xlm-roberta-base` (Hugging Face Transformers) on the [Amazon Reviews Multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset. It classifies product reviews as positive or negative. Fine-tuning used only the English subset, so predictions in other languages rely on cross-lingual transfer from XLM-R's multilingual pretraining.

	---

	## Model Details

	- Model Architecture: XLM-RoBERTa Base
	- Task: Sentiment Classification (Binary: Positive / Negative)
	- Dataset: Amazon Reviews Multi (`en` subset used for fine-tuning)
	- Languages Supported: Fine-tuned on English only; other languages rely on zero-shot cross-lingual transfer from XLM-R's multilingual pretraining
	- Fine-tuning Framework: Hugging Face Transformers

	---

	## Usage

	### Installation

	```bash
	pip install transformers torch
	```

	### Loading and Testing the Model

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer
	model_path = "your-username/xlm-roberta-sentiment-amazon-reviews"
	model = AutoModelForSequenceClassification.from_pretrained(model_path)
	tokenizer = AutoTokenizer.from_pretrained(model_path)
	model.eval()

	# Prediction function
	def predict_sentiment(texts):
	    # Tokenize the list of texts and move the tensors to the model's device
	    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)
	    with torch.no_grad():
	        outputs = model(**inputs)
	    probs = torch.softmax(outputs.logits, dim=1)
	    preds = torch.argmax(probs, dim=1)

	    # Label ids follow the training remap: 0 = Negative, 1 = Positive
	    label_map = {0: "Negative", 1: "Positive"}
	    results = []
	    for text, pred, prob in zip(texts, preds, probs):
	        results.append({
	            "text": text,
	            "prediction": label_map[pred.item()],
	            "confidence": round(prob[pred].item(), 4)
	        })
	    return results

	# Example
	examples = ["This product is amazing!", "Worst purchase ever."]
	print(predict_sentiment(examples))
	```
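
For larger inputs, passing every review to the tokenizer in one call pads the whole set to the longest text and can exhaust memory. The following is a minimal sketch, not part of this repository: the helper names `chunked` and `predict_in_batches` and the default batch size of 32 are assumptions. It feeds fixed-size chunks to any function with the same interface as `predict_sentiment`.

```python
from typing import Callable, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_batches(
    texts: List[str],
    predict_fn: Callable[[List[str]], List[Dict]],
    batch_size: int = 32,  # assumption: tune to the available memory
) -> List[Dict]:
    """Run `predict_fn` on fixed-size chunks and concatenate results in input order."""
    results: List[Dict] = []
    for batch in chunked(texts, batch_size):
        results.extend(predict_fn(batch))
    return results
```

With the function defined earlier, `predict_in_batches(reviews, predict_sentiment)` would return one result per review, in the same order as the input list.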

	---

	## Performance Metrics

	| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro |
	|-------|---------------|-----------------|----------|----------|
	| 1     | 0.1987        | 0.1842          | 93.22%   | 0.9321   |
	| 2     | 0.1472        | 0.1987          | 93.46%   | 0.9346   |
	| 3     | 0.0960        | 0.2491          | 93.42%   | 0.9341   |
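
As a reminder of what the Accuracy and F1 Macro columns measure, here is a minimal pure-Python sketch. It is not the evaluation code used for this model, which the card does not include; the function names `accuracy` and `macro_f1` are illustrative.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights the Negative and Positive classes equally, so it stays close to accuracy only when both classes are predicted about equally well.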

	---

	## Fine-Tuning Details

	### Dataset

	- Source: [Amazon Reviews Multi](https://huggingface.co/datasets/amazon_reviews_multi)
	- Labels: Originally 5 classes (star ratings); remapped to binary sentiment (0 = Negative [1–2 stars], 1 = Positive [4–5 stars])
	- Neutral reviews (3 stars) were excluded from training
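
The remapping described above can be sketched as follows. This is an illustration, not the author's preprocessing script: the function names `stars_to_label` and `to_binary`, and the `stars` field name, are assumptions about how each review record is stored.

```python
from typing import Dict, Iterable, List, Optional


def stars_to_label(stars: int) -> Optional[int]:
    """Map a 1-5 star rating to 0 (Negative), 1 (Positive), or None (neutral)."""
    if stars in (1, 2):
        return 0
    if stars in (4, 5):
        return 1
    if stars == 3:
        return None  # neutral reviews are dropped, not assigned a class
    raise ValueError(f"unexpected star rating: {stars!r}")


def to_binary(reviews: Iterable[Dict], stars_key: str = "stars") -> List[Dict]:
    """Return copies of the reviews with a binary `label`, skipping 3-star rows."""
    labeled = []
    for review in reviews:
        label = stars_to_label(review[stars_key])
        if label is not None:
            labeled.append({**review, "label": label})
    return labeled
```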

	### Training Configuration

	- Epochs: 3
	- Batch size: 16
	- Learning rate: 2e-5
	- Optimizer: AdamW
	- Evaluation strategy: Epoch-based

	---

	## Repository Structure

	```
	.
	├── model/         # Fine-tuned model and config files
	├── tokenizer/     # Tokenizer files
	├── inference.py   # Inference and testing script
	└── README.md      # Model documentation
	```

	---

	## Limitations

	- Trained only on the English subset of Amazon Reviews Multi; multilingual performance may vary.
	- Neutral (3-star) reviews were excluded, so the model has no neutral class and will assign mixed or neutral text to either Positive or Negative.
	- Fine-tuning was not domain-specific, so performance may degrade in highly specialized review categories.
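
Because the classifier only outputs Positive or Negative, one possible mitigation is to treat low-confidence predictions as undecided instead of forcing a label. The sketch below post-processes dictionaries shaped like those returned by `predict_sentiment`. The `flag_uncertain` name and the 0.75 threshold are hypothetical, and softmax scores are not calibrated probabilities, so any threshold would need tuning on held-out data.

```python
from typing import Dict, List


def flag_uncertain(results: List[Dict], threshold: float = 0.75) -> List[Dict]:
    """Relabel predictions whose confidence is below `threshold` as "Uncertain"."""
    flagged = []
    for item in results:
        updated = dict(item)  # copy so the input list is left unchanged
        if updated["confidence"] < threshold:
            updated["prediction"] = "Uncertain"
        flagged.append(updated)
    return flagged
```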

	---

	## Contributing

	Contributions are welcome! Feel free to open an issue or pull request for improvements or bug fixes.