---
language: en
license: apache-2.0
library_name: transformers
tags:
- sentiment-analysis
- roberta
- amazon-reviews
- e-commerce
datasets:
- amazon_fine_food_reviews
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# Model Card: Amazon Sentiment RoBERTa Base

## Model Description
This model is a fine-tuned version of RoBERTa-base for sentiment analysis of customer reviews. It was trained on a class-balanced subset of the Amazon Fine Food Reviews dataset to classify text into three categories: Negative, Neutral, and Positive.

- Model Type: Transformer-based text classification
- Language: English
- Base Model: `roberta-base`

## Intended Use
- Primary Use Case: Real-time sentiment tracking for e-commerce platforms.
- Scope: Analyzing short to medium-length customer feedback and product reviews.
- Out-of-Scope: Not recommended for legal documents, medical advice, or languages other than English.

## Training Data & Methodology
### Dataset
- Source: Amazon Fine Food Reviews (Kaggle).
- Preprocessing:
  - Removal of duplicates and HTML tags.
  - POS-tag-based lemmatization for linguistic normalization.
  - Undersampling to 15,000 samples (5,000 per class) to handle class imbalance.
- Labels:
  - `0`: Negative (1-2 stars)
  - `1`: Neutral (3 stars)
  - `2`: Positive (4-5 stars)
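
The preprocessing and labeling steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's actual pipeline: reviews are assumed to arrive as hypothetical `(text, stars)` tuples, duplicates are detected as exact matches after cleaning, and the helper names (`clean_text`, `stars_to_label`, `build_balanced_dataset`) are made up for this example. The POS-tag-based lemmatization step is omitted because it needs an external NLP library (for example NLTK's WordNet lemmatizer).

```python
import html
import random
import re
from collections import defaultdict

# Matches anything that looks like an HTML tag, e.g. "<br />"
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text: str) -> str:
    """Strip HTML tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


def stars_to_label(stars: int) -> int:
    """Map a 1-5 star rating to 0 = Negative, 1 = Neutral, 2 = Positive."""
    if stars <= 2:
        return 0
    if stars == 3:
        return 1
    return 2


def build_balanced_dataset(records, per_class=5000, seed=42):
    """Clean, deduplicate, label, then undersample each class to `per_class`.

    `records` is an iterable of hypothetical (text, stars) tuples.
    Classes with fewer than `per_class` unique reviews are kept whole.
    """
    seen = set()
    by_label = defaultdict(list)
    for text, stars in records:
        cleaned = clean_text(text)
        if cleaned in seen:
            continue  # drop exact duplicates (compared after cleaning)
        seen.add(cleaned)
        by_label[stars_to_label(stars)].append(cleaned)

    rng = random.Random(seed)
    balanced = []
    for label, texts in sorted(by_label.items()):
        k = min(per_class, len(texts))
        balanced.extend((t, label) for t in rng.sample(texts, k))
    rng.shuffle(balanced)
    return balanced


# Toy input, not real dataset rows
reviews = [
    ("Great snack!<br />Will buy again.", 5),
    ("Great snack!<br />Will buy again.", 5),  # duplicate, dropped
    ("It was fine, nothing special.", 3),
    ("Stale and overpriced.", 1),
]
data = build_balanced_dataset(reviews)
print(len(data))  # 3
```

With the real dataset, the 5,000-per-class target is reached only if every class has at least 5,000 unique reviews after deduplication.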

### Hyperparameters
- Learning Rate: 2e-5
- Batch Size: 16
- Epochs: 2
- Weight Decay: 0.01
- Max Sequence Length: 128 tokens
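
As a quick sanity check on training length, the sketch below computes how many optimizer steps these hyperparameters imply. It assumes, since the card does not say, that all non-test data (80% of the 15,000 samples) was used for training on a single device with no gradient accumulation; `FineTuneConfig` and `optimizer_steps` are hypothetical names. If the model was fine-tuned with the Hugging Face `Trainer` (also not stated), these values would correspond to `TrainingArguments(learning_rate=2e-5, per_device_train_batch_size=16, num_train_epochs=2, weight_decay=0.01)`, with the 128-token limit applied at tokenization time via `truncation=True, max_length=128`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in this model card."""
    learning_rate: float = 2e-5
    batch_size: int = 16
    epochs: int = 2
    weight_decay: float = 0.01
    max_seq_length: int = 128


def optimizer_steps(num_train_examples: int, cfg: FineTuneConfig) -> int:
    """Total optimizer steps, assuming no gradient accumulation and that a
    final partial batch still counts as one step."""
    steps_per_epoch = math.ceil(num_train_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = FineTuneConfig()
# Assumption: the 80% of the 15,000 balanced samples not held out for testing
train_size = 15_000 * 80 // 100
print(train_size, optimizer_steps(train_size, cfg))  # 12000 1500
```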

## Performance Metrics
The model was evaluated on a held-out test set comprising 20% of the balanced data (3,000 of the 15,000 reviews):

| Metric | Value |
| :--- | :--- |
| Accuracy | 78.0% |
| Weighted F1-Score | 0.78 |
| Precision (Positive) | 0.83 |
| Recall (Positive) | 0.89 |
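
The weighted F1-score averages the per-class F1 scores, weighting each class by its number of true examples (its support); on a perfectly balanced test set it therefore equals the unweighted macro average. The stdlib-only sketch below computes accuracy, per-class precision and recall, and weighted F1 on a small made-up set of predictions. It illustrates the definitions only and does not reproduce the reported numbers; it should agree with scikit-learn's `f1_score(y_true, y_pred, average="weighted")`.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Average of per-class F1, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_prf(y_true, y_pred, label)[2] * count / total
        for label, count in support.items()
    )


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (0 = Negative, 1 = Neutral, 2 = Positive), not the real test set
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 2, 2, 2]
print(f"accuracy={accuracy(y_true, y_pred):.3f}")  # accuracy=0.667
print(f"weighted F1={weighted_f1(y_true, y_pred):.3f}")  # weighted F1=0.656
p, r, _ = per_class_prf(y_true, y_pred, label=2)
print(f"positive precision={p:.3f} recall={r:.3f}")  # positive precision=0.667 recall=1.000
```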

### Key Strengths
- Contextual Understanding: Captures context-dependent constructions such as negation and sarcasm (e.g., "Don't listen to the haters, this is great!").
- Robustness: Outperforms the traditional TF-IDF and DistilBERT baselines it was compared against, particularly in identifying ambiguous "Neutral" reviews.

## Limitations & Bias
- Neutral Class: Remains the most frequent source of misclassification, owing to the inherent subjectivity of 3-star ratings.
- Domain Specificity: Performance may vary in domains outside food and beverages (e.g., electronics or fashion).
- Sarcasm: Although sarcasm handling has improved, very subtle sarcasm may still cause errors.

## How to Use
```python
from transformers import pipeline

# Load the fine-tuned model directly from the Hugging Face Hub
model_path = "mlklt3/amazon-sentiment-roberta-base"
sentiment_pipeline = pipeline("sentiment-analysis", model=model_path)

# Classify a single review (the pipeline also accepts a list of strings)
text = "The product was okay, but I expected much better flavor for this price."
result = sentiment_pipeline(text)

# result is a list with one dict per input: [{'label': ..., 'score': ...}]
print(result)
```
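
The label strings the pipeline returns depend on the `id2label` mapping in the model's `config.json`. If that mapping was not customized during fine-tuning, Transformers falls back to generic names such as `LABEL_0`, `LABEL_1`, and `LABEL_2`. The hypothetical helper below converts generic labels to sentiment names, assuming the ids follow the scheme listed under Training Data & Methodology (`0` Negative, `1` Neutral, `2` Positive); labels that are already readable pass through unchanged. The sample output is made up for illustration and is not a real prediction from this model.

```python
# Assumed id-to-name mapping, taken from the label scheme in this card
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}


def to_readable(predictions):
    """Convert pipeline output such as [{'label': 'LABEL_2', 'score': 0.93}]
    into (sentiment_name, score) tuples."""
    readable = []
    for pred in predictions:
        label = pred["label"]
        if label.startswith("LABEL_"):
            # "LABEL_1" -> 1 -> "Neutral"
            label = LABEL_NAMES[int(label.split("_", 1)[1])]
        readable.append((label, pred["score"]))
    return readable


# Hypothetical pipeline output
sample_output = [{"label": "LABEL_1", "score": 0.71}, {"label": "Positive", "score": 0.95}]
print(to_readable(sample_output))  # [('Neutral', 0.71), ('Positive', 0.95)]
```
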

## Citation
If you use this model in your research or project, please credit the Amazon Fine Food Reviews dataset and the Hugging Face Transformers library.