---
language: en
license: apache-2.0
library_name: transformers
tags:
- sentiment-analysis
- roberta
- amazon-reviews
- e-commerce
datasets:
- amazon_fine_food_reviews
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---

# Model Card: Amazon Sentiment RoBERTa Base

## Model Description
This model is **RoBERTa-base** fine-tuned for sentiment analysis of customer reviews. It was trained on a class-balanced subset of the Amazon Fine Food Reviews dataset to classify text into three categories: **Negative**, **Neutral**, and **Positive**.

- **Model Type:** Transformer-based Text Classification
- **Language:** English
- **Base Model:** `roberta-base`

## Intended Use
- **Primary Use Case:** Real-time sentiment tracking for e-commerce platforms.
- **Scope:** Analyzing short to medium-length customer feedback and product reviews.
- **Out-of-Scope:** Not recommended for legal documents, medical advice, or languages other than English.

## Training Data & Methodology
### Dataset
- **Source:** Amazon Fine Food Reviews (Kaggle).
- **Preprocessing:**
  - Removal of duplicates and HTML tags.
  - POS-tag-based lemmatization for linguistic normalization.
  - Undersampling to 15,000 samples (5,000 per class) to handle class imbalance.
- **Labels:**
  - `0`: Negative (1-2 stars)
  - `1`: Neutral (3 stars)
  - `2`: Positive (4-5 stars)
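
The star-to-label mapping and the per-class undersampling described above can be sketched in plain Python. This is a minimal illustration, not the original preprocessing code: the function names `star_to_label` and `undersample`, the fixed seed, and the toy reviews are all assumptions made for the example.

```python
import random
from collections import defaultdict


def star_to_label(stars: int) -> int:
    """Map a 1-5 star rating to the card's label ids: 0=Negative, 1=Neutral, 2=Positive."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return 0
    if stars == 3:
        return 1
    return 2


def undersample(rows, per_class, seed=0):
    """Keep at most `per_class` randomly chosen (text, label) rows for each label."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed (an assumption) so the subset is reproducible
    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        balanced.extend(rng.sample(group, min(per_class, len(group))))
    rng.shuffle(balanced)  # avoid leaving the rows grouped by class
    return balanced


# Toy, made-up reviews as (text, stars); the card's setup corresponds to per_class=5000.
toy_reviews = [("stale", 1), ("bland", 2), ("fine", 3), ("ok", 3),
               ("tasty", 4), ("great", 5), ("love it", 5), ("yum", 4)]
labeled = [(text, star_to_label(stars)) for text, stars in toy_reviews]
balanced = undersample(labeled, per_class=2)
print(balanced)
```

Undersampling discards majority-class reviews rather than reweighting them, which keeps training simple at the cost of using only part of the available data.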

### Hyperparameters
- **Learning Rate:** 2e-5
- **Batch Size:** 16
- **Epochs:** 2
- **Weight Decay:** 0.01
- **Max Sequence Length:** 128 tokens
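
As a rough sanity check, these settings imply a fairly short training run. The arithmetic below is a sketch under stated assumptions: every sample outside the 20% held-out test set is used for training (no separate validation split), training runs on a single device, and there is no gradient accumulation. The card does not confirm any of these.

```python
import math

balanced_samples = 15_000  # from the Dataset section
test_percent = 20          # held-out share stated under Performance Metrics
batch_size = 16
epochs = 2

# Assumption: all non-test samples are used for training.
train_samples = balanced_samples * (100 - test_percent) // 100
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(train_samples, steps_per_epoch, total_steps)  # 12000 750 1500
```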

## Performance Metrics
The model was evaluated on a held-out test set (20% of the balanced data):

| Metric | Value |
| :--- | :--- |
| **Accuracy** | 78.0% |
| **Weighted F1-Score** | 0.78 |
| **Precision (Positive)** | 0.83 |
| **Recall (Positive)** | 0.89 |
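
For readers unfamiliar with how these figures relate, the sketch below derives accuracy, per-class precision and recall, and support-weighted F1 from a confusion matrix. The matrix is made up for illustration and does not reproduce the model's results, and the helper names are hypothetical.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples with true label i predicted as label j."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1, "support": actual_k})
    return metrics


def weighted_f1(confusion):
    """Average the per-class F1 scores, weighting each class by its support."""
    metrics = per_class_metrics(confusion)
    total = sum(m["support"] for m in metrics)
    return sum(m["f1"] * m["support"] for m in metrics) / total


def accuracy(confusion):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


# Made-up counts for illustration only; these are NOT the model's results.
# Rows are true labels, columns are predictions, in the order Negative, Neutral, Positive.
toy_confusion = [[8, 2, 0],
                 [1, 6, 3],
                 [0, 2, 8]]

print(accuracy(toy_confusion))
print(weighted_f1(toy_confusion))
print(per_class_metrics(toy_confusion)[2])  # Positive class
```

On a balanced test set every class has the same support, so weighted F1 equals macro F1.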

### Key Strengths
- **Contextual Understanding:** Handles constructions where surface cues conflict with the overall sentiment, such as negation and some sarcasm (e.g., "Don't listen to the haters, this is great!").
- **Robustness:** Outperforms the traditional TF-IDF and DistilBERT baselines at identifying ambiguous "Neutral" reviews.

## Limitations & Bias
- **Neutral Class:** Remains the most frequent source of misclassification due to the inherent subjectivity of 3-star ratings.
- **Domain Specificity:** Performance may vary when applied to domains outside of food and beverages (e.g., electronics or fashion).
- **Sarcasm:** Although handling has improved, extremely subtle sarcasm may still lead to errors.

## How to Use
```python
from transformers import pipeline

# Load the model directly from the Hub
model_path = "mlklt3/amazon-sentiment-roberta-base"
sentiment_pipeline = pipeline("sentiment-analysis", model=model_path)

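# Example usage: the pipeline returns a list with one dict per input text,
# each holding a "label" and a confidence "score"
text = "The product was okay, but I expected much better flavor for this price."
result = sentiment_pipeline(text)
print(result)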
```
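
The card does not say whether the model's configuration defines human-readable label names. If it does not, the pipeline reports generic names such as `LABEL_0`, which is the Transformers default. The sketch below maps those back to the classes listed under Labels. The helper `readable_label` and the hard-coded sample output are hypothetical and exist only so the snippet runs without downloading the model.

```python
# Label ids as documented in the Labels section of this card.
ID_TO_NAME = {0: "Negative", 1: "Neutral", 2: "Positive"}


def readable_label(raw_label: str) -> str:
    """Translate a generic 'LABEL_<id>' string into the card's class name.

    Labels that do not follow that pattern are returned unchanged.
    """
    prefix = "LABEL_"
    suffix = raw_label[len(prefix):]
    if raw_label.startswith(prefix) and suffix.isdigit():
        return ID_TO_NAME.get(int(suffix), raw_label)
    return raw_label


# Hypothetical pipeline output, hard-coded for illustration.
example_result = [{"label": "LABEL_1", "score": 0.87}]
decoded = [{**item, "label": readable_label(item["label"])} for item in example_result]
print(decoded)
```

If the pipeline already returns names such as `Neutral`, the helper leaves them untouched.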

## Citation
If you use this model in your research or project, please credit the Amazon Fine Food Reviews dataset and the Hugging Face Transformers library.