---
language: en
license: apache-2.0
library_name: transformers
tags:
- sentiment-analysis
- roberta
- amazon-reviews
- e-commerce
datasets:
- amazon_fine_food_reviews
metrics:
- accuracy
- f1
pipeline_tag: text-classification
---
# Model Card: Amazon Sentiment RoBERTa Base
## Model Description
This model is a fine-tuned version of **RoBERTa-base** specifically optimized for sentiment analysis of customer reviews. It was trained on a balanced subset of the Amazon Fine Food Reviews dataset to classify text into three distinct categories: **Negative**, **Neutral**, and **Positive**.
- **Model Type:** Transformer-based Text Classification
- **Language:** English
- **Base Model:** `roberta-base`
## Intended Use
- **Primary Use Case:** Real-time sentiment tracking for e-commerce platforms.
- **Scope:** Analyzing short to medium-length customer feedback and product reviews.
- **Out-of-Scope:** Not recommended for legal documents, medical advice, or languages other than English.
## Training Data & Methodology
### Dataset
- **Source:** Amazon Fine Food Reviews (Kaggle).
- **Preprocessing:**
  - Removal of duplicates and HTML tags.
  - POS-tag-based Lemmatization for linguistic normalization.
  - Undersampling to 15,000 samples (5,000 per class) to handle class imbalance.
- **Labels:**
  - `0`: Negative (1-2 stars)
  - `1`: Neutral (3 stars)
  - `2`: Positive (4-5 stars)
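
The labeling and balancing steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not the original training pipeline: the function names (`star_to_label`, `clean_text`, `build_balanced_subset`), the regex-based tag stripping, and the random seed are all assumptions, and the POS-tag-based lemmatization step is omitted because it needs an external NLP library.

```python
import html
import random
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")  # naive HTML tag matcher (assumption)


def star_to_label(stars: int) -> int:
    """Map a 1-5 star rating to the label ids listed in this card."""
    if stars <= 2:
        return 0  # Negative
    if stars == 3:
        return 1  # Neutral
    return 2  # Positive


def clean_text(text: str) -> str:
    """Replace HTML tags with spaces, decode entities, collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


def build_balanced_subset(reviews, per_class=5000, seed=42):
    """Take (text, stars) pairs and return a shuffled list of (text, label).

    Exact duplicates (after cleaning) are dropped, then each class is
    undersampled to at most `per_class` examples.
    """
    by_label = defaultdict(list)
    seen = set()
    for text, stars in reviews:
        cleaned = clean_text(text)
        if cleaned in seen:
            continue
        seen.add(cleaned)
        by_label[star_to_label(stars)].append(cleaned)

    rng = random.Random(seed)
    subset = []
    for label in (0, 1, 2):
        texts = by_label[label]
        k = min(per_class, len(texts))
        subset.extend((t, label) for t in rng.sample(texts, k))
    rng.shuffle(subset)
    return subset
```

With the default `per_class=5000` and enough reviews in every class, the result has 15,000 rows, matching the subset size stated above.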
### Hyperparameters
- **Learning Rate:** 2e-5
- **Batch Size:** 16
- **Epochs:** 2
- **Weight Decay:** 0.01
- **Max Sequence Length:** 128 tokens
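
For anyone reproducing the run, the values above can be collected as shown below. This is a sketch under stated assumptions: the dictionary keys are chosen to match keyword arguments of `transformers.TrainingArguments`, but the card does not say which training API was used. The step count additionally assumes that 80% of the 15,000 balanced samples were used for training, on a single device with no gradient accumulation.

```python
import math

# Values copied from the Hyperparameters list; key names follow the
# keyword arguments of transformers.TrainingArguments (an assumption
# about the training setup, which the card does not describe).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 2,
    "weight_decay": 0.01,
}

# The sequence limit is applied at tokenization time, e.g.
# tokenizer(texts, truncation=True, max_length=MAX_SEQ_LENGTH).
MAX_SEQ_LENGTH = 128

# Assumption: 80% of the 15,000 balanced samples form the training split.
train_samples = 15_000 * 80 // 100
steps_per_epoch = math.ceil(
    train_samples / training_kwargs["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]

print(steps_per_epoch, total_steps)  # 750 1500
```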
## Performance Metrics
The model was evaluated on a held-out test set (20% of the balanced data):
| Metric | Value |
| :--- | :--- |
| **Accuracy** | 78.0% |
| **Weighted F1-Score** | 0.78 |
| **Precision (Positive)** | 0.83 |
| **Recall (Positive)** | 0.89 |
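
The table reports precision and recall for the Positive class but not its F1-score. As a quick check, it can be derived from the two reported values, along with the size of the test split implied by the 20% hold-out. The helper name `f1_from_pr` is illustrative, and the results are only as precise as the rounded figures in the table.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Positive-class values taken from the table above.
positive_f1 = f1_from_pr(0.83, 0.89)

# 20% of the 15,000 balanced samples.
test_size = 15_000 * 20 // 100

print(f"Positive F1: {positive_f1:.2f}")  # Positive F1: 0.86
print(f"Test set size: {test_size}")      # Test set size: 3000
```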
### Key Strengths
- **Contextual Understanding:** Handles context-dependent phrasing such as negation (e.g., "Don't listen to the haters, this is great!"). Sarcasm is handled less reliably; see Limitations & Bias.
- **Robustness:** Outperformed the traditional TF-IDF and DistilBERT baselines at identifying ambiguous "Neutral" reviews.
## Limitations & Bias
- **Neutral Class:** Remains the most frequent source of misclassification, owing to the inherent subjectivity of 3-star ratings.
- **Domain Specificity:** Performance may vary in domains other than food and beverages (e.g., electronics or fashion).
- **Sarcasm:** Although sarcasm handling has improved, very subtle sarcasm may still be misclassified.
## How to Use
```python
from transformers import pipeline

# Load the model directly from the Hub
model_path = "mlklt3/amazon-sentiment-roberta-base"
sentiment_pipeline = pipeline("sentiment-analysis", model=model_path)

# Example usage; truncate to the 128-token limit used during training
text = "The product was okay, but I expected much better flavor for this price."
result = sentiment_pipeline(text, truncation=True, max_length=128)
print(result)  # a list with one dict containing "label" and "score"
```
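
Depending on how the model's config was saved, the pipeline may return generic names such as `LABEL_0` rather than class names. The helper below is a hypothetical post-processing sketch that maps such names to the classes listed under Labels. It assumes the default `LABEL_<id>` naming, which has not been verified for this checkpoint, and the `example_output` values are placeholders rather than real model output.

```python
# Label ids as documented in the Dataset section of this card.
ID2NAME = {0: "Negative", 1: "Neutral", 2: "Positive"}


def readable_label(raw_label: str) -> str:
    """Translate 'LABEL_<id>' into a class name; pass anything else through."""
    prefix = "LABEL_"
    suffix = raw_label[len(prefix):]
    if raw_label.startswith(prefix) and suffix.isdigit():
        return ID2NAME.get(int(suffix), raw_label)
    return raw_label


# Placeholder shaped like a text-classification pipeline result
# (a list of dicts with "label" and "score"); not real model output.
example_output = [{"label": "LABEL_1", "score": 0.5}]

for item in example_output:
    print(readable_label(item["label"]), item["score"])
```

If the checkpoint already defines human-readable labels, `readable_label` returns them unchanged.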
## Citation
If you use this model in your research or project, please credit the Amazon Fine Food Reviews dataset and the Hugging Face Transformers library.