---
language: en
tags:
- sentiment-analysis
- text-classification
- transformers
- distilbert
datasets:
- lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
model-index:
- name: DistilBERT Sentiment Classifier
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: IMDB Dataset of 50K Movie Reviews
      type: text
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.93
    - name: F1
      type: f1
      value: 0.93
    - name: Precision
      type: precision
      value: 0.93
    - name: Recall
      type: recall
      value: 0.93
license: apache-2.0
metrics:
- accuracy
- precision
- recall
---
# DistilBERT Sentiment Classifier
## Model Details
- Model Type: Transformer-based classifier (DistilBERT)
- Base Model: distilbert-base-uncased
- Language: English
- Task: Sentiment Analysis (binary classification)
- Labels:
  - 0 → Negative
  - 1 → Positive
- Framework: Hugging Face Transformers
## Intended Uses & Limitations
### Intended Use
Sentiment classification of English reviews, comments, or feedback.
### Out-of-Scope Use
- Languages other than English.
- Sentiment tasks with more than two classes, such as neutral or mixed sentiment.
### ⚠️ Limitations
- May not generalize well outside movie/review-style data.
- Training data may contain cultural and linguistic bias.
## Training Dataset
- Source: Kaggle Cleaned IMDB Reviews Dataset
- Size: ~50,000 reviews
- Classes: positive, negative
- Converted to integers: positive → 1, negative → 0
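The integer conversion above amounts to a dictionary lookup. The following is a minimal standard-library sketch. It assumes the CSV has `review` and `sentiment` columns, so check the header of the file you actually downloaded. The two sample rows are made up for illustration.

```python
import csv
import io

# Label mapping described in this card: positive → 1, negative → 0.
LABEL2ID = {"negative": 0, "positive": 1}

def encode_labels(csv_text):
    """Return (review, label_id) pairs from CSV text.

    Assumes 'review' and 'sentiment' columns (an assumed layout, not
    confirmed by this card).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(row["review"], LABEL2ID[row["sentiment"].strip().lower()]) for row in rows]

# Made-up sample rows; quoted fields may contain commas.
sample = 'review,sentiment\n"Great film, loved it",positive\n"Dull and far too long",negative\n'
print(encode_labels(sample))
```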
## Training Procedure
- Epochs: 3
- Batch Size: 16
- Optimizer: AdamW
- Learning Rate: 5e-5
- Framework: Hugging Face Trainer API
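If you want to reproduce the run with the Trainer API, the hyperparameters above can be expressed as `transformers.TrainingArguments` field names, as sketched below. This mapping is an assumption and not the author's training script. The `optim` value is a guess at which AdamW implementation was used, and the training-set size passed to the step calculation is a hypothetical placeholder.

```python
import math

# Hyperparameters from this card, keyed by TrainingArguments field names.
# Could be passed as TrainingArguments(output_dir=..., **HPARAMS).
HPARAMS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "optim": "adamw_torch",  # assumption: the card only says "AdamW"
}

def total_optimizer_steps(n_train_examples, batch_size, epochs):
    """Optimizer steps on one device without gradient accumulation.

    The last partial batch of each epoch still counts as a step,
    hence the ceiling division.
    """
    steps_per_epoch = math.ceil(n_train_examples / batch_size)
    return steps_per_epoch * epochs

# 40_000 is a hypothetical training-set size, not a figure from this card.
print(total_optimizer_steps(40_000, HPARAMS["per_device_train_batch_size"], HPARAMS["num_train_epochs"]))
```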
## Evaluation
The model was tested on a held-out validation set of 9,917 reviews.
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Negative (0) | 0.93 | 0.93 | 0.93 | 4,939 |
| Positive (1) | 0.93 | 0.93 | 0.93 | 4,978 |
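Each table row reports per-class precision, recall, F1, and support, where support is the number of true examples of that class. The plain-Python sketch below shows how these figures are derived from true and predicted labels. It uses toy labels and is not the evaluation script behind the table. Libraries such as scikit-learn's `classification_report` report the same per-class quantities.

```python
def per_class_metrics(y_true, y_pred, labels=(0, 1)):
    """Precision, recall, F1 and support per class."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Support = number of examples whose true label is c.
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report

# Toy labels, not the card's evaluation data: 0 = Negative, 1 = Positive.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
for label, metrics in per_class_metrics(y_true, y_pred).items():
    print(label, metrics)
```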
### Overall
- Accuracy: 93%
- Macro Avg F1: 0.93
- Weighted Avg F1: 0.93
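Macro averaging takes the unweighted mean of the per-class scores. Weighted averaging weights each class by its support. Plugging in the per-class F1 values and supports from the evaluation table shows why both averages come out at 0.93: the two classes have identical F1 and nearly equal support. Because the table values are rounded to two decimals, this sketch reproduces only the rounded averages.

```python
# Per-class F1 and support, copied from the evaluation table in this card.
f1_scores = [0.93, 0.93]  # Negative (0), Positive (1)
supports = [4939, 4978]

def macro_average(scores):
    """Unweighted mean over classes."""
    return sum(scores) / len(scores)

def weighted_average(scores, weights):
    """Mean over classes, weighted by each class's support."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print("total support:", sum(supports))
print("macro F1:", round(macro_average(f1_scores), 2))
print("weighted F1:", round(weighted_average(f1_scores, supports), 2))
```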
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_name = "YamenRM/distilbert-sentiment-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Wrap both in a text-classification pipeline and score a review
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(nlp("I really loved this movie, it was amazing!"))
```
Example output:
```
[{'label': 'POSITIVE', 'score': 0.98}]
```
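The pipeline returns a list of dicts with `label` and `score` keys. If downstream code needs the integer ids listed under Model Details, a small helper can map the label strings back. This sketch assumes the label names are either `NEGATIVE`/`POSITIVE`, as in the example output, or the generic `LABEL_0`/`LABEL_1` that Transformers uses when a config has no custom `id2label` mapping. The helper name `to_label_ids` is hypothetical.

```python
# Integer ids from this card: 0 → Negative, 1 → Positive.
# Assumption: label strings are NEGATIVE/POSITIVE or the generic LABEL_0/LABEL_1.
LABEL_TO_ID = {"NEGATIVE": 0, "POSITIVE": 1, "LABEL_0": 0, "LABEL_1": 1}

def to_label_ids(predictions):
    """Convert pipeline output dicts into (label_id, score) tuples."""
    return [(LABEL_TO_ID[p["label"].upper()], p["score"]) for p in predictions]

# Shaped like pipeline output; in practice pass the result of nlp(...).
sample = [{"label": "POSITIVE", "score": 0.98}, {"label": "LABEL_0", "score": 0.91}]
print(to_label_ids(sample))
```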