---
language: en
tags:
- sentiment-analysis
- text-classification
- transformers
- distilbert
datasets:
- lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
model-index:
- name: DistilBERT Sentiment Classifier
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: IMDB Dataset of 50K Movie Reviews
      type: text
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.93
    - name: F1
      type: f1
      value: 0.93
    - name: Precision
      type: precision
      value: 0.93
    - name: Recall
      type: recall
      value: 0.93
license: apache-2.0
metrics:
- accuracy
- precision
- recall
---

# DistilBERT Sentiment Classifier

## Model Details

- Model Type: Transformer-based classifier (DistilBERT)
- Base Model: distilbert-base-uncased
- Language: English
- Task: Sentiment Analysis (binary classification)
- Labels:
  - 0 → Negative
  - 1 → Positive
- Framework: Hugging Face Transformers

## Intended Uses & Limitations

### Intended Use

Sentiment classification of English reviews, comments, or feedback.

### Not Intended Use

- Text in languages other than English.
- Sentiment tasks with more than two classes (for example, neutral or mixed).

### ⚠️ Limitations

- May not generalize well outside movie/review-style data.
- Training data may contain cultural and linguistic bias.

## Training Dataset

- Source: Kaggle Cleaned IMDB Reviews Dataset
- Size: ~50,000 reviews
- Classes: positive, negative
- Converted to integers: positive → 1, negative → 0
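
The label conversion listed above can be sketched in plain Python. This is a minimal illustration, not the preprocessing code actually used for this model; the `LABEL_TO_ID` mapping and the `encode_labels` helper are hypothetical names, and the inputs are assumed to be the class names `positive` and `negative`, differing at most in case or surrounding whitespace.

```python
# Hypothetical mapping that mirrors the conversion listed above:
# positive -> 1, negative -> 0.
LABEL_TO_ID = {"negative": 0, "positive": 1}


def encode_labels(labels):
    """Convert a list of sentiment strings to integer class ids.

    Assumption: labels differ only in case and surrounding whitespace.
    Anything else raises a ValueError instead of being silently mapped.
    """
    encoded = []
    for label in labels:
        key = label.strip().lower()
        if key not in LABEL_TO_ID:
            raise ValueError(f"Unexpected label: {label!r}")
        encoded.append(LABEL_TO_ID[key])
    return encoded


print(encode_labels(["positive", "negative", "Positive "]))  # [1, 0, 1]
```

Raising on unknown labels is a deliberate choice in this sketch: a stray value such as `neutral` would otherwise end up in one of the two classes unnoticed.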

## Training Procedure

- Epochs: 3
- Batch Size: 16
- Optimizer: AdamW
- Learning Rate: 5e-5
- Framework: Hugging Face Trainer API
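
To get a feel for the training budget these settings imply, the number of optimizer updates can be estimated as below. This is a rough sketch only: it assumes a single device and no gradient accumulation, and the training-split size of 40,000 is a hypothetical figure, since the exact split is not stated in this card.

```python
import math

# Hyperparameters as listed above.
EPOCHS = 3
BATCH_SIZE = 16


def total_optimizer_steps(num_train_examples, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Estimate optimizer updates, assuming one device, no gradient
    accumulation, and a final partial batch that is kept."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * epochs


# 40,000 is a hypothetical training-split size used only for illustration.
print(total_optimizer_steps(40_000))  # 2500 steps per epoch * 3 epochs = 7500
```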

## Evaluation

The model was tested on a held-out validation set of 9,917 reviews.

| Class | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| Negative (0) | 0.93 | 0.93 | 0.93 | 4,939 |
| Positive (1) | 0.93 | 0.93 | 0.93 | 4,978 |

### Overall

- Accuracy: 93%
- Macro Avg F1: 0.93
- Weighted Avg F1: 0.93
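
The macro and weighted averages differ only in how the per-class scores are combined. The sketch below recomputes both from the rounded per-class F1 values and supports reported above. It illustrates the averaging only and is not the evaluation script that produced these numbers; the helper names are hypothetical.

```python
# Per-class F1 and support, copied from the reported results (already rounded).
per_class = {
    "negative": {"f1": 0.93, "support": 4939},
    "positive": {"f1": 0.93, "support": 4978},
}


def macro_average(scores, metric):
    """Unweighted mean of a metric across classes."""
    return sum(c[metric] for c in scores.values()) / len(scores)


def weighted_average(scores, metric):
    """Mean of a metric across classes, weighted by class support."""
    total = sum(c["support"] for c in scores.values())
    return sum(c[metric] * c["support"] for c in scores.values()) / total


total_support = sum(c["support"] for c in per_class.values())
print(total_support)                                # 9917
print(round(macro_average(per_class, "f1"), 2))     # 0.93
print(round(weighted_average(per_class, "f1"), 2))  # 0.93
```

Because the two classes are nearly balanced and have the same rounded F1, the two averages coincide here; on a skewed dataset they can diverge noticeably.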

## How to Use

```
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "YamenRM/distilbert-sentiment-classifier"

# Load the fine-tuned tokenizer and classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Wrap both in a text-classification pipeline
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(nlp("I really loved this movie, it was amazing!"))
```

Example output:

```
# [{'label': 'POSITIVE', 'score': 0.98}]
```
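
If integer labels matching the 0/1 scheme in Model Details are needed downstream, the pipeline output can be mapped back as sketched below. This assumes the pipeline returns the label strings `POSITIVE` and `NEGATIVE`; only `POSITIVE` appears in the example output, so the `NEGATIVE` name is an assumption, and `to_class_id` is a hypothetical helper. The sample prediction is hard-coded so the snippet runs without downloading the model.

```python
# Assumed label names; check model.config.id2label for the actual strings.
LABEL_TO_ID = {"NEGATIVE": 0, "POSITIVE": 1}


def to_class_id(prediction):
    """Map one pipeline result dict to its integer class id."""
    return LABEL_TO_ID[prediction["label"].upper()]


# Shaped like the example pipeline output.
sample_output = [{"label": "POSITIVE", "score": 0.98}]
print([to_class_id(p) for p in sample_output])  # [1]
```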