Model Card for IMDB Movie Review Classifier
This model classifies movie reviews from the IMDB dataset as positive or negative. It is built on Hugging Face's DistilBERT, a lighter version of BERT, for text classification tasks.
Model Details
Model Description
This model is fine-tuned for binary classification, trained on the IMDb dataset, and can predict whether a given review is positive or negative. It utilizes the distilbert-base-uncased model, a pre-trained transformer-based architecture.
- Developed by: Amirreza Gholipour
- Funded by: Amirreza Gholipour
- Model type: Transformer-based Text Classifier
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: distilbert-base-uncased
Uses
Direct Use
This model can be used directly to classify movie reviews. Given an IMDb review, the model returns a sentiment label: positive or negative. The input text is tokenized, passed through the DistilBERT model, and the output is post-processed into a sentiment prediction.
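The post-processing described above ends in a softmax over the model's two output logits followed by an argmax. That final step can be sketched with plain NumPy, using made-up logit values (the label order, negative at index 0 and positive at index 1, is an assumption for illustration):

```python
import numpy as np

# Hypothetical logits for one review: [negative_score, positive_score].
logits = np.array([-1.2, 2.3])

# Softmax turns logits into probabilities (shifted by max for stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# argmax picks the predicted class.
label = ["NEGATIVE", "POSITIVE"][int(probs.argmax())]
```

With these logits the positive class clearly dominates, so `label` comes out as `"POSITIVE"`.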
Downstream Use
The model can be fine-tuned further on other similar datasets to specialize it for different domains, such as classifying product reviews, news sentiment, or other text-based sentiment analysis tasks.
Out-of-Scope Use
This model is not intended for use in detecting sarcasm, irony, or more nuanced sentiment expressions that require deeper contextual understanding. It may not perform well on non-English reviews.
Bias, Risks, and Limitations
The model is trained on the IMDb dataset, which may introduce bias due to the nature of the content of movie reviews. It might not generalize well to domains outside of movie review sentiment classification. The dataset could be biased in terms of the types of movies reviewed (e.g., biased toward Hollywood blockbusters).
Recommendations
Users should ensure they are aware of potential biases in the training data. The model should not be relied on for applications requiring high accuracy in specialized domains or nuanced text understanding.
How to Get Started with the Model
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline('text-classification', model='AmirRghp/distilbert-base-uncasedimdb-text-classification')

# Classify a sample review
text = "The movie was absolutely amazing and I loved every minute of it!"
result = classifier(text)
print(result)
```
Training Details
Training Data
The model was trained on the IMDb dataset, a collection of 50,000 movie reviews categorized as either positive or negative.
- Dataset: IMDb dataset
- Number of samples: 50,000
- Categories: Positive, Negative
- Data Preprocessing: Tokenization and padding were applied to the raw text data to ensure compatibility with the DistilBERT model.
Training Procedure
Preprocessing
The text data is tokenized using the DistilBERT tokenizer.
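As a sketch of this step, assuming the standard `distilbert-base-uncased` tokenizer from Hugging Face Transformers (the tokenizer files are downloaded on first run):

```python
from transformers import AutoTokenizer

# Load the tokenizer matching the base checkpoint.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

reviews = [
    "The movie was absolutely amazing!",
    "I walked out halfway through.",
]

# padding=True pads the shorter review to the batch's longest length;
# truncation caps very long reviews at DistilBERT's 512-token limit.
batch = tokenizer(reviews, padding=True, truncation=True, max_length=512)
```

`batch["input_ids"]` and `batch["attention_mask"]` are the tensors the model consumes.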
Training Hyperparameters
- Learning rate: 2e-5
- Batch size: 4
- Epochs: 5
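With the Transformers Trainer API, these hyperparameters map to a `TrainingArguments` configuration roughly like the following. This is a sketch, not the exact training script: `output_dir` is a placeholder and all other arguments are left at their defaults.

```python
from transformers import TrainingArguments

# Hyperparameters as reported above; output_dir is a placeholder path.
training_args = TrainingArguments(
    output_dir="./distilbert-imdb",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=5,
)
```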
Speeds, Sizes, Times
Training was conducted on a GPU with the following specifications:
- Hardware Type: NVIDIA RTX 5060 Ti
- Training Time: 5 minutes
Evaluation
The model was evaluated using accuracy, F1 score, precision, and recall, together with a confusion matrix. Key results:
- Accuracy: 89.2%
- F1 Score: 0.89
- Precision: 0.88
- Recall: 0.90
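These metrics can be reproduced with scikit-learn. The sketch below uses tiny made-up label arrays rather than the real IMDb test split, purely to show the metric calls:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up ground-truth and predicted labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted
```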
Summary
The model performs well on the IMDb dataset, with high accuracy and strong results on the other metrics. It is ready for use in practical sentiment analysis tasks.
Technical Specifications
Model Architecture and Objective
The model is based on the DistilBERT architecture, which is a smaller and faster variant of the BERT model, designed to provide similar performance with fewer parameters.
- Architecture: Transformer-based encoder-only model (DistilBERT)
- Objective: Binary classification of text (positive or negative sentiment)
Compute Infrastructure
- Hardware: NVIDIA RTX 5060 Ti
- Libraries: Hugging Face Transformers, PyTorch
More Information
For more details on the model architecture and training, please refer to the Hugging Face documentation.
Model Card Authors
- Amirreza Gholipour