Model Card for IMDB Movie Review Classifier
This model classifies movie reviews from the IMDB dataset as positive or negative. It is built on Hugging Face's DistilBERT, a lighter version of BERT, for text classification tasks.
Model Details
Model Description
This model is fine-tuned for binary classification, trained on the IMDb dataset, and can predict whether a given review is positive or negative. It utilizes the distilbert-base-uncased model, a pre-trained transformer-based architecture.
- Developed by: Amirreza Gholipour
- Funded by: Amirreza Gholipour
- Model type: Transformer-based Text Classifier
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: distilbert-base-uncased
Uses
Direct Use
This model can be used directly to classify movie reviews. Given an IMDb review, the model returns a sentiment label: positive or negative. The input text is tokenized, passed through the DistilBERT model, and the output is post-processed into a sentiment prediction.
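The post-processing described above ends in a softmax over the model's two output logits followed by an argmax. That final step can be sketched with plain NumPy, using made-up logit values (the label order, negative at index 0 and positive at index 1, is an assumption for illustration):

```python
import numpy as np

# Hypothetical logits for one review: [negative_score, positive_score].
logits = np.array([-1.2, 2.3])

# Softmax turns logits into probabilities (shifted by max for stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# argmax picks the predicted class.
label = ["NEGATIVE", "POSITIVE"][int(probs.argmax())]
```

With these logits the positive class clearly dominates, so `label` comes out as `"POSITIVE"`.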
Downstream Use
The model can be fine-tuned further on other similar datasets to specialize it for different domains, such as classifying product reviews, news sentiment, or other text-based sentiment analysis tasks.
Out-of-Scope Use
This model is not intended for use in detecting sarcasm, irony, or more nuanced sentiment expressions that require deeper contextual understanding. It may not perform well on non-English reviews.
Bias, Risks, and Limitations
The model is trained on the IMDb dataset, which may introduce bias due to the nature of the content of movie reviews. It might not generalize well to domains outside of movie review sentiment classification. The dataset could be biased in terms of the types of movies reviewed (e.g., biased toward Hollywood blockbusters).
Recommendations
Users should ensure they are aware of potential biases in the training data. The model should not be relied on for applications requiring high accuracy in specialized domains or nuanced text understanding.
How to Get Started with the Model
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline('text-classification', model='AmirRghp/distilbert-base-uncasedimdb-text-classification')

# Classify a sample review
text = "The movie was absolutely amazing and I loved every minute of it!"
result = classifier(text)
print(result)
```
Training Details
Training Data
The model was trained on the IMDb dataset, a collection of 50,000 movie reviews categorized as either positive or negative.
- Dataset: IMDb dataset
- Number of samples: 50,000
- Categories: Positive, Negative
- Data Preprocessing: Tokenization and padding were applied to the raw text data to ensure compatibility with the DistilBERT model.
Training Procedure
Preprocessing
The text data is tokenized using the DistilBERT tokenizer.
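As a sketch of this step, assuming the standard `distilbert-base-uncased` tokenizer from Hugging Face Transformers (the tokenizer files are downloaded on first run):

```python
from transformers import AutoTokenizer

# Load the tokenizer matching the base checkpoint.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

reviews = [
    "The movie was absolutely amazing!",
    "I walked out halfway through.",
]

# padding=True pads the shorter review to the batch's longest length;
# truncation caps very long reviews at DistilBERT's 512-token limit.
batch = tokenizer(reviews, padding=True, truncation=True, max_length=512)
```

`batch["input_ids"]` and `batch["attention_mask"]` are the tensors the model consumes.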
Training Hyperparameters
- Learning rate: 2e-5
- Batch size: 4
- Epochs: 5
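With the Transformers Trainer API, these hyperparameters map to a `TrainingArguments` configuration roughly like the following. This is a sketch, not the exact training script: `output_dir` is a placeholder and all other arguments are left at their defaults.

```python
from transformers import TrainingArguments

# Hyperparameters as reported above; output_dir is a placeholder path.
training_args = TrainingArguments(
    output_dir="./distilbert-imdb",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    num_train_epochs=5,
)
```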
Speeds, Sizes, Times
Training was conducted on a GPU with the following specifications:
- Hardware Type: NVIDIA RTX 5060 Ti
- Training Time: 5 minutes
Evaluation
The model was evaluated using accuracy, F1 score, precision, and recall, together with a confusion matrix. Key results:
- Accuracy: 89.2%
- F1 Score: 0.89
- Precision: 0.88
- Recall: 0.90
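These metrics can be reproduced with scikit-learn. The sketch below uses tiny made-up label arrays rather than the real IMDb test split, purely to show the metric calls:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up ground-truth and predicted labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted
```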
Summary
The model performs well on the IMDb dataset, with high accuracy and strong results on the other metrics. It is ready for use in practical sentiment analysis tasks.
Technical Specifications
Model Architecture and Objective
The model is based on the DistilBERT architecture, which is a smaller and faster variant of the BERT model, designed to provide similar performance with fewer parameters.
- Architecture: Transformer-based encoder-only model (DistilBERT)
- Objective: Binary classification of text (positive or negative sentiment)
Compute Infrastructure
- Hardware: NVIDIA RTX 5060 Ti
- Libraries: Hugging Face Transformers, PyTorch
More Information
For more details on the model architecture and training, please refer to the Hugging Face documentation.
Model Card Authors
- Amirreza Gholipour