---
library_name: transformers
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- sentiment-analysis
- text-classification
- distilbert
- imdb
- transformers
metrics:
- accuracy
model-index:
- name: an-imdb-classifier
  results: []
datasets:
- stanfordnlp/imdb
---

# an-imdb-classifier

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the stanfordnlp/imdb dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3635
- Accuracy: 0.898

## Model description

This model is a fine-tuned version of the distilbert-base-uncased model, trained for sentiment analysis on a subset of the IMDb dataset.
It is designed to classify movie reviews as either positive or negative.

## Intended uses & limitations

This model is intended for use in classifying the sentiment of movie reviews. 

It can be used for tasks such as:
- Automatically categorizing movie reviews on websites or platforms.
- Analyzing the overall sentiment towards a particular movie.
- Providing feedback to users based on their review sentiment.
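
The uses above can be sketched with the `transformers` pipeline API. This is a minimal sketch, not the card's official inference code: the checkpoint path `an-imdb-classifier` and the `LABEL_0`/`LABEL_1` → negative/positive mapping (the default head labels for a 2-class DistilBERT fine-tune) are assumptions.

```python
from typing import Tuple

# Assumed mapping from the default 2-class head labels to sentiment strings.
ID2LABEL = {"LABEL_0": "negative", "LABEL_1": "positive"}

def to_sentiment(pred: dict) -> Tuple[str, float]:
    """Convert one raw pipeline prediction into (sentiment, confidence)."""
    return ID2LABEL[pred["label"]], pred["score"]

def classify(text: str, model_path: str = "an-imdb-classifier"):
    """Classify one review; `model_path` is an assumed checkpoint name."""
    from transformers import pipeline  # lazy import: needs transformers + torch
    clf = pipeline("text-classification", model=model_path)
    return to_sentiment(clf(text)[0])
```

Calling `classify("A gripping, beautifully acted film.")` would return a `("positive", score)` or `("negative", score)` tuple, which can feed the categorization and feedback tasks listed above.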

## Training and evaluation data

The model was fine-tuned on a small subset of the IMDb dataset.

- Training set size: 5000 examples
- Evaluation set size: 500 examples

The dataset contains movie reviews labeled as either positive (label 1) or negative (label 0).
The distribution of labels in the training set is approximately equal (2494 negative, 2506 positive).
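
A subset like this could be drawn with the `datasets` library. Only the split sizes come from this card; the shuffle seed, the choice of source splits, and the function name are illustrative assumptions.

```python
TRAIN_SIZE, EVAL_SIZE = 5000, 500  # sizes stated in the card

def load_subsets(seed: int = 42):
    """Draw fixed-size shuffled subsets of IMDb (sketch; splits/seed assumed)."""
    from datasets import load_dataset  # lazy import: needs `pip install datasets`
    ds = load_dataset("stanfordnlp/imdb")
    train = ds["train"].shuffle(seed=seed).select(range(TRAIN_SIZE))
    evaluation = ds["test"].shuffle(seed=seed).select(range(EVAL_SIZE))
    return train, evaluation
```

Shuffling before `select` matters here: the raw IMDb splits are ordered by label, so an unshuffled slice would be heavily imbalanced rather than the roughly 50/50 split reported above.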

## Training procedure

The model was trained with the Hugging Face `Trainer` on the tokenized IMDb subset; a `preprocess_function` tokenized each review and truncated it to the model's maximum input length.
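
The `preprocess_function` mentioned above is not shown in this card; a typical implementation, assumed here, is a thin wrapper around the tokenizer applied with `dataset.map`:

```python
def preprocess_function(examples, tokenizer):
    """Tokenize a batch of reviews, truncating to the model's max length."""
    return tokenizer(examples["text"], truncation=True)

# Typical usage (sketch), applied in batched mode over the dataset:
# tokenized = dataset.map(lambda ex: preprocess_function(ex, tokenizer), batched=True)
```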

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
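
These hyperparameters map directly onto a `TrainingArguments`/`Trainer` setup. The sketch below is an assumed reconstruction, not the card's actual training script; the output directory, evaluation strategy, and variable names are illustrative.

```python
# Hyperparameters as listed in this card.
HPARAMS = dict(
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)

def build_trainer(model, train_ds, eval_ds, tokenizer):
    """Assemble a Trainer matching the listed hyperparameters (sketch)."""
    from transformers import Trainer, TrainingArguments  # needs transformers
    args = TrainingArguments(
        output_dir="an-imdb-classifier",  # assumed output directory
        eval_strategy="epoch",            # assumed: card reports per-epoch eval
        **HPARAMS,
    )
    return Trainer(model=model, args=args, train_dataset=train_ds,
                   eval_dataset=eval_ds, tokenizer=tokenizer)
```

AdamW with betas=(0.9, 0.999) and epsilon=1e-08 is the `TrainingArguments` default, so it needs no explicit argument here.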

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 313  | 0.3199          | 0.866    |
| 0.2966        | 2.0   | 626  | 0.3023          | 0.89     |
| 0.2966        | 3.0   | 939  | 0.3635          | 0.898    |
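
The step counts in the table follow directly from the data and batch sizes above: 5000 training examples at batch size 16 give 313 optimizer steps per epoch.

```python
import math

# 5000 examples / batch size 16 -> 312.5, rounded up to a final partial batch.
steps_per_epoch = math.ceil(5000 / 16)
print(steps_per_epoch, 2 * steps_per_epoch, 3 * steps_per_epoch)  # 313 626 939
```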


### Framework versions

- Transformers 4.55.0
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4