DistilBERT IMDb DAPT Model
This model is a DistilBERT encoder further pretrained with Domain-Adaptive Pretraining (DAPT) on the 50k movie reviews in the unsupervised split of the IMDb dataset. The additional pretraining adapts DistilBERT's language representations to the movie review domain, improving performance on downstream sentiment-related tasks.
The model can be used as:
- A feature extractor for sentiment analysis
- A starting point for fine-tuning on IMDb or similar review datasets
- A domain-adapted encoder for other NLP tasks involving movie reviews or opinionated text
A classifier for movie review sentiment analysis built on this model is available here; a minimal usage sketch follows below.
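A minimal feature-extraction sketch using Hugging Face Transformers. The repository ID below is a placeholder, not this model's actual ID; replace it before running:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder repository ID; substitute the actual model ID from the Hub.
model_id = "distilbert-imdb-dapt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

review = "An absolute masterpiece; the pacing and the performances were flawless."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings into a single review-level feature vector.
features = outputs.last_hidden_state.mean(dim=1)
print(features.shape)  # torch.Size([1, 768])
```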
Training Details
- Base model: DistilBERT Base Uncased
- Pretraining objective: Masked Language Modeling (MLM)
Domain-Adaptive Pretraining (DAPT)
- Dataset: stanfordnlp/imdb
- Language: English
- Tokenization: WordPiece
- Dynamic padding during training
- Gradient accumulation to simulate a larger batch size (see the sketch below)
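A sketch of how this data pipeline typically fits together, assuming standard Transformers components; the 15% masking probability is the BERT/DistilBERT default and is not stated in this card:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# 50k unlabeled reviews used for domain-adaptive pretraining.
dataset = load_dataset("stanfordnlp/imdb", split="unsupervised")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # No padding here: the collator pads each batch to its longest
    # sequence at training time (dynamic padding).
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Randomly masks tokens for the MLM objective and applies dynamic padding.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
```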
This DAPT step allows the model to better capture:
- Movie-related vocabulary
- Review-style language
- Sentiment-heavy expressions
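A quick fill-mask probe (again with a placeholder repository ID) makes this concrete: a domain-adapted encoder should rank review-domain vocabulary highly in contexts like this.

```python
from transformers import pipeline

# Placeholder repository ID; substitute the actual model ID from the Hub.
fill_mask = pipeline("fill-mask", model="distilbert-imdb-dapt")

for pred in fill_mask("The acting was brilliant but the [MASK] was a mess."):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
```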
Intended Uses
Supported Use Cases
- Sentiment analysis on movie reviews
- Fine-tuning for downstream classification tasks (see the sketch after this list)
- Feature extraction for NLP research
- Educational and research purposes
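A fine-tuning sketch for the sentiment use case; the repository ID, output directory, and epoch count here are illustrative, not taken from this card:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder repository ID; substitute the actual model ID from the Hub.
model_id = "distilbert-imdb-dapt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

dataset = load_dataset("stanfordnlp/imdb")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="distilbert-imdb-sentiment",  # illustrative
        num_train_epochs=2,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables the default dynamic-padding collator
)
trainer.train()
```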
Out-of-Scope Uses
- Medical, legal or financial decision-making
- Safety-critical or high-risk applications
- Tasks requiring factual correctness or reasoning beyond sentiment understanding
Bias, Risks, and Limitations
- Inherits biases present in the IMDb dataset
- Performance depends heavily on downstream fine-tuning
- Not designed for multilingual or non-review text
- Not suitable for reasoning-heavy tasks
Ethical Considerations
- IMDb reviews may reflect demographic, cultural or opinion biases
- Outputs should not be used to draw conclusions about individuals or groups
- This model should not be used for moderation or harmful profiling
Training Hyperparameters
- Learning rate: 3e-5
- Optimizer: AdamW
- Epochs: 3
- Batch size: 16, with 4 gradient accumulation steps (effective batch size: 64)
- max_seq_length: 512
- Warmup: linear, over the first 5% of training steps
- Automatic mixed precision training
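Expressed as Transformers `TrainingArguments`, these settings roughly correspond to the following; the output directory is illustrative and the exact training script is not part of this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="distilbert-imdb-dapt",  # illustrative
    learning_rate=3e-5,                 # AdamW is the Trainer default optimizer
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,      # effective batch size: 16 * 4 = 64
    lr_scheduler_type="linear",
    warmup_ratio=0.05,                  # linear warmup over 5% of steps
    fp16=True,                          # automatic mixed precision
)
```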
Training stats

| Epoch | MLM Loss |
|-------|----------|
| 1     | 2.3581   |
| 2     | 2.2295   |
| 3     | 2.1622   |
License
This model follows the license of the base DistilBERT model (Apache 2.0). Please refer to the base model's Hugging Face repository for license details.