BERT — IMDB Sentiment Classifier

A fine-tuned bert-base-uncased model that classifies English text as Positive or Negative sentiment. Trained on the IMDB movie reviews dataset.

🎛️ Live demo: huggingface.co/spaces/harshaojha/bert-imdb-demo

Model details

  • Base model: bert-base-uncased (110M parameters)
  • Task: Binary text classification (sentiment)
  • Labels: 0 = Negative, 1 = Positive
  • Language: English
  • Max input length: 256 tokens

Intended use

Educational and demonstration use: learning how to fine-tune a transformer and wrap it in a web UI. The model works well on movie-review-style text and is less reliable on other domains.

Training data

A balanced subset of the IMDB reviews dataset:

  • Train: 10,000 reviews (shuffled, seed=42)
  • Validation: 2,000 reviews (shuffled, seed=42)
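One way a reproducible, balanced subset like this can be drawn is to sample the same number of examples per label with a fixed seed. The sketch below shows the idea in plain Python on toy data. The function `balanced_subset` and the toy reviews are hypothetical, and the original run may have used a different mechanism (for example, the shuffle built into a dataset library), so the exact selection would not match.

```python
import random
from collections import Counter


def balanced_subset(examples, n_total, seed=42):
    """Draw n_total examples with equal counts per label, reproducibly."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    per_label = n_total // len(by_label)
    subset = []
    for label in sorted(by_label):
        pool = by_label[label][:]  # copy so the input is not reordered
        rng.shuffle(pool)
        subset.extend(pool[:per_label])
    rng.shuffle(subset)  # mix the classes back together
    return subset


# Toy stand-in for the IMDB data: 0 = Negative, 1 = Positive.
toy = [(f"review {i}", i % 2) for i in range(100)]
train = balanced_subset(toy, n_total=20)
counts = Counter(label for _, label in train)
print(counts[0], counts[1])
```

Because the generator is seeded, calling the function again with the same inputs returns the same subset, which is what makes a split like "seed=42" repeatable.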

Training procedure

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| Batch size (train / eval) | 16 / 32 |
| Epochs | 3 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max sequence length | 256 |
| Best-checkpoint metric | accuracy |
| Hardware | NVIDIA T4 GPU (Google Colab) |
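To make the warmup ratio concrete, the following sketch works out how many optimizer steps these settings imply and what the learning rate would be at a given step. It assumes a single GPU, no gradient accumulation, and a linear warmup followed by linear decay; the card does not state the scheduler, so the decay shape and the helper `lr_at` are assumptions for illustration.

```python
import math

TRAIN_EXAMPLES = 10_000
BATCH_SIZE = 16
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # 625
total_steps = steps_per_epoch * EPOCHS                    # 1875
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)      # 188


def lr_at(step):
    """Learning rate after `step` updates (assumed linear warmup, linear decay)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 625 1875 188
```

Under these assumptions the rate climbs from 0 to 2e-5 over roughly the first 10% of training, then falls back to 0 by the final step.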

Evaluation results

| Epoch | Validation accuracy |
|---|---|
| 1 | 89.80% |
| 2 | 91.45% (best) |
| 3 | 91.10% |
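As a quick sanity check on these numbers, the sketch below converts each accuracy into a count of correctly classified reviews and picks the best epoch by the accuracy metric. It assumes accuracy was computed over all 2,000 validation reviews; the variable names are illustrative.

```python
VAL_SIZE = 2_000
val_accuracy = {1: 0.8980, 2: 0.9145, 3: 0.9110}

# Best checkpoint = epoch with the highest validation accuracy.
best_epoch = max(val_accuracy, key=val_accuracy.get)

# Implied number of correct predictions per epoch.
correct = {epoch: round(acc * VAL_SIZE) for epoch, acc in val_accuracy.items()}

print(best_epoch)  # 2
print(correct)     # {1: 1796, 2: 1829, 3: 1822}
```

Seen this way, the gap between epochs 2 and 3 is 7 reviews out of 2,000, so the two checkpoints are close.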

How to use

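from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
clf = pipeline("text-classification", model="harshaojha/bert-imdb-finetuned")
# The pipeline returns generic label names; map them to readable ones.
ID2LABEL = {"LABEL_0": "Negative", "LABEL_1": "Positive"}
# The pipeline returns a list with one dict per input text.
result = clf("An absolute masterpiece.")[0]
print(ID2LABEL[result["label"]], round(result["score"], 3))
# Positive 0.993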

Or load the tokenizer and model directly:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("harshaojha/bert-imdb-finetuned")
model = AutoModelForSequenceClassification.from_pretrained("harshaojha/bert-imdb-finetuned")
# Truncate to the 256-token limit used during training.
inputs = tokenizer("Painfully boring.", return_tensors="pt", truncation=True, max_length=256)
# Inference only, so skip gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits
# Convert logits to probabilities; index 0 = Negative, index 1 = Positive.
probs = torch.softmax(logits, dim=-1)[0]
print({"Negative": probs[0].item(), "Positive": probs[1].item()})
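The softmax step above is what turns the two raw logits into probabilities that sum to 1. A dependency-free sketch of that calculation follows; the logit values are made up for illustration and are not output from this model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.5]  # hypothetical values, ordered [Negative, Positive]
probs = softmax(logits)

# With two classes, softmax reduces to a sigmoid of the logit difference.
p_positive = 1.0 / (1.0 + math.exp(logits[0] - logits[1]))

print(probs, p_positive)
```

The second calculation shows why only the difference between the two logits matters for a binary classifier.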

Limitations

  • Trained on movie reviews only, so accuracy is likely to drop on other domains (tweets, product reviews, news).
  • Inherits biases present in IMDB and in the base bert-base-uncased model.
  • Truncates inputs longer than 256 tokens.
  • Trained on 10,000 of the 25,000 available training samples; using the full set would likely push accuracy higher.
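For reviews longer than the 256-token limit, one possible workaround (not something this model or its demo is stated to do) is to split the token ids into overlapping windows, classify each window, and average the probabilities. The sketch below covers only the splitting and averaging logic in plain Python. The helpers `window_token_ids` and `mean_probs`, the stride of 64, and the two reserved positions for [CLS] and [SEP] are illustrative assumptions.

```python
def window_token_ids(token_ids, max_length=256, stride=64, num_special=2):
    """Split content token ids into overlapping windows that fit the model.

    num_special reserves room for [CLS] and [SEP]; stride is the overlap
    (in tokens) between consecutive windows.
    """
    body = max_length - num_special  # content tokens per window
    if stride >= body:
        raise ValueError("stride must be smaller than the window body")
    if len(token_ids) <= body:
        return [list(token_ids)]
    step = body - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + body]))
        if start + body >= len(token_ids):
            break  # this window already reaches the end of the text
    return windows


def mean_probs(per_window_probs):
    """Average [negative, positive] probabilities across windows."""
    n = len(per_window_probs)
    return [sum(p[i] for p in per_window_probs) / n for i in range(2)]


ids = list(range(600))  # stand-in for a 600-token review
windows = window_token_ids(ids)
print([len(w) for w in windows])  # [254, 254, 220]
```

Averaging is only one aggregation choice; since the model was trained on truncated reviews, results from windows taken late in a review may be less reliable.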

Author

Trained by @harshaojha as part of a learning project on transformer fine-tuning.
