DistilBERT for Binary Sentiment Classification
Lightweight sentiment classifier fine-tuned from distilbert-base-uncased to predict sentiment (negative vs. positive) for short English product reviews. Trained on a filtered subset of the Amazon Unlocked Mobile dataset.
Model Details
- Base model: `distilbert-base-uncased`
- Task: binary sentiment classification
- Labels: `0` -> negative (rating 1), `1` -> positive (rating 5)
- Max input length: 128 tokens
- Tokenizer: `AutoTokenizer` for the same checkpoint
- Mixed precision: fp16 (when CUDA is available)
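The tokenization settings above can be reproduced like this. A minimal sketch; the example sentence is illustrative, only the checkpoint name and the 128-token limit come from this card.

```python
from transformers import AutoTokenizer

# Load the tokenizer for the base checkpoint
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize as described in this card: pad to max length, truncate at 128 tokens
enc = tokenizer(
    "Great phone, battery lasts all day.",
    padding="max_length",
    truncation=True,
    max_length=128,
)
print(len(enc["input_ids"]))  # always 128 with these settings
```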
Intended Use and Limitations
- Use for short English product-review style texts.
- Binary only (negative/positive). Not suited for nuanced or multi-class sentiment.
- Not for safety-critical decisions or content moderation on its own.
Dataset and Preprocessing
- Source: Amazon Unlocked Mobile (`Amazon_Unlocked_Mobile.csv`)
- Filtering: keep rows where `Rating ∈ {1, 5}`; drop unrelated columns
- Tokenization: padding to max length, truncation at 128 tokens
- Split: train/test with `test_size = 0.3`, `seed = 100`
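The filtering, label mapping, and split above can be sketched as follows. The column names (`Reviews`, `Rating`) follow the Amazon_Unlocked_Mobile.csv schema; the toy rows are hypothetical stand-ins for the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for Amazon_Unlocked_Mobile.csv (real file has more columns)
df = pd.DataFrame({
    "Reviews": ["Terrible battery.", "Love this phone!", "It's okay.", "Best purchase ever."],
    "Rating": [1, 5, 3, 5],
})

# Keep only 1-star and 5-star reviews, then map ratings to binary labels
df = df[df["Rating"].isin([1, 5])].copy()
df["label"] = (df["Rating"] == 5).astype(int)  # 0 = negative, 1 = positive

# 70/30 split with the seed listed in this card
train_df, test_df = train_test_split(df, test_size=0.3, random_state=100)
print(len(train_df), len(test_df))
```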
Training Configuration
- Optimizer and schedule: handled by `transformers.Trainer`
- Learning rate: `2e-5`
- Batch size: `48` (train/eval, per device)
- Epochs: `2`
- Weight decay: `0.01`
- Save/eval strategy: `epoch`
- Push to Hub: enabled
Evaluation
Evaluated with accuracy and F1 on the held-out test split. See the repository's "Files and versions" and "Training metrics" tabs for run artifacts and exact scores.
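A `compute_metrics` function in the shape `transformers.Trainer` expects, computing the two metrics named above. A sketch using scikit-learn; the sample logits and labels are made up for the sanity check.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels); predictions are the argmax over classes
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1, positive class = 1
    }

# Tiny sanity check with fabricated logits
logits = np.array([[2.0, -1.0], [0.5, 1.5], [3.0, 0.0], [-1.0, 2.0]])
labels = np.array([0, 1, 1, 1])
print(compute_metrics((logits, labels)))  # accuracy 0.75, f1 0.8
```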
How to Use
Python (Transformers pipeline):
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None,  # return scores for both labels; omit to get only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
```