DistilBERT for Binary Sentiment Classification
Lightweight sentiment classifier fine-tuned from distilbert-base-uncased to predict sentiment (negative vs. positive) for short English product reviews. Trained on a filtered subset of the Amazon Unlocked Mobile dataset.
Model Details
- Base model: `distilbert-base-uncased`
- Task: binary sentiment classification
- Labels: `0` -> negative (rating 1), `1` -> positive (rating 5)
- Max input length: 128 tokens
- Tokenizer: `AutoTokenizer` for the same checkpoint
- Mixed precision: fp16 (when CUDA is available)
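The tokenization settings above can be reproduced like this. A minimal sketch; the example sentence is illustrative, only the checkpoint name and the 128-token limit come from this card.

```python
from transformers import AutoTokenizer

# Load the tokenizer for the base checkpoint
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Tokenize as described in this card: pad to max length, truncate at 128 tokens
enc = tokenizer(
    "Great phone, battery lasts all day.",
    padding="max_length",
    truncation=True,
    max_length=128,
)
print(len(enc["input_ids"]))  # always 128 with these settings
```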
Intended Use and Limitations
- Use for short English product-review style texts.
- Binary only (negative/positive). Not suited for nuanced or multi-class sentiment.
- Not for safety-critical decisions or content moderation on its own.
Dataset and Preprocessing
- Source: Amazon Unlocked Mobile (`Amazon_Unlocked_Mobile.csv`)
- Filtering: keep rows where `Rating ∈ {1, 5}`; drop unrelated columns
- Tokenization: padding to max length, truncation at 128 tokens
- Split: train/test with `test_size = 0.3`, `seed = 100`
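The filtering, label mapping, and split above can be sketched as follows. The column names (`Reviews`, `Rating`) follow the Amazon_Unlocked_Mobile.csv schema; the toy rows are hypothetical stand-ins for the real data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for Amazon_Unlocked_Mobile.csv (real file has more columns)
df = pd.DataFrame({
    "Reviews": ["Terrible battery.", "Love this phone!", "It's okay.", "Best purchase ever."],
    "Rating": [1, 5, 3, 5],
})

# Keep only 1-star and 5-star reviews, then map ratings to binary labels
df = df[df["Rating"].isin([1, 5])].copy()
df["label"] = (df["Rating"] == 5).astype(int)  # 0 = negative, 1 = positive

# 70/30 split with the seed listed in this card
train_df, test_df = train_test_split(df, test_size=0.3, random_state=100)
print(len(train_df), len(test_df))
```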
Training Configuration
- Optimizer and schedule: handled by `transformers.Trainer`
- Learning rate: `2e-5`
- Batch size: `48` (train/eval, per device)
- Epochs: `2`
- Weight decay: `0.01`
- Save/eval strategy: `epoch`
- Push to Hub: enabled
Evaluation
Evaluated with accuracy and F1 on the held-out test split. See the repository's "Files and versions" and "Training metrics" tabs for run artifacts and exact scores.
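A `compute_metrics` function in the shape `transformers.Trainer` expects, computing the two metrics named above. A sketch using scikit-learn; the sample logits and labels are made up for the sanity check.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels); predictions are the argmax over classes
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1, positive class = 1
    }

# Tiny sanity check with fabricated logits
logits = np.array([[2.0, -1.0], [0.5, 1.5], [3.0, 0.0], [-1.0, 2.0]])
labels = np.array([0, 1, 1, 1])
print(compute_metrics((logits, labels)))  # accuracy 0.75, f1 0.8
```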
How to Use
Python (Transformers pipeline):
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None,  # return scores for both labels; omit to get only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
```