metadata
language:
  - en
license: apache-2.0
tags:
  - text-classification
  - sentiment-analysis
  - distilbert
  - transformers
pipeline_tag: text-classification
library_name: transformers
datasets:
  - Amazon_Unlocked_Mobile
base_model: distilbert-base-uncased
metrics:
  - accuracy
  - f1
  - recall
  - precision
widget:
  - text: Great handset! Works flawlessly.
  - text: Terrible product, waste of money.

DistilBERT for Binary Sentiment Classification

Lightweight sentiment classifier fine-tuned from distilbert-base-uncased to predict sentiment (negative vs. positive) for short English product reviews. Trained on a filtered subset of the Amazon Unlocked Mobile dataset.

Model Details

  • Base model: distilbert-base-uncased
  • Task: binary sentiment classification
  • Labels: 0 -> negative (rating 1), 1 -> positive (rating 5)
  • Max input length: 128 tokens
  • Tokenizer: AutoTokenizer for the same checkpoint
  • Mixed precision: fp16 (when CUDA available)
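The label mapping and the 128-token limit above can be sketched as a manual inference path without the pipeline helper. This is a minimal sketch, not the author's code: the helper names `logits_to_prediction` and `classify` are hypothetical, and it assumes the checkpoint's output index 0/1 matches the negative/positive mapping listed above.

```python
import math

# Label mapping as stated in Model Details.
ID2LABEL = {0: "negative", 1: "positive"}


def logits_to_prediction(logits):
    """Softmax over the two class logits, then pick the most likely label."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[idx], "score": probs[idx]}


def classify(text, model_id="Floressek/sentiment_classification_from_distillbert"):
    """Tokenize with the 128-token limit, run the model, and decode the logits.

    Imports are deferred so the pure helper above can be used without torch.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_prediction(logits)
```

Calling `classify("Great handset!")` downloads the checkpoint on first use. Inputs longer than 128 tokens are truncated, so sentiment expressed only at the end of a long review may be lost.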

Intended Use and Limitations

  • Use for short English product-review style texts.
  • Binary only (negative/positive). Not suited for nuanced or multi-class sentiment.
  • Not for safety-critical decisions or content moderation on its own.

Dataset and Preprocessing

  • Source: Amazon Unlocked Mobile (Amazon_Unlocked_Mobile.csv)
  • Filtering: keep rows where Rating ∈ {1, 5}; drop unrelated columns
  • Tokenization: padding to max length, truncation at 128
  • Split: train/test with test_size = 0.3, seed = 100
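The filtering and split steps above can be illustrated with a dependency-free sketch. It is an assumption-laden illustration rather than the original preprocessing script: the text column name `Reviews` and both helper names are assumed, and the standard-library shuffle will not reproduce the exact partition produced by the library the author used.

```python
import random

# Rating 1 -> label 0 (negative), rating 5 -> label 1 (positive).
LABEL_FOR_RATING = {1: 0, 5: 1}


def filter_and_label(rows, text_key="Reviews", rating_key="Rating"):
    """Keep only 1- and 5-star rows and map each to a binary label.

    `text_key` is an assumed column name; adjust it to the CSV header.
    """
    examples = []
    for row in rows:
        rating = int(row[rating_key])
        if rating in LABEL_FOR_RATING:
            examples.append({"text": row[text_key], "label": LABEL_FOR_RATING[rating]})
    return examples


def train_test_split(examples, test_size=0.3, seed=100):
    """Shuffle deterministically, then hold out a `test_size` fraction."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]
```

With the Hugging Face `datasets` library, the equivalent split is `Dataset.train_test_split(test_size=0.3, seed=100)`, and tokenization with padding to the maximum length corresponds to calling the tokenizer with `padding="max_length", truncation=True, max_length=128`.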

Training Configuration

  • Optimizer and schedule: handled by transformers.Trainer
  • Learning rate: 2e-5
  • Batch size: 48 (train/eval per device)
  • Epochs: 2
  • Weight decay: 0.01
  • Save/eval strategy: epoch
  • Push to Hub: enabled
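The settings above map onto `transformers.TrainingArguments` fields roughly as follows. This is a sketch under stated assumptions: the output directory name and the `build_training_args` helper are hypothetical, and the evaluation-strategy keyword is spelled `eval_strategy` in recent transformers releases but `evaluation_strategy` in older ones.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 48,
    "per_device_eval_batch_size": 48,
    "num_train_epochs": 2,
    "weight_decay": 0.01,
    "eval_strategy": "epoch",  # `evaluation_strategy` in older transformers releases
    "save_strategy": "epoch",
    "push_to_hub": True,
}


def build_training_args(output_dir="distilbert-sentiment"):
    """Build TrainingArguments from the settings above.

    `output_dir` is a hypothetical name. Imports are deferred so the plain
    dict can be inspected without torch or transformers installed.
    """
    import torch
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        fp16=torch.cuda.is_available(),  # mixed precision only when CUDA is present
        **TRAINING_KWARGS,
    )
```

No optimizer or scheduler is configured explicitly, so `Trainer` falls back to its defaults. Pushing to the Hub requires being logged in with a token that has write access.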

Evaluation

Accuracy and F1 are computed on the held-out test split (30% of the filtered data). See the repository "Files and versions" / "Training metrics" tabs for run artifacts and exact scores.
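For reference, the metrics can be written out by hand for the binary case. This is a dependency-free sketch with a hypothetical helper name, `binary_metrics`; the actual run may well have used a metrics library instead, and the example labels below are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 of 5 predictions are correct.
scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Because only ratings 1 and 5 are kept, the two classes may be imbalanced, which is one reason to read F1 alongside accuracy.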

How to Use

Python (Transformers pipeline):

from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None  # return scores for all labels; use top_k=1 for only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))

---