language:
  - en
license: apache-2.0
tags:
  - text-classification
  - sentiment-analysis
  - distilbert
  - transformers
pipeline_tag: text-classification
library_name: transformers
datasets:
  - Amazon_Unlocked_Mobile
base_model: distilbert-base-uncased
metrics:
  - accuracy
  - f1
  - recall
  - precision
widget:
  - text: Great handset! Works flawlessly.
  - text: Terrible product, waste of money.

DistilBERT for Binary Sentiment Classification

A lightweight classifier fine-tuned from distilbert-base-uncased to predict binary sentiment (negative vs. positive) for short English product reviews. It was trained on a filtered subset of the Amazon Unlocked Mobile dataset that keeps only 1-star and 5-star reviews.

Model Details

Base model: distilbert-base-uncased
Task: binary sentiment classification
Labels: 0 -> negative (rating 1), 1 -> positive (rating 5)
Max input length: 128 tokens
Tokenizer: AutoTokenizer for the same checkpoint
Mixed precision: fp16 (when CUDA is available)
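
The tokenizer settings listed above can be sketched as follows. This is a minimal illustration, not the original training script: the helper names `load_tokenizer` and `encode` are hypothetical, and the `transformers` import is deferred so the constants can be reused without the library installed.

```python
MAX_LENGTH = 128  # max input length from this card
ID2LABEL = {0: "negative", 1: "positive"}  # label mapping from this card


def load_tokenizer(checkpoint="distilbert-base-uncased"):
    """Load the tokenizer for the base checkpoint (downloads on first use)."""
    from transformers import AutoTokenizer  # imported lazily on purpose

    return AutoTokenizer.from_pretrained(checkpoint)


def encode(tokenizer, texts):
    """Pad every text to MAX_LENGTH tokens and truncate anything longer."""
    return tokenizer(
        texts,
        padding="max_length",
        truncation=True,
        max_length=MAX_LENGTH,
    )
```

Calling `encode(load_tokenizer(), ["Great handset!"])` should yield `input_ids` and `attention_mask` entries of length 128 for each text.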

Intended Use and Limitations

Intended for short English texts written in the style of product reviews.
Binary only (negative/positive). Not suited to nuanced or multi-class sentiment, and it has no neutral class.
Not for use on its own in safety-critical decisions or content moderation.

Dataset and Preprocessing

Source: Amazon Unlocked Mobile (Amazon_Unlocked_Mobile.csv)
Filtering: keep rows where Rating ∈ {1, 5}; drop unrelated columns
Tokenization: padding to max length, truncation at 128
Split: train/test with test_size = 0.3, seed = 100
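
The filtering and split described above might look roughly like this. It is a standard-library sketch under stated assumptions: the review text column is assumed to be named `Reviews` (only `Rating` is named in this card), the helper names are hypothetical, and the shuffle here will not reproduce the exact rows of the original split, which was presumably made with a library function.

```python
import random

# Rating 1 -> label 0 (negative), rating 5 -> label 1 (positive).
LABEL_FOR_RATING = {1: 0, 5: 1}


def filter_and_label(rows, text_col="Reviews", rating_col="Rating"):
    """Keep only 1- and 5-star rows with non-empty text; attach a binary label."""
    examples = []
    for row in rows:
        try:
            rating = float(row[rating_col])  # CSV values arrive as strings
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or malformed rating
        text = row.get(text_col)
        # 1.0 and 5.0 hash equal to the int keys 1 and 5, so the lookup works.
        if rating in LABEL_FOR_RATING and text:
            examples.append({"text": text, "label": LABEL_FOR_RATING[rating]})
    return examples


def split_examples(examples, test_size=0.3, seed=100):
    """Shuffle deterministically, then hold out `test_size` of the data."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

Rows can be read with `csv.DictReader` over `Amazon_Unlocked_Mobile.csv` and passed straight to `filter_and_label`.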

Training Configuration

Optimizer and schedule: handled by transformers.Trainer
Learning rate: 2e-5
Batch size: 48 (train/eval per device)
Epochs: 2
Weight decay: 0.01
Save/eval strategy: epoch
Push to Hub: enabled
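
For reference, the settings above map onto `transformers.TrainingArguments` roughly as shown here. This is a sketch, not the original script: `output_dir` is a hypothetical placeholder, and the `eval_strategy` keyword is spelled `evaluation_strategy` in older `transformers` releases.

```python
# Hyperparameters as listed in this card.
TRAINING_KWARGS = {
    "output_dir": "distilbert-sentiment",  # hypothetical placeholder
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 48,
    "per_device_eval_batch_size": 48,
    "num_train_epochs": 2,
    "weight_decay": 0.01,
    "save_strategy": "epoch",
    "eval_strategy": "epoch",  # "evaluation_strategy" in older releases
    "push_to_hub": True,
}


def build_training_args():
    """Build TrainingArguments, enabling fp16 only when CUDA is available."""
    import torch  # imported lazily so the dict above has no heavy dependencies
    from transformers import TrainingArguments

    return TrainingArguments(fp16=torch.cuda.is_available(), **TRAINING_KWARGS)
```

Leaving the optimizer and scheduler unspecified means `Trainer` falls back to its defaults.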

Evaluation

Accuracy and F1 are computed on the held-out test split (30% of the filtered data). See the repository "Files and versions" / "Training metrics" tabs for run artifacts and exact scores.
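
To make the two reported metrics concrete, here is a dependency-free sketch of how they can be derived from model logits. It assumes F1 is taken over the positive class (label 1), which this card does not state; the original run may have used a metrics library instead, and the function names are hypothetical.

```python
def argmax_predictions(logits):
    """Pick the index of the largest logit in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def compute_metrics(logits, labels):
    """Return accuracy and positive-class F1 for binary predictions."""
    preds = argmax_predictions(logits)
    pairs = list(zip(preds, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)  # true positives
    fp = sum(p == 1 and y == 0 for p, y in pairs)  # false positives
    fn = sum(p == 0 and y == 1 for p, y in pairs)  # false negatives
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}
```

A function of this shape can be adapted for the `compute_metrics` argument of `Trainer`, which passes predictions and labels together as one object.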

How to Use

Python (Transformers pipeline):

from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None  # returns scores for all labels; use top_k=1 for only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
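
With `top_k=None`, each call returns one score per label. The helper below is a small sketch for turning that output into a readable result. It assumes the checkpoint exposes the generic names `LABEL_0` and `LABEL_1` (the default when no label names are configured) and that a single input string yields a flat list of `{"label", "score"}` dicts; both may differ for this repository or for your `transformers` version, and `top_sentiment` is a hypothetical name.

```python
# Assumed mapping from generic label names to the classes in this card.
READABLE = {"LABEL_0": "negative", "LABEL_1": "positive"}


def top_sentiment(scores):
    """Return (readable_label, score) for the highest-scoring entry."""
    best = max(scores, key=lambda item: item["score"])
    # Fall back to the raw label if the checkpoint already uses other names.
    return READABLE.get(best["label"], best["label"]), best["score"]
```

For example, `top_sentiment(clf("Great handset!"))` would give a `(label, score)` tuple for the winning class.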

---