---
language:
- en
license: apache-2.0
tags:
- text-classification
- sentiment-analysis
- distilbert
- transformers
pipeline_tag: text-classification
library_name: transformers
datasets:
- Amazon_Unlocked_Mobile
base_model: distilbert-base-uncased
metrics:
- accuracy
- f1
- recall
- precision
widget:
- text: "Great handset! Works flawlessly."
- text: "Terrible product, waste of money."
---
# DistilBERT for Binary Sentiment Classification
Lightweight sentiment classifier fine-tuned from `distilbert-base-uncased` to predict sentiment (negative vs. positive) for short English product reviews. Trained on a filtered subset of the Amazon Unlocked Mobile dataset.
## Model Details
- Base model: `distilbert-base-uncased`
- Task: binary sentiment classification
- Labels: `0 -> negative` (rating 1), `1 -> positive` (rating 5)
- Max input length: 128 tokens
- Tokenizer: `AutoTokenizer` for the same checkpoint
- Mixed precision: fp16 (when CUDA is available)
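
As a rough illustration of how the two labels relate to the classifier's raw outputs, the sketch below turns a pair of logits into a label and a confidence score with a softmax. `ID2LABEL` and `decode_logits` are hypothetical names introduced here, and the label strings exposed by the hosted checkpoint's config may differ (for example `LABEL_0`/`LABEL_1`).

```python
import math

# Assumed mapping, following the label definition above (0 = negative, 1 = positive).
ID2LABEL = {0: "negative", 1: "positive"}


def decode_logits(logits):
    """Convert raw class logits into a (label, probability) pair via softmax."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, score = decode_logits([-1.0, 2.0])
print(label, round(score, 4))
```

The example logits are made up; real values come from the model's classification head.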
## Intended Use and Limitations
- Use for short English product-review style texts.
- Binary only (negative/positive). Not suited for nuanced or multi-class sentiment.
- Not for safety-critical decisions or content moderation on its own.
## Dataset and Preprocessing
- Source: Amazon Unlocked Mobile (`Amazon_Unlocked_Mobile.csv`)
- Filtering: keep rows where `Rating ∈ {1, 5}`; drop unrelated columns
- Tokenization: padding to max length, truncation at 128
- Split: train/test with `test_size = 0.3`, `seed = 100`
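
The filtering and split steps above can be sketched with the standard library alone. This is a minimal illustration, not the original training script: the `Reviews` column name, the skipping of empty reviews, and both helper functions are assumptions, and this shuffle will not reproduce the exact partition used for the published checkpoint.

```python
import csv
import io
import random

# Assumption: review text lives in a "Reviews" column; "Rating" is named in this card.
TEXT_COL, RATING_COL = "Reviews", "Rating"
RATING_TO_LABEL = {1: 0, 5: 1}  # 1 star -> negative (0), 5 stars -> positive (1)


def load_binary_examples(csv_file):
    """Keep only 1- and 5-star rows and return (text, label) pairs."""
    examples = []
    for row in csv.DictReader(csv_file):
        try:
            rating = float(row[RATING_COL])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or malformed rating
        text = (row.get(TEXT_COL) or "").strip()
        # Assumption: rows with empty review text are skipped as well.
        if rating in RATING_TO_LABEL and text:
            examples.append((text, RATING_TO_LABEL[rating]))
    return examples


def train_test_split(examples, test_size=0.3, seed=100):
    """Shuffle deterministically and hold out a `test_size` fraction."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Tiny in-memory stand-in for Amazon_Unlocked_Mobile.csv.
sample = io.StringIO(
    "Product Name,Rating,Reviews\n"
    "Phone A,5,Great handset! Works flawlessly.\n"
    "Phone B,3,It is okay.\n"
    'Phone C,1,"Terrible product, waste of money."\n'
)
examples = load_binary_examples(sample)
train, test = train_test_split(examples)
print(len(examples), len(train), len(test))
```

For the real file, an `open("Amazon_Unlocked_Mobile.csv", newline="", encoding="utf-8")` handle could be passed in place of the `StringIO` object.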
## Training Configuration
- Optimizer and schedule: handled by `transformers.Trainer`
- Learning rate: `2e-5`
- Batch size: `48` (train/eval per device)
- Epochs: `2`
- Weight decay: `0.01`
- Save/eval strategy: `epoch`
- Push to Hub: enabled
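
For reference, the settings above map onto `transformers.TrainingArguments` keyword arguments roughly as follows. This is a sketch under assumptions: `build_training_kwargs` and the `output_dir` value are hypothetical, and the original script may have set further options not listed in this card.

```python
def build_training_kwargs(cuda_available):
    """Collect the hyperparameters listed above as keyword arguments."""
    return {
        "output_dir": "sentiment-distilbert-run",  # hypothetical output path
        "learning_rate": 2e-5,
        "per_device_train_batch_size": 48,
        "per_device_eval_batch_size": 48,
        "num_train_epochs": 2,
        "weight_decay": 0.01,
        "save_strategy": "epoch",
        "eval_strategy": "epoch",  # named `evaluation_strategy` in older releases
        "fp16": cuda_available,  # mixed precision only when a CUDA device is present
        "push_to_hub": True,
    }


kwargs = build_training_kwargs(cuda_available=False)
# With transformers installed, these would be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**kwargs)
print(kwargs)
```

Keeping the values in a plain dictionary makes them easy to log or diff between runs before handing them to the `Trainer`.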
## Evaluation
Accuracy and F1 are computed on the held-out test split. See the repository's "Files and versions" and "Training metrics" tabs for run artifacts and exact scores.
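
To make the reported metrics concrete, here is a dependency-free sketch of how accuracy, precision, recall, and F1 are derived from binary predictions. `binary_metrics` is a hypothetical helper, not the evaluation code used for this model, and the toy labels below are not the model's results.

```python
def binary_metrics(labels, preds, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 2 true positives, 1 false positive, 1 false negative, 1 true negative.
metrics = binary_metrics(labels=[1, 0, 1, 1, 0], preds=[1, 0, 0, 1, 1])
print(metrics)
```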
## How to Use
Python (Transformers pipeline):
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None,  # return scores for every label; omit to get only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
```