---
language:
- en
license: apache-2.0
tags:
- text-classification
- sentiment-analysis
- distilbert
- transformers
pipeline_tag: text-classification
library_name: transformers
datasets:
- Amazon_Unlocked_Mobile
base_model: distilbert-base-uncased
metrics:
- accuracy
- f1
- recall
- precision
widget:
- text: "Great handset! Works flawlessly."
- text: "Terrible product, waste of money."
---

# DistilBERT for Binary Sentiment Classification

A lightweight sentiment classifier fine-tuned from `distilbert-base-uncased` to predict binary sentiment (negative vs. positive) for short English product reviews. Trained on a filtered subset of the Amazon Unlocked Mobile dataset.

## Model Details
- Base model: `distilbert-base-uncased`
- Task: binary sentiment classification
- Labels: `0 -> negative` (rating 1), `1 -> positive` (rating 5)
- Max input length: 128 tokens
- Tokenizer: `AutoTokenizer` for the same checkpoint
- Mixed precision: fp16 (when CUDA is available)

## Intended Use and Limitations
- Use for short English product-review style texts.
- Binary only (negative/positive). Not suited for nuanced or multi-class sentiment.
- Not for safety-critical decisions or content moderation on its own.

## Dataset and Preprocessing
- Source: Amazon Unlocked Mobile (`Amazon_Unlocked_Mobile.csv`)
- Filtering: keep rows where `Rating ∈ {1, 5}`; drop unrelated columns
- Tokenization: padding to max length, truncation at 128 tokens
- Split: train/test with `test_size = 0.3`, `seed = 100`
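
The filtering and split steps above can be sketched as follows (a minimal illustration with a tiny in-memory stand-in for the CSV; the `Reviews` column name is an assumption not stated in this card):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for Amazon_Unlocked_Mobile.csv; "Reviews" is a hypothetical column name
df = pd.DataFrame({
    "Reviews": ["Great handset!", "Broke in a week", "It is okay", "Love it", "Waste of money"],
    "Rating":  [5, 1, 3, 5, 1],
})

# Keep only 1- and 5-star reviews, then map ratings to binary labels
df = df[df["Rating"].isin([1, 5])].copy()
df["label"] = (df["Rating"] == 5).astype(int)  # 0 -> negative, 1 -> positive

# 70/30 split with a fixed seed, matching test_size = 0.3, seed = 100
train_df, test_df = train_test_split(df, test_size=0.3, random_state=100)
```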

## Training Configuration
- Optimizer and schedule: handled by `transformers.Trainer`
- Learning rate: `2e-5`
- Batch size: `48` (train/eval per device)
- Epochs: `2`
- Weight decay: `0.01`
- Save/eval strategy: `epoch`
- Push to Hub: enabled
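
The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows (a sketch, not the exact run configuration; `output_dir` is a placeholder, and older Transformers releases spell the evaluation-strategy argument `evaluation_strategy`):

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above; output_dir is a hypothetical name
training_args = TrainingArguments(
    output_dir="sentiment_classification_from_distillbert",
    learning_rate=2e-5,
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",   # "evaluation_strategy" on older Transformers versions
    save_strategy="epoch",
    fp16=True,               # only when CUDA is available
    push_to_hub=True,
)
```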

## Evaluation
Computed with `accuracy` and `f1` on the held-out test split. See the repository "Files and versions" and "Training metrics" tabs for run artifacts and exact scores.
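
A metrics function of the usual `Trainer` `compute_metrics` shape could look like this (a sketch using scikit-learn; the exact function used for this run is not shown in the card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair as passed by Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
    }

# Small smoke check with fake logits
print(compute_metrics((np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]),
                       np.array([1, 0, 0]))))
```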

## How to Use

Python (Transformers pipeline):
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None,  # return scores for all labels; omit to get only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
```