---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-classification
base_model: distilbert-base-uncased
datasets:
- dair-ai/emotion
tags:
- emotion
- text-classification
- distilbert
- sentiment
metrics:
- accuracy
- f1
widget:
- text: "i can't stop smiling, today went better than i ever hoped"
- text: "my hands are shaking, i really don't think i can walk in there"
- text: "how dare they take credit for the work i did all weekend"
model-index:
- name: distilbert-emotion
  results:
  - task:
      type: text-classification
      name: Emotion Classification
    dataset:
      type: dair-ai/emotion
      name: emotion
      config: split
      split: test
    metrics:
    - type: accuracy
      value: 0.920
      name: Accuracy
    - type: f1
      value: 0.874
      name: Macro F1
---
# distilbert-emotion

`distilbert-base-uncased` fine-tuned on the [emotion](https://huggingface.co/datasets/dair-ai/emotion)
dataset to classify a short English sentence into one of six emotions:
**sadness, joy, love, anger, fear, surprise**.

Built by [Laela Zorana](https://github.com/LaelaZorana). Code, tests, and a live demo:

- GitHub: https://github.com/LaelaZorana/distilbert-emotion
- Demo Space: https://huggingface.co/spaces/LaelaZ/distilbert-emotion
## Usage

```python
from transformers import pipeline

clf = pipeline("text-classification", model="LaelaZ/distilbert-emotion", top_k=None)
clf("i can't stop smiling, today went better than i ever hoped")
# -> [{'label': 'joy', 'score': 0.99}, ...]
```
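With `top_k=None` the pipeline returns a score for every label, ordered from most to least likely. For a single-label classifier like this one, those scores are a softmax over the model's six output logits. The sketch below reproduces that post-processing step in plain Python. The logits are made up, the helper names `softmax` and `all_scores` are hypothetical, and the id-to-label order is an assumption based on the label order of `dair-ai/emotion` (with the real model, read it from `model.config.id2label`).

```python
import math

# Assumption: ids 0..5 map to labels in the dair-ai/emotion order.
ID2LABEL = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def all_scores(logits):
    """Every label with its probability, highest first (the shape top_k=None returns)."""
    probs = softmax(logits)
    ranked = sorted(zip(ID2LABEL, probs), key=lambda pair: pair[1], reverse=True)
    return [{"label": label, "score": score} for label, score in ranked]


# Hypothetical logits for one sentence; real values come from the model's forward pass.
example = all_scores([-1.2, 4.8, 1.1, -0.9, -1.5, -0.3])
print(example[0]["label"])  # joy
```

Because the six probabilities always sum to 1, the top label is always one of the six emotions, even for text that expresses none of them.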
## Evaluation

Evaluated on the held-out `test` split (2,000 examples the model never trained on). Macro F1
is reported alongside accuracy because the classes are imbalanced (joy and sadness dominate,
surprise is rare), so accuracy alone would overstate performance on the rare classes.

<!-- METRICS:START -->

| metric | score |
|---|---|
| accuracy | 0.920 |
| macro F1 | 0.874 |
| weighted F1 | 0.920 |

Per-class F1: sadness 0.96, joy 0.94, anger 0.92, fear 0.90, love 0.81, surprise 0.72. The two
weakest classes are the two rarest (love n=159, surprise n=66), which is why macro F1 (0.874)
sits below accuracy (0.920): macro F1 weights every class equally and exposes the rare-class
weakness that accuracy hides.

<!-- METRICS:END -->
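The gap between macro and weighted F1 comes down to how the per-class scores are averaged. A minimal sketch using the per-class F1 and support values from the per-class table on this card; those values are rounded to three decimals, so the results match the reported figures only up to rounding. The helper names are hypothetical.

```python
# Per-class (F1, support) on the 2,000-example test split, from the per-class table.
PER_CLASS = {
    "sadness": (0.959, 581),
    "joy": (0.937, 695),
    "love": (0.808, 159),
    "anger": (0.923, 275),
    "fear": (0.897, 224),
    "surprise": (0.721, 66),
}


def macro_f1(stats):
    # Unweighted mean: love and surprise count as much as joy and sadness.
    return sum(f1 for f1, _ in stats.values()) / len(stats)


def weighted_f1(stats):
    # Support-weighted mean: frequent classes dominate, so rare-class errors barely move it.
    total = sum(n for _, n in stats.values())
    return sum(f1 * n for f1, n in stats.values()) / total


macro = macro_f1(PER_CLASS)
weighted = weighted_f1(PER_CLASS)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
# macro F1 = 0.874, weighted F1 = 0.920
```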
The repository also surfaces the model's **confidently wrong** predictions (the loudest
mistakes), which is where the model's real limits show.

## Error analysis

Confusion matrix and per-class breakdown on the **full held-out test set (2,000
examples)**, regenerated from the trained weights with `python -m emotion.error_report`.
<details><summary>Confusion matrix as counts (rows = true, cols = predicted)</summary>

| true ↓ / pred → | sadness | joy | love | anger | fear | surprise | recall |
|---|---|---|---|---|---|---|---|
| sadness | 558 | 10 | 2 | 4 | 7 | 0 | 0.96 |
| joy | 6 | 656 | 28 | 3 | 1 | 1 | 0.94 |
| love | 0 | 28 | 128 | 3 | 0 | 0 | 0.81 |
| anger | 13 | 4 | 0 | 246 | 12 | 0 | 0.89 |
| fear | 3 | 0 | 0 | 2 | 208 | 11 | 0.93 |
| surprise | 3 | 7 | 0 | 0 | 12 | 44 | 0.67 |

</details>
**Per-class precision / recall / F1**

| class | precision | recall | F1 | support |
|---|---|---|---|---|
| sadness | 0.957 | 0.960 | 0.959 | 581 |
| joy | 0.930 | 0.944 | 0.937 | 695 |
| love | 0.810 | 0.805 | 0.808 | 159 |
| anger | 0.953 | 0.895 | 0.923 | 275 |
| fear | 0.867 | 0.929 | 0.897 | 224 |
| surprise | 0.786 | 0.667 | 0.721 | 66 |
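Every number in the per-class table can be recomputed from the confusion matrix counts: precision divides the diagonal cell by its column sum, recall divides it by its row sum, and support is the row sum. A minimal sketch, with the counts copied from the confusion matrix on this card; `per_class_report` is a hypothetical helper, not part of the repository.

```python
LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Counts from the confusion matrix above: rows = true label, columns = predicted label.
CM = [
    [558, 10, 2, 4, 7, 0],
    [6, 656, 28, 3, 1, 1],
    [0, 28, 128, 3, 0, 0],
    [13, 4, 0, 246, 12, 0],
    [3, 0, 0, 2, 208, 11],
    [3, 7, 0, 0, 12, 44],
]


def per_class_report(cm, labels):
    k = len(cm)
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        support = sum(cm[i])                         # row sum: examples truly in class i
        predicted = sum(cm[r][i] for r in range(k))  # column sum: examples predicted as i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support)
    return report


report = per_class_report(CM, LABELS)
# Accuracy is the diagonal (correct predictions) over all examples.
accuracy = sum(CM[i][i] for i in range(len(CM))) / sum(sum(row) for row in CM)

for label, (p, r, f1, n) in report.items():
    print(f"{label:<9} P={p:.3f} R={r:.3f} F1={f1:.3f} n={n}")
print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.920
```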
**Where it fails.** The single largest error axis is **joy ↔ love** (28 + 28 mutual
misclassifications). Both are short, affect-positive messages, and in relative terms the model
leans toward the higher-frequency neighbour: the 28 love → joy errors are about 18% of all love
examples, while the 28 joy → love errors are about 4% of joy. The rarest class, `surprise`
(n=66), leaks mainly into `fear` (12) and `joy` (7). The mistakes are semantically adjacent
rather than random, and error rates are highest on the two low-support classes (love
31/159 ≈ 19%, surprise 22/66 ≈ 33%), while no other class exceeds about 11% (anger, 29/275).
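The pair ranking and per-class error rates used in this analysis can be read off the count matrix mechanically. A minimal sketch, with counts copied from the confusion matrix on this card; `mutual_confusions` and `error_rates` are hypothetical helpers. Summing both off-diagonal cells for each unordered pair of classes ranks the confusion axes, and one minus recall gives each class's error rate.

```python
from itertools import combinations

LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Rows = true label, columns = predicted label.
CM = [
    [558, 10, 2, 4, 7, 0],
    [6, 656, 28, 3, 1, 1],
    [0, 28, 128, 3, 0, 0],
    [13, 4, 0, 246, 12, 0],
    [3, 0, 0, 2, 208, 11],
    [3, 7, 0, 0, 12, 44],
]


def mutual_confusions(cm, labels):
    """Sum cm[i][j] + cm[j][i] for every unordered class pair, largest first."""
    pairs = [(cm[i][j] + cm[j][i], labels[i], labels[j])
             for i, j in combinations(range(len(labels)), 2)]
    return sorted(pairs, reverse=True)


def error_rates(cm, labels):
    """Fraction of each class's true examples that were misclassified (1 - recall)."""
    return {label: 1 - cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


for count, a, b in mutual_confusions(CM, LABELS)[:3]:
    print(f"{a} <-> {b}: {count}")
# joy <-> love: 56
# fear <-> surprise: 23
# sadness <-> anger: 17

rates = error_rates(CM, LABELS)
for label, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<9} {rate:.1%}")
```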
**Confidently wrong (highest-confidence mistakes):** the cases the model got wrong *and* was
sure about. This is the slice most worth reading:

| true | predicted | conf | text |
|---|---|---|---|
| joy | sadness | 0.99 | i feel very saddened that the king whom i once quite respected as far as monarchs go was i… |
| love | joy | 0.99 | i feel affirmed gracious sensuous and will have less self doubt when a href http generatio… |
| sadness | joy | 0.99 | i first started reading city of dark magic i thought it would be a challenge to actually e… |
| anger | sadness | 0.98 | i actually was in a meeting last week where someone yelled at an older lady because her ph… |
| sadness | joy | 0.98 | i felt a stronger wish to be free from self cherishing through my refuge practice and a re… |
| anger | sadness | 0.98 | i really dont like quinn because i feel like she will just end up hurting barney and i hat… |
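A table like this comes from a filter followed by a sort: keep only the predictions whose label differs from the ground truth, then order them by the model's confidence in its (wrong) answer. A minimal sketch, assuming each prediction is stored with its true label, predicted label, and the softmax probability of the predicted label. The `Prediction` type, the `confidently_wrong` helper, and the toy records are hypothetical and are not taken from the repository's `emotion.error_report` module.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str
    true_label: str
    pred_label: str
    confidence: float  # softmax probability assigned to pred_label


def confidently_wrong(preds, k=6):
    """Mistakes only, most confident first, truncated to the top k."""
    mistakes = [p for p in preds if p.pred_label != p.true_label]
    return sorted(mistakes, key=lambda p: p.confidence, reverse=True)[:k]


# Hypothetical toy records; in practice these come from running the model on the test split.
preds = [
    Prediction("toy example a", "joy", "joy", 0.97),
    Prediction("toy example b", "love", "joy", 0.91),
    Prediction("toy example c", "anger", "sadness", 0.98),
    Prediction("toy example d", "surprise", "fear", 0.55),
]

for p in confidently_wrong(preds, k=2):
    print(f"{p.true_label} -> {p.pred_label} ({p.confidence:.2f}): {p.text}")
# anger -> sadness (0.98): toy example c
# love -> joy (0.91): toy example b
```

Sorting on the predicted label's probability, rather than on loss, keeps the ranking readable directly as "how sure the model was".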
## Training

- Base model: `distilbert-base-uncased`
- Dataset: `dair-ai/emotion` (split config), 5,000-example training subset
- Objective: cross-entropy over 6 classes
- Optimizer: AdamW, lr 2e-5, linear warmup (10%), gradient clipping at 1.0
- Max sequence length: 64, batch size 16, 3 epochs, CPU
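The hyperparameters above pin down the learning-rate schedule's step counts. A minimal sketch, assuming one optimizer step per batch, that the final partial batch is kept, and that the learning rate decays linearly to zero after warmup. The card only states "linear warmup (10%)"; the decay is an assumption modelled on the formula used by `transformers.get_linear_schedule_with_warmup`. The constant names and `lr_at` are hypothetical.

```python
import math

# Values from the Training list above.
TRAIN_EXAMPLES = 5_000
BATCH_SIZE = 16
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_FRACTION = 0.10

# Assumption: the last partial batch is kept, one optimizer step per batch.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # 313
total_steps = steps_per_epoch * EPOCHS                    # 939
warmup_steps = int(WARMUP_FRACTION * total_steps)         # 93


def lr_at(step):
    """Linear warmup from 0 to PEAK_LR, then (assumed) linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, remaining)


for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Gradient clipping is separate from the schedule; in a PyTorch loop it would usually be a call to `torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)` between `loss.backward()` and `optimizer.step()`.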
## Limitations

The emotion dataset consists of short, informal English text (tweet-style). The model can be
confidently wrong on sarcasm, mixed feelings, or text unlike the training distribution. It
predicts exactly one of six emotions and has no "neutral" or "other" class.
## License

MIT.