---
language:
- en
license: apache-2.0
tags:
- distilbert
- emotion
- text-classification
- emobooks
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---
# EmoBooks: Emotion Classifier
A `distilbert-base-uncased` model fine-tuned to classify English (and
Singlish-normalized) user utterances into 8 emotion labels for the
[emoBooks](https://huggingface.co/DiyRex/emobooks-llama3-lora) Sinhala
novel recommender.
## Labels
`sadness`, `joy`, `love`, `anger`, `fear`, `surprise`, `disgust`, `calm`
The runtime additionally maps these to `lonely` and `anxious` via simple
keyword rules (see `emobooks/classifier.py::LABEL_ALIAS`).
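As a rough illustration of how such keyword rules could refine a model label, here is a minimal sketch. The keyword lists, the `ALIAS_KEYWORDS` table, and `refine_label` are hypothetical and not taken from `emobooks/classifier.py`. The `fear` to `anxious` pairing is also an assumption; only `sadness` to `lonely` is shown elsewhere in this card.

```python
# Hypothetical keyword rules; the real LABEL_ALIAS table may look different.
# alias -> (model label it refines, keywords that trigger the alias)
ALIAS_KEYWORDS = {
    "lonely": ("sadness", ("lonely", "alone", "isolated")),
    "anxious": ("fear", ("anxious", "worried", "nervous")),  # assumed pairing
}


def refine_label(text: str, label: str) -> str:
    """Return a finer-grained alias if a keyword matches, else the model label."""
    words = set(text.lower().split())
    for alias, (base_label, keywords) in ALIAS_KEYWORDS.items():
        if label == base_label and words & set(keywords):
            return alias
    return label


print(refine_label("i feel really lonely tonight", "sadness"))
print(refine_label("i miss my old school", "sadness"))
```

Under these assumed rules, the first call yields the alias and the second falls back to the model's own label because no keyword matches.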
## Training
| Parameter | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Dataset | 42 k / 2.5 k / 2.5 k (train/val/test); `dataset/training.csv` etc. in the [emobooks repo](https://github.com/) |
| Epochs | 4 |
| Batch size | 32 |
| Max seq len | 160 |
| Learning rate | 2.0e-5 (cosine, 6% warmup) |
| Weight decay | 0.01 |
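To make the schedule row concrete, the sketch below works out the step counts and the learning rate at a given step. It assumes that "42 k" means exactly 42,000 training rows, that training runs on a single device without gradient accumulation, that the last partial batch is kept, and that the schedule is the common linear warmup followed by a half-cosine decay to zero. None of these details are confirmed by the table.

```python
import math

TRAIN_EXAMPLES = 42_000  # assumption: "42 k" is exactly 42,000 rows
BATCH_SIZE = 32
EPOCHS = 4
BASE_LR = 2.0e-5
WARMUP_RATIO = 0.06

# Optimizer steps, assuming one device and no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # rounding up is an assumption


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(warmup_steps), lr_at(total_steps // 2), lr_at(total_steps))
```

Under these assumptions there are 1313 steps per epoch and 5252 steps in total, of which the first 316 are warmup. The rate peaks at 2.0e-5 when warmup ends and decays to zero at the final step.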
## Test metrics (held-out 2.5 k split)
| Metric | Value |
|---|---|
| eval_accuracy | **0.9356** |
| eval_loss | 0.2372 |
## Inference
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub
tok = AutoTokenizer.from_pretrained("DiyRex/emobooks-emotion-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "DiyRex/emobooks-emotion-classifier"
).eval()

text = "i feel really lonely tonight"
# Truncate to the maximum sequence length used in training
ids = tok(text, return_tensors="pt", truncation=True, max_length=160)
with torch.no_grad():
    logits = model(**ids).logits

# Pick the highest-scoring class and map its index to a label name
label = model.config.id2label[int(logits.argmax(-1))]
print(label)  # → "sadness" (then mapped to "lonely" by the runtime)
```
## Singlish input
The runtime pre-normalises Singlish/Sinhala affect tokens to English
hints before this model runs (see `emobooks/normalize.py`):
- `mata hari duka` → `i feel sad. mata hari sad` → **sadness**
- `mata satutui` → `i feel happy. mata happy` → **joy**
- `mata loku bayak tiyenne` → fear-cue prepended → **fear**
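The first two examples above can be reproduced with a very small lookup-based sketch. The `AFFECT_HINTS` table and `normalize` function are hypothetical and built only from those two examples; the actual logic in `emobooks/normalize.py` is not shown in this card and is likely broader. The fear example is left out because its exact output is not given.

```python
# Hypothetical lexicon derived only from the two fully specified examples.
AFFECT_HINTS = {"duka": "sad", "satutui": "happy"}


def normalize(text: str) -> str:
    """Swap known affect tokens for English words and prepend an English hint."""
    tokens = text.lower().split()
    hints = [AFFECT_HINTS[t] for t in tokens if t in AFFECT_HINTS]
    if not hints:
        return text  # nothing recognised: pass the input through unchanged
    replaced = " ".join(AFFECT_HINTS.get(t, t) for t in tokens)
    return f"i feel {hints[0]}. {replaced}"


print(normalize("mata hari duka"))
print(normalize("mata satutui"))
```

Prepending a plain English sentence gives the English-only base model a familiar cue, while the remaining Singlish tokens are preserved as context.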
## Place in the stack
```
user text
↓ normalize (Singlish → English hints)
↓ this classifier (one of 8 emotion labels)
↓ retrieve (xlm-roberta-base mean-pooled, cosine)
↓ filter (emotion → tone/pacing/theme rules)
↓ dialog (state machine)
↓ respond (Llama-3-8B + DiyRex/emobooks-llama3-lora)
↓ guardrail (catalog index check; no fake books)
```
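The retrieve step names two standard operations: mean pooling of token vectors and cosine similarity. The sketch below shows both in plain Python with toy 2-dimensional vectors standing in for `xlm-roberta-base` outputs. The function names, vectors, and book keys are illustrative assumptions, not the project's code.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(token_vectors[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy query: two real tokens plus one padding token that must be ignored.
query = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])

# Toy catalog of already-pooled book embeddings (hypothetical keys).
books = {"book_a": [1.0, 1.0], "book_b": [1.0, -1.0]}
ranked = sorted(books, key=lambda k: cosine(query, books[k]), reverse=True)
print(query, ranked)
```

Masking matters here: without it, the padding vector would dominate the average and distort the ranking.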
## License
Apache 2.0