EmoBooks: Emotion Classifier
A distilbert-base-uncased model fine-tuned to classify English (and
Singlish-normalized) user utterances into 8 emotion labels for the
EmoBooks Sinhala novel recommender.
Labels
sadness, joy, love, anger, fear, surprise, disgust, calm
The model itself predicts only these 8 labels. The runtime additionally
maps some predictions to two extra labels, lonely and anxious, via simple
keyword rules (see emobooks/classifier.py::LABEL_ALIAS).
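
The aliasing step could look roughly like the sketch below. The contents of LABEL_ALIAS are not shown in this card, so the rule table `ALIAS_RULES`, the keyword lists, the function name `apply_alias`, and the choice of fear as the base label for anxious are all assumptions made for illustration. Only the sadness to lonely case is taken from the inference example in this card.

```python
# Hypothetical sketch of keyword-based label aliasing. The real
# LABEL_ALIAS in emobooks/classifier.py may be structured differently.
ALIAS_RULES = {
    # alias: (base label it refines, trigger keywords) -- all illustrative
    "lonely": ("sadness", ("lonely", "alone")),
    "anxious": ("fear", ("anxious", "worried")),  # base label is an assumption
}


def apply_alias(text: str, label: str) -> str:
    """Return an alias label if a keyword rule fires, else the model's label."""
    words = text.lower().split()
    for alias, (base, keywords) in ALIAS_RULES.items():
        if label == base and any(k in words for k in keywords):
            return alias
    return label


print(apply_alias("i feel really lonely tonight", "sadness"))  # lonely
print(apply_alias("i feel really down tonight", "sadness"))    # sadness
```

Keeping the aliases outside the model means the classifier head stays at 8 classes, and new aliases can be added without retraining.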
Training
| Parameter | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Dataset | 42 k / 2.5 k / 2.5 k (train/val/test); dataset/training.csv etc. in the emobooks repo |
| Epochs | 4 |
| Batch size | 32 |
| Max seq len | 160 |
| Learning rate | 2.0e-5 (cosine, 6% warmup) |
| Weight decay | 0.01 |
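
To make the learning-rate row concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the values from the table. It assumes the training split has exactly 42,000 examples, a single device with no gradient accumulation, and that partial batches and warmup steps are rounded up. None of these details are stated in this card, and the actual training script may differ.

```python
import math

# Values from the training table; TRAIN_EXAMPLES assumes "42 k" is exact.
TRAIN_EXAMPLES = 42_000
BATCH_SIZE = 32
EPOCHS = 4
BASE_LR = 2.0e-5
WARMUP_RATIO = 0.06

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)  # rounding up is an assumption


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)  # 1313 5252 316
```

Under these assumptions the rate climbs from 0 to 2.0e-5 over the first 316 steps and then decays smoothly to 0 by step 5252.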
Test metrics (held-out 2.5 k split)
| Metric | Value |
|---|---|
| eval_accuracy | 0.9356 |
| eval_loss | 0.2372 |
Inference
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("DiyRex/emobooks-emotion-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "DiyRex/emobooks-emotion-classifier"
).eval()

text = "i feel really lonely tonight"
ids = tok(text, return_tensors="pt", truncation=True, max_length=160)
with torch.no_grad():
    logits = model(**ids).logits
label = model.config.id2label[int(logits.argmax(-1))]
print(label)  # → "sadness" (then mapped to "lonely" by the runtime)
Singlish input
The runtime pre-normalises Singlish/Sinhala affect tokens to English
hints before this model runs (see emobooks/normalize.py):
| Input | Normalised text | Predicted label |
|---|---|---|
| mata hari duka | i feel sad. mata hari sad | sadness |
| mata satutui | i feel happy. mata happy | joy |
| mata loku bayak tiyenne | fear-cue prepended | fear |
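
The first two examples above are consistent with a simple token-substitution rule: replace a known affect token with its English equivalent and prepend an "i feel ..." hint. The sketch below reproduces just those two cases. The function name `normalize_singlish` and the `AFFECT_HINTS` table are hypothetical and are not taken from emobooks/normalize.py. The fear case is left out because the exact cue text is not given here.

```python
# Hypothetical sketch of Singlish affect-token normalisation; the real
# emobooks/normalize.py may use different rules and a larger vocabulary.
AFFECT_HINTS = {
    "duka": "sad",       # from the "mata hari duka" example
    "satutui": "happy",  # from the "mata satutui" example
}


def normalize_singlish(text: str) -> str:
    """Swap known affect tokens for English and prepend 'i feel X.' hints."""
    hints = []
    out_tokens = []
    for token in text.lower().split():
        english = AFFECT_HINTS.get(token)
        if english is None:
            out_tokens.append(token)  # unknown tokens pass through unchanged
        else:
            out_tokens.append(english)
            hints.append(f"i feel {english}.")
    return " ".join(hints + [" ".join(out_tokens)])


print(normalize_singlish("mata hari duka"))  # i feel sad. mata hari sad
print(normalize_singlish("mata satutui"))    # i feel happy. mata happy
```

Text with no known affect token is returned as-is (lowercased), so plain English input reaches the classifier without added hints.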
Place in the stack
user text
→ normalize (Singlish → English hints)
→ this classifier (one of 8 emotion labels)
→ retrieve (xlm-roberta-base mean-pooled, cosine)
→ filter (emotion → tone/pacing/theme rules)
→ dialog (state machine)
→ respond (Llama-3-8B + DiyRex/emobooks-llama3-lora)
→ guardrail (catalog index check; no fake books)
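
The retrieve step is described as mean-pooled embeddings compared by cosine similarity. As a rough illustration of that technique only, the sketch below pools toy token vectors while skipping padding, then ranks a toy catalog by cosine similarity. In the real stack the token vectors would come from xlm-roberta-base. The function names, the 2-dimensional vectors, and the catalog entries here are invented for the example.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1), ignoring padding."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, catalog):
    """Return catalog keys sorted from most to least similar to the query."""
    return sorted(catalog, key=lambda k: cosine(query_vec, catalog[k]), reverse=True)


# Toy data: the third token is padding and must not affect the pooled vector.
query = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
catalog = {"book_a": [1.0, 1.0], "book_b": [1.0, -1.0]}  # hypothetical entries
print(query)                 # [0.5, 0.5]
print(rank(query, catalog))  # ['book_a', 'book_b']
```

Masking before averaging matters because padded positions would otherwise pull every short query toward the same vector.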
License
Apache 2.0