EmoBooks: Emotion Classifier

A fine-tuned distilbert-base-uncased model that classifies English (and Singlish-normalized) user utterances into 8 emotion labels for the emoBooks Sinhala novel recommender.

Labels

sadness, joy, love, anger, fear, surprise, disgust, calm

The runtime additionally maps these to lonely and anxious via simple keyword rules (see emobooks/classifier.py::LABEL_ALIAS).
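
As a rough illustration of how keyword rules like these could refine a base label, here is a minimal sketch. It is not the contents of emobooks/classifier.py: the `ALIAS_RULES` table, the keyword lists, and the `refine_label` function are hypothetical, and pairing `fear` with `anxious` is an assumption (only the sadness to lonely case appears in this card).

```python
# Hypothetical keyword rules; the real LABEL_ALIAS in emobooks/classifier.py may differ.
# base label -> (alias label, keywords that trigger the alias)
ALIAS_RULES = {
    "sadness": ("lonely", ("lonely", "alone", "isolated")),
    "fear": ("anxious", ("anxious", "worried", "nervous")),  # assumed pairing
}


def refine_label(text: str, base_label: str) -> str:
    """Return the alias when the base label has a rule and a keyword occurs in the text."""
    rule = ALIAS_RULES.get(base_label)
    if rule is None:
        return base_label
    alias, keywords = rule
    # Whitespace tokenization only; punctuation is not stripped in this sketch.
    words = set(text.lower().split())
    return alias if words.intersection(keywords) else base_label


print(refine_label("i feel really lonely tonight", "sadness"))
print(refine_label("i miss my old school", "sadness"))
```

Keeping the aliases outside the model means the classifier head stays at 8 classes, and the rules can be changed without retraining.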

Training

| Parameter | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Dataset | 42 k / 2.5 k / 2.5 k (train/val/test); dataset/training.csv etc. in the emobooks repo |
| Epochs | 4 |
| Batch size | 32 |
| Max seq len | 160 |
| Learning rate | 2.0e-5 (cosine, 6% warmup) |
| Weight decay | 0.01 |
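
To make the learning-rate entry concrete, the sketch below computes a cosine schedule with 6% warmup from the values above. It rests on several assumptions that this card does not state: "42 k" is taken as exactly 42,000 examples, training runs on a single device without gradient accumulation, and the schedule is linear warmup to the peak followed by a half-cosine decay to zero. The function name `lr_at` is illustrative.

```python
import math

BASE_LR = 2.0e-5
EPOCHS = 4
BATCH_SIZE = 32
TRAIN_EXAMPLES = 42_000  # assumption: "42 k" read as exactly 42,000
WARMUP_RATIO = 0.06

# Optimizer steps, assuming one device and no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then half-cosine decay)."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the rate reaches its peak of 2.0e-5 at the end of warmup and decays to zero at the final step.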

Test metrics (held-out 2.5 k split)

| Metric | Value |
|---|---|
| eval_accuracy | 0.9356 |
| eval_loss | 0.2372 |

Inference

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("DiyRex/emobooks-emotion-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "DiyRex/emobooks-emotion-classifier"
).eval()

text = "i feel really lonely tonight"
ids = tok(text, return_tensors="pt", truncation=True, max_length=160)
with torch.no_grad():
    logits = model(**ids).logits
label = model.config.id2label[int(logits.argmax(-1))]
print(label)  # → "sadness"  (then mapped to "lonely" by the runtime)

Singlish input

The runtime pre-normalises Singlish/Sinhala affect tokens to English hints before this model runs (see emobooks/normalize.py):

  • mata hari duka → i feel sad. mata hari sad → sadness
  • mata satutui → i feel happy. mata happy → joy
  • mata loku bayak tiyenne → fear-cue prepended → fear
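
The examples above suggest a token-level rule table that swaps an affect word for its English equivalent and prepends a short English hint. The following is a minimal sketch of that idea, not the code in emobooks/normalize.py: `AFFECT_RULES` and `normalize` are hypothetical names, and the wording of the fear cue is an assumption, since the card does not give it.

```python
# Hypothetical rule table; the real rules live in emobooks/normalize.py and may differ.
# affect token -> (English replacement or None, hint sentence to prepend)
AFFECT_RULES = {
    "duka": ("sad", "i feel sad."),
    "satutui": ("happy", "i feel happy."),
    "bayak": (None, "i feel afraid."),  # cue wording is an assumption
}


def normalize(text: str) -> str:
    """Prepend English hints and replace known affect tokens; other tokens pass through."""
    hints = []
    out_tokens = []
    for token in text.lower().split():
        rule = AFFECT_RULES.get(token)
        if rule is None:
            out_tokens.append(token)
            continue
        replacement, hint = rule
        out_tokens.append(replacement or token)
        if hint not in hints:
            hints.append(hint)
    return " ".join(hints + [" ".join(out_tokens)])


print(normalize("mata hari duka"))
print(normalize("mata satutui"))
print(normalize("mata loku bayak tiyenne"))
```

Text with no known affect token is returned unchanged apart from lowercasing, so plain English input is not altered by the rules.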

Place in the stack

user text
   ↓ normalize       (Singlish → English hints)
   ↓ this classifier (one of 8 emotion labels)
   ↓ retrieve        (xlm-roberta-base mean-pooled, cosine)
   ↓ filter          (emotion → tone/pacing/theme rules)
   ↓ dialog          (state machine)
   ↓ respond         (Llama-3-8B + DiyRex/emobooks-llama3-lora)
   ↓ guardrail       (catalog index check; no fake books)
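
For the retrieve step, "mean-pooled, cosine" can be illustrated with a small dependency-free sketch. Toy 2-dimensional vectors stand in for xlm-roberta-base hidden states, and the names `mean_pool`, `cosine`, and `catalog` are hypothetical; the actual retrieval code is not shown in this card.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding positions."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy data: the third token is padding and is excluded by the mask.
query = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
catalog = {"book_a": [1.0, 1.0], "book_b": [1.0, -1.0]}

# Rank catalog entries by similarity to the query embedding, best first.
ranked = sorted(catalog, key=lambda k: cosine(query, catalog[k]), reverse=True)
print(ranked)
```

Masking before averaging matters because padded positions would otherwise pull the sentence vector toward whatever the padding token encodes.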

License

Apache 2.0
