---
language:
- en
license: apache-2.0
tags:
- distilbert
- emotion
- text-classification
- emobooks
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---

# EmoBooks: Emotion Classifier

A `distilbert-base-uncased` fine-tuned to classify English (and Singlish-normalized) user utterances into 8 emotion labels for the [emoBooks](https://huggingface.co/DiyRex/emobooks-llama3-lora) Sinhala novel recommender.

## Labels

`sadness`, `joy`, `love`, `anger`, `fear`, `surprise`, `disgust`, `calm`

The runtime additionally maps these to `lonely` and `anxious` via simple keyword rules (see `emobooks/classifier.py::LABEL_ALIAS`).

## Training

| Parameter | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Dataset | 42 k / 2.5 k / 2.5 k (train/val/test); `dataset/training.csv` etc. in the [emobooks repo](https://github.com/) |
| Epochs | 4 |
| Batch size | 32 |
| Max seq len | 160 |
| Learning rate | 2.0e-5 (cosine, 6% warmup) |
| Weight decay | 0.01 |

## Test metrics (held-out 2.5 k split)

| Metric | Value |
|---|---|
| eval_accuracy | **0.9356** |
| eval_loss | 0.2372 |

## Inference

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("DiyRex/emobooks-emotion-classifier")
model = AutoModelForSequenceClassification.from_pretrained(
    "DiyRex/emobooks-emotion-classifier"
).eval()

text = "i feel really lonely tonight"
ids = tok(text, return_tensors="pt", truncation=True, max_length=160)
with torch.no_grad():
    logits = model(**ids).logits
label = model.config.id2label[int(logits.argmax(-1))]
print(label)  # → "sadness"  (then mapped to "lonely" by the runtime)
```

## Singlish input

The runtime pre-normalises Singlish/Sinhala affect tokens to English hints before this model runs (see `emobooks/normalize.py`):

- `mata hari duka` → `i feel sad. mata hari sad` → **sadness**
- `mata satutui` → `i feel happy. mata happy` → **joy**
- `mata loku bayak tiyenne` → fear-cue prepended → **fear**

## Place in the stack

```
user text
   ↓
normalize   (Singlish → English hints)
   ↓
this classifier   (one of 8 emotion labels)
   ↓
retrieve    (xlm-roberta-base mean-pooled, cosine)
   ↓
filter      (emotion → tone/pacing/theme rules)
   ↓
dialog      (state machine)
   ↓
respond     (Llama-3-8B + DiyRex/emobooks-llama3-lora)
   ↓
guardrail   (catalog index check; no fake books)
```

## License

Apache 2.0
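
## Label alias sketch

The Labels section says the runtime maps classifier outputs to `lonely` and `anxious` via simple keyword rules. The actual rules live in `emobooks/classifier.py::LABEL_ALIAS` and are not reproduced in this card, so the following is only a minimal sketch of how such a mapping might look. The `ALIAS_RULES` table, its keyword lists, the pairing of `fear` with `anxious`, and the `apply_alias` helper are all assumptions made for illustration.

```python
import re

# Hypothetical rules: classifier label -> (alias, trigger keywords).
# The real mapping is in emobooks/classifier.py::LABEL_ALIAS and may differ.
ALIAS_RULES = {
    "sadness": ("lonely", ("lonely", "alone", "isolated")),
    "fear": ("anxious", ("anxious", "worried", "nervous")),
}


def apply_alias(label: str, text: str) -> str:
    """Return the alias if the text contains a trigger keyword, else the label."""
    rule = ALIAS_RULES.get(label)
    if rule is None:
        return label  # no alias defined for this classifier label
    alias, keywords = rule
    # Lowercase and split into words so matching is case-insensitive
    # and only whole words count.
    words = set(re.findall(r"[a-z']+", text.lower()))
    return alias if words.intersection(keywords) else label


print(apply_alias("sadness", "i feel really lonely tonight"))  # lonely
print(apply_alias("joy", "what a lovely day"))  # joy
```

In this sketch the alias only refines the classifier's label and never overrides it: a keyword such as `lonely` has no effect unless the model already predicted `sadness`.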