GoEmotions Chatbot Emotion Classifier

This repository contains a production export of a DeBERTa-v3-large emotion classifier trained with two classification heads:

  • Head A: GoEmotions, 28 labels, used for production inference.
  • Head B: MELD, 7 labels, used as an auxiliary dialogue-context training task.

Only Head A is included in this export.

Intended Use

The model predicts user emotional signals for chatbot applications. Production inference returns the top 3 emotions with probabilities and prompt-ready wording for an LLM.

In production, three low-support GoEmotions classes that the model predicts unreliably are merged into related labels:

  • grief -> sadness
  • pride -> admiration
  • relief -> joy

This gives a 25-label production label space while preserving the original raw 28-way logits for inspection.
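The merge can be pictured as a small post-processing step over the 28-way probabilities. The sketch below is illustrative only: the names MERGE_MAP and merge_probabilities are hypothetical, and summing the probability of a merged class into its target is an assumption, since this card does not say how inference.py combines them.

```python
# Merge targets taken from the list above.
MERGE_MAP = {"grief": "sadness", "pride": "admiration", "relief": "joy"}


def merge_probabilities(raw_probs):
    """Fold merged-away classes into their targets.

    Assumption: probabilities are summed. The real export may use a
    different rule (for example, taking the max).
    """
    merged = {}
    for label, prob in raw_probs.items():
        target = MERGE_MAP.get(label, label)
        merged[target] = merged.get(target, 0.0) + prob
    return merged


# Toy 5-label example (made-up numbers, not model output).
raw = {"sadness": 0.40, "grief": 0.10, "joy": 0.30, "relief": 0.05, "neutral": 0.15}
merged = merge_probabilities(raw)
print(merged)
```

Applied to the full 28-label output, the same mapping leaves 25 keys, while the raw dictionary stays available for inspection.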

Metrics

Held-out GoEmotions test set:

  • Weighted F1: 0.5285
  • Accuracy: 0.5338
  • Macro F1: 0.4143
  • ECE@15: 0.0595

Chatbot-domain validation set, 249 labelled examples, production merged 25-label space:

  • Weighted F1: 0.4568
  • Accuracy: 0.4739
  • Macro F1: 0.4586
  • ECE@15: 0.1757
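For readers unfamiliar with the calibration metric, here is a minimal sketch of expected calibration error. It assumes that "@15" means 15 equal-width confidence bins and that confidence is the top-1 probability; neither detail is stated in this card, and the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=15):
    """Support-weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins over [0, 1]; conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Toy data: two confident correct predictions, two 0.6-confidence
# predictions of which one is wrong.
confs = [0.9, 0.9, 0.6, 0.6]
hits = [True, True, True, False]
ece = expected_calibration_error(confs, hits)
print(round(ece, 4))
```

A lower value means predicted confidence tracks observed accuracy more closely, which is why the chatbot-domain figure signals weaker calibration than the GoEmotions test figure.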

The chatbot-domain weighted F1 of 0.4568 passes the chosen deployment gate of weighted F1 >= 0.43.
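To make the difference between the two F1 figures and the gate check concrete, the following sketch computes macro and weighted F1 from label lists in plain Python. The function names and the toy labels are hypothetical; only the 0.43 threshold comes from this card, and the actual evaluation script is not part of this export.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 per label, over every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # number of true examples per label
    macro = sum(scores.values()) / len(scores)  # every class counts equally
    weighted = sum(scores[l] * support[l] for l in scores) / len(y_true)
    return macro, weighted


DEPLOYMENT_GATE = 0.43  # threshold stated in this card

# Toy labels, not real evaluation data.
y_true = ["joy", "joy", "joy", "fear"]
y_pred = ["joy", "joy", "fear", "fear"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
passes_gate = weighted >= DEPLOYMENT_GATE
print(macro, weighted, passes_gate)
```

Weighted F1 scales each class by its support, so frequent classes dominate it; macro F1 treats rare and common classes alike.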

Files

  • encoder/: DeBERTa-v3-large encoder weights and config.
  • tokenizer/: tokenizer files.
  • head_a.pt: production GoEmotions classification head.
  • metadata.json: training metadata, label list, and validation metadata.
  • inference.py: standalone local inference helper.

Local Inference

After downloading the repo:

python inference.py --model-dir . --text "I tried everything and I'm worried I broke the account."

For prompt-ready LLM formatting:

python inference.py --model-dir . --text "I tried everything and I'm worried I broke the account." --output-format llm

Expected output shape:

User emotional signals (top 3):
- Primary: fear (confidence: 0.XX)
- Secondary: confusion (confidence: 0.XX)
- Tertiary: sadness (confidence: 0.XX)
Interpretation: User likely feels fear.
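If you need to reproduce this text shape from your own probability dictionary, a minimal sketch might look like the following. The function format_for_llm and the example probabilities are hypothetical; this is not the code in inference.py, only an imitation of the output shape shown above.

```python
RANK_NAMES = ["Primary", "Secondary", "Tertiary"]


def format_for_llm(probs, top_k=3):
    """Render the top emotions as prompt-ready text.

    Assumes probs is a non-empty {label: probability} dict.
    At most three entries are rendered because only three rank names exist.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[: min(top_k, len(RANK_NAMES))]
    lines = [f"User emotional signals (top {len(ranked)}):"]
    for rank, (label, prob) in zip(RANK_NAMES, ranked):
        lines.append(f"- {rank}: {label} (confidence: {prob:.2f})")
    lines.append(f"Interpretation: User likely feels {ranked[0][0]}.")
    return "\n".join(lines)


# Made-up probabilities for illustration, not model output.
example = {"fear": 0.52, "confusion": 0.21, "sadness": 0.12, "neutral": 0.08}
text = format_for_llm(example)
print(text)
```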

Limitations

  • This is a custom PyTorch export, not a drop-in AutoModelForSequenceClassification checkpoint.
  • The model is less well calibrated on chatbot-domain data (ECE@15 0.1757) than on the GoEmotions test split (ECE@15 0.0595).
  • Weak domain classes include nervousness, neutral, annoyance, approval, and caring.
  • Output probabilities should be treated as coarse confidence bands, not as exact likelihoods.

Recommended confidence handling:

  • >= 0.50: high confidence.
  • 0.35 to < 0.50: medium confidence.
  • < 0.35: low confidence; treat emotional state as unclear.
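The bands above translate directly into a small helper. This is a sketch with a hypothetical function name; it assumes 0.35 belongs to the medium band and 0.50 to the high band, which matches the thresholds as listed.

```python
def confidence_band(prob):
    """Map a top-emotion probability to the recommended band."""
    if prob >= 0.50:
        return "high"
    if prob >= 0.35:
        return "medium"
    return "low"  # treat the emotional state as unclear


for p in (0.72, 0.41, 0.20):
    print(p, confidence_band(p))
```

A caller might, for example, omit the emotion hint from the LLM prompt entirely when the band is "low".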

Training Data

The model was trained on GoEmotions with MELD as an auxiliary dialogue-context task. It was then validated on a small in-domain chatbot example set.

