Classical ML: GoEmotions Emotion Classifier

TF-IDF features with one-vs-rest (OneVsRest) classifiers for multi-label emotion classification on GoEmotions (28 labels: 27 emotions plus neutral).

CS6120 NLP Project, Northeastern University

Models

Model                     Micro-F1   Macro-F1   Subset Acc   Hamming Acc
Logistic Regression       0.4448     0.2388     0.3156       0.9658
LinearSVC (calibrated)    0.5233     0.4266     0.3097       0.9542
XGBoost                   0.5485     0.4044     0.4133       0.9622
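The sketch below shows what the four columns measure. It computes them with scikit-learn on a hypothetical 4-example, 3-label toy matrix; the real task has 28 labels. It assumes the table uses the standard scikit-learn definitions of these metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Hypothetical toy data: 4 examples x 3 labels (the real task has 28 labels).
y_true = np.array([
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
])
y_pred = np.array([
    [1, 0, 0],  # exact match
    [0, 1, 0],  # misses label 2
    [1, 1, 0],  # exact match
    [0, 1, 1],  # extra label 1
])

# Micro-F1 pools TP/FP/FN over all labels; macro-F1 averages per-label F1 equally.
micro_f1 = f1_score(y_true, y_pred, average="micro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
# On multi-label input, accuracy_score is subset accuracy: the whole row must match.
subset_acc = accuracy_score(y_true, y_pred)
# Hamming accuracy = fraction of individual (example, label) cells predicted correctly.
hamming_acc = 1.0 - hamming_loss(y_true, y_pred)

print(f"micro-F1={micro_f1:.4f} macro-F1={macro_f1:.4f} "
      f"subset={subset_acc:.4f} hamming={hamming_acc:.4f}")
```

Subset accuracy requires the entire label vector to match, so it is much stricter than Hamming accuracy. In a 28-label, mostly-zero target matrix, Hamming accuracy is also inflated by the many correctly predicted negatives. Macro-F1 weights every label equally, so poorly predicted rare emotions pull it below micro-F1.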

Files

  • tfidf_vectorizer.pkl: TF-IDF vectorizer (15k features, bigrams, sublinear TF)
  • mlb.pkl: MultiLabelBinarizer (28 GoEmotions labels)
  • logistic_regression.pkl: OneVsRest Logistic Regression
  • linearsvc.pkl: OneVsRest LinearSVC (Platt-calibrated)
  • xgboost.pkl: OneVsRest XGBoost
  • thresholds_lr.pkl: per-class tuned decision thresholds for LR (numpy array, shape (28,))
  • thresholds_linearsvc.pkl: per-class tuned decision thresholds for LinearSVC (numpy array, shape (28,))
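The training script is not part of this card. The following is only a sketch of how artifacts like these could be produced for the LinearSVC model. The values `max_features=15000` and `sublinear_tf=True` follow the vectorizer description above. Several other choices are assumptions picked so the example runs on a handful of sentences:

  • `ngram_range=(1, 2)`
  • sigmoid (Platt) calibration through `CalibratedClassifierCV` with `cv=2`
  • the toy corpus

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; the real model is trained on GoEmotions (28 labels).
texts = [
    "thank you so much, I really appreciate it",
    "thanks a lot, that was kind of you",
    "I am so grateful for your help",
    "many thanks for everything",
    "this is absolutely infuriating",
    "I am so angry right now",
    "that makes me furious",
    "stop it, this is annoying and rude",
    "I love this so much",
    "love you, you are wonderful",
    "I adore this song, love it",
    "such a lovely and sweet gesture",
    "thank you, I love it",
    "I am angry, but thanks anyway",
]
labels = [
    ["gratitude"], ["gratitude"], ["gratitude"], ["gratitude"],
    ["anger"], ["anger"], ["anger"], ["anger"],
    ["love"], ["love"], ["love"], ["love"],
    ["gratitude", "love"],
    ["anger", "gratitude"],
]

# TF-IDF: capped vocabulary, unigrams + bigrams (assumed), log-scaled term frequency.
tfidf = TfidfVectorizer(max_features=15000, ngram_range=(1, 2), sublinear_tf=True)
X = tfidf.fit_transform(texts)

# Binary indicator matrix with one column per label, sorted: anger, gratitude, love.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# One binary LinearSVC per label, wrapped in sigmoid (Platt) calibration so that
# predict_proba is available; cv=2 only because the toy data is tiny.
model = OneVsRestClassifier(CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=2))
model.fit(X, Y)

proba = model.predict_proba(tfidf.transform(["thanks, I appreciate it"]))
print(dict(zip(mlb.classes_, proba[0].round(3))))

# Persisting would mirror the file list, e.g.:
# joblib.dump(tfidf, "tfidf_vectorizer.pkl"); joblib.dump(mlb, "mlb.pkl")
# joblib.dump(model, "linearsvc.pkl")
```

LinearSVC has no native `predict_proba`. The calibration wrapper is what makes it usable with the probability thresholds described in the file list.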

Usage

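import joblib

tfidf = joblib.load("tfidf_vectorizer.pkl")
mlb   = joblib.load("mlb.pkl")

# Load one model together with its matching decision threshold(s).
model      = joblib.load("xgboost.pkl")                 # unpickling needs xgboost installed
thresholds = 0.30                                       # XGBoost: one global threshold, no per-class file
# model      = joblib.load("logistic_regression.pkl")
# thresholds = joblib.load("thresholds_lr.pkl")         # per-class, shape (28,)
# model      = joblib.load("linearsvc.pkl")
# thresholds = joblib.load("thresholds_linearsvc.pkl")  # per-class, shape (28,)

X      = tfidf.transform(["I am so happy and grateful today!"])
proba  = model.predict_proba(X)                         # shape (1, 28): one probability per label
preds  = (proba >= thresholds).astype(int)              # scalar or (28,) thresholds broadcast per column
labels = mlb.classes_[preds[0].astype(bool)]            # names of the labels that fired
print(labels)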
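The card does not document how the per-class thresholds in thresholds_lr.pkl and thresholds_linearsvc.pkl were chosen. One common approach is a per-label grid search on held-out validation data that keeps, for each label, the threshold that maximizes that label's F1. This is a hedged sketch of that approach only. The function name `tune_thresholds`, the 0.05 to 0.90 grid, and the toy arrays are all hypothetical.

```python
import numpy as np
from sklearn.metrics import f1_score


def tune_thresholds(y_true, proba, grid=None):
    """Hypothetical helper: for each label, pick the grid threshold with the best F1.

    y_true and proba are (n_samples, n_labels) arrays of 0/1 targets and
    predicted probabilities on a validation split (not the test split).
    """
    if grid is None:
        grid = np.linspace(0.05, 0.90, 18)  # 0.05, 0.10, ..., 0.90
    thresholds = np.full(y_true.shape[1], 0.5)
    for j in range(y_true.shape[1]):
        best_f1 = -1.0
        for t in grid:
            preds_j = (proba[:, j] >= t).astype(int)
            f1 = f1_score(y_true[:, j], preds_j, zero_division=0)
            if f1 > best_f1:  # strict ">" keeps the lowest threshold among ties
                best_f1, thresholds[j] = f1, t
    return thresholds


# Hypothetical validation data: 4 examples x 2 labels.
y_val = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])
proba_val = np.array([[0.92, 0.22], [0.61, 0.12], [0.27, 0.43], [0.18, 0.37]])

thresholds = tune_thresholds(y_val, proba_val)
print(thresholds.round(2))
```

With a trained model, the call would be `tune_thresholds(Y_val, model.predict_proba(X_val))`, where `X_val` and `Y_val` are a hypothetical held-out split. The result is a shape-(28,) array that broadcasts against `predict_proba` output the same way the loaded threshold files do. Tuning on the test split instead would make the reported F1 optimistic.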

Dataset used to train YatishW79/classical-ml-goemotions: GoEmotions