Classical ML: GoEmotions Emotion Classifier
TF-IDF + OneVsRest classifiers for multi-label emotion classification on GoEmotions (28 labels).
CS6120 NLP Project, Northeastern University
Models
| Model | Micro-F1 | Macro-F1 | Subset Acc | Hamming Acc |
|---|---|---|---|---|
| Logistic Regression | 0.4448 | 0.2388 | 0.3156 | 0.9658 |
| LinearSVC (calibrated) | 0.5233 | 0.4266 | 0.3097 | 0.9542 |
| XGBoost | 0.5485 | 0.4044 | 0.4133 | 0.9622 |
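The four columns above can be illustrated with a small NumPy sketch. This is not the project's evaluation code: the function name `multilabel_metrics` and the toy arrays are assumptions made for illustration, and "Hamming Acc" is taken to mean the fraction of individual label decisions that are correct (1 minus Hamming loss).

```python
import numpy as np

def multilabel_metrics(y_true, y_pred):
    """Compute the four table metrics for binary indicator matrices.

    Both inputs have shape (n_samples, n_labels). Hypothetical helper,
    not taken from the project.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label confusion counts.
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    # Per-label F1; labels with no positives and no predictions score 0.
    denom = 2 * tp + fp + fn
    per_label_f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

    # Micro-F1 pools the counts over all labels before computing F1.
    micro_denom = denom.sum()
    micro_f1 = float(2 * tp.sum() / micro_denom) if micro_denom > 0 else 0.0

    return {
        "micro_f1": micro_f1,
        "macro_f1": float(per_label_f1.mean()),                      # unweighted mean over labels
        "subset_acc": float((y_true == y_pred).all(axis=1).mean()),  # exact-match rows
        "hamming_acc": float((y_true == y_pred).mean()),             # correct label decisions
    }

# Toy data: 4 samples, 3 labels (made up for illustration).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

Hamming accuracy is high for every model in the table because most of the 28 labels are negative for any given text, so it says little on its own. Macro-F1 weights rare emotions as heavily as common ones, which is why it sits below Micro-F1 for all three models.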
Files
- tfidf_vectorizer.pkl: TF-IDF vectorizer (15k features, bigrams, sublinear TF)
- mlb.pkl: MultiLabelBinarizer (28 GoEmotions labels)
- logistic_regression.pkl: OneVsRest Logistic Regression
- linearsvc.pkl: OneVsRest LinearSVC (Platt-calibrated)
- xgboost.pkl: OneVsRest XGBoost
- thresholds_lr.pkl: per-class optimal thresholds for LR (numpy array, shape 28)
- thresholds_linearsvc.pkl: per-class optimal thresholds for LinearSVC (numpy array, shape 28)
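The card does not say how the per-class thresholds were chosen. One common approach, sketched below purely as an assumption, is a grid search that picks for each label the cutoff maximizing F1 on a validation set. The names `f1_binary`, `tune_thresholds`, and the grid itself are hypothetical.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 for one label; both arguments are 1-D boolean arrays."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def tune_thresholds(y_val, proba_val, grid=None):
    """Pick, per label, the grid value with the best validation F1.

    y_val:     (n_samples, n_labels) binary indicator matrix
    proba_val: (n_samples, n_labels) predicted probabilities
    Returns an array of shape (n_labels,). Ties go to the lowest cutoff.
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed search grid
    y_val = np.asarray(y_val, dtype=bool)
    proba_val = np.asarray(proba_val, dtype=float)

    n_labels = y_val.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for j in range(n_labels):
        scores = [f1_binary(y_val[:, j], proba_val[:, j] >= t) for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

# Toy validation data: 4 samples, 2 labels (made up for illustration).
y_val = [[1, 0], [1, 1], [0, 0], [0, 1]]
proba_val = [[0.90, 0.62], [0.22, 0.81], [0.12, 0.33], [0.03, 0.77]]
thresholds = tune_thresholds(y_val, proba_val)
print(thresholds)
```

An array produced this way has one entry per label, matching the shape-28 arrays stored in thresholds_lr.pkl and thresholds_linearsvc.pkl.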
Usage
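import joblib
import numpy as np

tfidf = joblib.load("tfidf_vectorizer.pkl")
mlb = joblib.load("mlb.pkl")

# XGBoost ships without a threshold file, so use a fixed cutoff of 0.30.
model = joblib.load("xgboost.pkl")
thresholds = 0.30
# For LR or LinearSVC, load the model together with its own per-class thresholds:
# model = joblib.load("logistic_regression.pkl")   # or "linearsvc.pkl"
# thresholds = joblib.load("thresholds_lr.pkl")    # or "thresholds_linearsvc.pkl"

X = tfidf.transform(["I am so happy and grateful today!"])
proba = model.predict_proba(X)                # shape (1, 28)
preds = proba >= thresholds                   # boolean mask, shape (1, 28)
labels = np.asarray(mlb.classes_)[preds[0]]   # names of the predicted emotions
print(labels)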
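The comparison `proba >= thresholds` works for both a scalar cutoff and a per-class array because NumPy broadcasts the thresholds across rows. The sketch below shows this on a toy three-label subset; the probabilities and thresholds are made up, and the `decode` helper is hypothetical rather than part of the released files.

```python
import numpy as np

# Toy stand-ins for mlb.classes_ and model.predict_proba(X) (values made up).
classes = np.array(["gratitude", "joy", "neutral"])
proba = np.array([
    [0.41, 0.35, 0.10],
    [0.05, 0.20, 0.70],
])
per_class = np.array([0.30, 0.40, 0.50])  # one cutoff per label

def decode(proba, thresholds, classes):
    """Turn a probability matrix into a list of label-name lists.

    `thresholds` may be a scalar or an array of shape (n_labels,);
    either form broadcasts against `proba` of shape (n_samples, n_labels).
    """
    mask = proba >= thresholds
    return [classes[row].tolist() for row in mask]

decoded_per_class = decode(proba, per_class, classes)
decoded_fixed = decode(proba, 0.30, classes)
print(decoded_per_class)  # [['gratitude'], ['neutral']]
print(decoded_fixed)      # [['gratitude', 'joy'], ['neutral']]
```

A sample can come back with an empty list when no probability reaches its cutoff, so downstream code should handle that case.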