metadata
license: mit
language:
- en
library_name: sklearn
tags:
- text-classification
- emotion-detection
- sklearn
- skops
datasets:
- custom
metrics:
- accuracy
pipeline_tag: text-classification
6 Emotions Text Classification Model
A logistic regression model for classifying text into 6 emotion categories.
Model Description
- Model type: Logistic Regression with TF-IDF features
- Language: English
- Task: Multi-class text classification
- Labels: anger, fear, joy, love, sadness, surprise
Training Data
This model was trained on a merged dataset from two sources:
- GoEmotions (Google): A corpus of 58k Reddit comments with 27 emotion categories
  - Source: Kaggle
  - Paper: arXiv:2005.00547
- Emotion Dataset: Text samples labeled with basic emotions
  - Source: Kaggle
  - Paper: EMNLP 2018
Labels from both datasets were mapped to the 6 core emotion categories listed above.
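The label mapping itself is not documented in this card. As a minimal sketch of what such a step might look like, the snippet below collapses a few fine-grained GoEmotions labels into the six target classes and drops rows with no counterpart. The `FINE_TO_CORE` table and the `map_labels` helper are hypothetical and only illustrate the idea; they are not the mapping used to train this model.

```python
# Hypothetical mapping from fine-grained labels to the six target classes.
# The actual mapping used for this model is not documented; these entries
# are illustrative only.
FINE_TO_CORE = {
    "annoyance": "anger",
    "anger": "anger",
    "nervousness": "fear",
    "fear": "fear",
    "amusement": "joy",
    "joy": "joy",
    "caring": "love",
    "love": "love",
    "grief": "sadness",
    "sadness": "sadness",
    "surprise": "surprise",
}


def map_labels(rows):
    """Keep rows whose label maps to a core emotion and drop the rest."""
    mapped = []
    for text, label in rows:
        core = FINE_TO_CORE.get(label)
        if core is not None:
            mapped.append((text, core))
    return mapped


rows = [
    ("This is so annoying", "annoyance"),
    ("Thanks for the help", "gratitude"),  # no entry in the table, so dropped
    ("I miss her so much", "grief"),
]
mapped_rows = map_labels(rows)
print(mapped_rows)
```

Dropping unmapped rows is one possible choice; a real pipeline might instead assign them to the nearest class or keep a separate "other" bucket.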
Features
The model uses a combination of:
- Word-level TF-IDF: unigrams to trigrams (max 20,000 features)
- Character-level TF-IDF: 3-5 character n-grams (max 15,000 features)
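The two feature sets above could be combined roughly as follows. This is a sketch under assumptions: the card does not say how the vectorizers are joined, so `FeatureUnion` and the `analyzer="char"` setting are guesses, and the two sample sentences are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Word-level TF-IDF over unigrams to trigrams, capped at 20,000 features.
word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 3), max_features=20_000)

# Character-level TF-IDF over 3-5 character n-grams, capped at 15,000 features.
# analyzer="char" is an assumption; "char_wb" would be another plausible choice.
char_tfidf = TfidfVectorizer(analyzer="char", ngram_range=(3, 5), max_features=15_000)

# Assumption: the two blocks are concatenated column-wise with FeatureUnion.
features = FeatureUnion([("word", word_tfidf), ("char", char_tfidf)])

docs = ["I am so happy today", "This makes me really angry"]
X = features.fit_transform(docs)

fitted = dict(features.transformer_list)
n_word = len(fitted["word"].vocabulary_)
n_char = len(fitted["char"].vocabulary_)
print(X.shape, n_word, n_char)
```

The resulting matrix has one column per word n-gram followed by one column per character n-gram, so its width is the sum of the two vocabularies.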
Training
- Framework: scikit-learn
- Hyperparameter tuning: GridSearchCV with 3-fold cross-validation
- Class balancing: class_weight='balanced'
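A training setup along these lines might look like the sketch below. The parameter grid, the single word-level vectorizer, and the toy sentences are assumptions for illustration; the card only states that GridSearchCV with 3-fold cross-validation and `class_weight='balanced'` were used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up, deliberately imbalanced toy data (6 "joy" vs. 3 "sadness").
texts = [
    "I am so happy today", "what a wonderful day", "this is great news",
    "I love this so much fun", "feeling cheerful and glad", "best day ever",
    "I am so sad today", "this is heartbreaking news", "feeling lonely and down",
]
labels = ["joy"] * 6 + ["sadness"] * 3

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hypothetical grid; the values searched for this model are not documented.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

With `class_weight="balanced"`, scikit-learn weights each class by `n_samples / (n_classes * class_count)`, so the rarer class contributes as much to the loss as the frequent one.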
Performance
Model Metrics
- Cross-Validation Accuracy: 0.7163
- Test Accuracy: 0.70
- Training Size: 41,974
- Test Size: 6,067
Confusion Matrix
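For readers who want to reproduce this kind of evaluation, accuracy and a confusion matrix over the six labels can be computed with scikit-learn as sketched here. The `y_true` and `y_pred` lists are made-up examples, not outputs of this model.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

LABELS = ["anger", "fear", "joy", "love", "sadness", "surprise"]

# Made-up labels for illustration only; not results from this model.
y_true = ["joy", "joy", "love", "anger", "sadness", "fear", "surprise", "love"]
y_pred = ["joy", "love", "love", "anger", "sadness", "surprise", "surprise", "joy"]

acc = accuracy_score(y_true, y_pred)

# Rows are true labels and columns are predicted labels, both in LABELS order.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)

print(acc)  # 0.625 (5 of 8 correct)
print(cm)
```

Passing `labels=LABELS` fixes the row and column order, so the matrix keeps the same layout even if a class is missing from a given batch.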
Limitations
- Trained on English text; performance on other languages is not guaranteed.
- May not generalize well to formal or technical texts, since the training data includes informal Reddit comments.
- Single-label classification (no multi-emotion detection).
- Potential biases from training data sources.
Usage
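import skops.io as sio

MODEL_PATH = "6emotions_model.skops"

# List the types in the file that skops does not trust by default,
# and review the output before loading.
print(sio.get_untrusted_types(file=MODEL_PATH))

# Load the model, explicitly trusting only the reviewed types
trusted_types = [
    "sklearn.pipeline.Pipeline",
    "sklearn.linear_model._logistic.LogisticRegression",
    "sklearn.feature_extraction.text.TfidfVectorizer",
    "numpy.ndarray",
    "numpy.dtype",
]
model = sio.load(MODEL_PATH, trusted=trusted_types)

# Predict the emotion label for a single text
text = "I'm so happy today!"
prediction = model.predict([text])
print(prediction)  # ['joy']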
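Because the model returns a single label, it can help to look at the probabilities behind a prediction. The sketch below ranks all classes by probability. It assumes the loaded model exposes `predict_proba` and `classes_`, as a scikit-learn pipeline ending in logistic regression normally does; `toy_model` is a stand-in trained on made-up sentences so the example runs without the model file, and `rank_emotions` is a hypothetical helper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in for the loaded model: a tiny pipeline fit on made-up examples.
toy_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
toy_model.fit(
    [
        "I am so happy", "this is wonderful",
        "I am so sad", "this is awful",
        "I am furious", "this makes me angry",
    ],
    ["joy", "joy", "sadness", "sadness", "anger", "anger"],
)


def rank_emotions(model, text):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = model.predict_proba([text])[0]
    pairs = zip(model.classes_, probs)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranking = rank_emotions(toy_model, "I am so happy today!")
for label, prob in ranking:
    print(f"{label}: {prob:.3f}")
```

The same helper could be called with the model loaded from the `.skops` file in place of `toy_model`, provided that object supports `predict_proba`.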
