---
license: mit
language:
- en
library_name: sklearn
tags:
- text-classification
- emotion-detection
- sklearn
- skops
datasets:
- custom
metrics:
- accuracy
pipeline_tag: text-classification
---
# 6 Emotions Text Classification Model
A logistic regression model for classifying text into 6 emotion categories.
## Model Description
- **Model type:** Logistic Regression with TF-IDF features
- **Language:** English
- **Task:** Multi-class text classification
- **Labels:** anger, fear, joy, love, sadness, surprise
## Training Data
This model was trained on a merged dataset from two sources:
1. **GoEmotions** (Google): A corpus of 58k Reddit comments labeled with 27 emotion categories
   - Source: [Kaggle](https://www.kaggle.com/datasets/shivamb/go-emotions-google-emotions-dataset)
   - Paper: [arXiv:2005.00547](https://arxiv.org/abs/2005.00547)
2. **Emotion Dataset**: Text samples labeled with basic emotions
   - Source: [Kaggle](https://www.kaggle.com/datasets/parulpandey/emotion-dataset/data)
   - Paper: [EMNLP 2018](https://www.aclweb.org/anthology/D18-1404)
Labels were mapped to 6 core emotion categories for this model.
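
The exact label mapping is not listed in this card. As a rough illustration only, a mapping step of this kind might look like the sketch below. `LABEL_MAP`, `map_label`, and every pairing in the dictionary are hypothetical and cover just a few GoEmotions labels; samples whose label has no mapping are dropped here, which is also an assumption.

```python
# Hypothetical mapping from a few source labels to the 6 target classes.
# The mapping actually used for this model is not documented in this card.
LABEL_MAP = {
    "anger": "anger",
    "annoyance": "anger",
    "fear": "fear",
    "nervousness": "fear",
    "joy": "joy",
    "amusement": "joy",
    "love": "love",
    "sadness": "sadness",
    "grief": "sadness",
    "surprise": "surprise",
}


def map_label(source_label):
    """Return the target emotion, or None if the label has no mapping."""
    return LABEL_MAP.get(source_label)


# Toy rows for illustration only: (text, source label).
rows = [("I can't stop laughing", "amusement"), ("ok", "neutral")]

# Keep only rows whose label maps to one of the 6 target classes.
mapped = [
    (text, map_label(label))
    for text, label in rows
    if map_label(label) is not None
]
print(mapped)
```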
## Features
The model uses a combination of:
- **Word-level TF-IDF:** unigrams to trigrams (max 20,000 features)
- **Character-level TF-IDF:** 3-5 character n-grams (max 15,000 features)
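
A minimal sketch of how these two feature sets could be combined in scikit-learn, assuming a `FeatureUnion` of two `TfidfVectorizer` instances. The step names, the choice of `analyzer="char"` over `"char_wb"`, and the toy corpus are assumptions; only the n-gram ranges and feature caps come from the list above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Word-level TF-IDF over unigrams to trigrams, capped at 20,000 features.
word_tfidf = TfidfVectorizer(
    analyzer="word", ngram_range=(1, 3), max_features=20_000
)

# Character-level TF-IDF over 3-5 character n-grams, capped at 15,000 features.
# analyzer="char" is an assumption; "char_wb" would be another plausible choice.
char_tfidf = TfidfVectorizer(
    analyzer="char", ngram_range=(3, 5), max_features=15_000
)

# FeatureUnion concatenates the two sparse feature matrices column-wise.
features = FeatureUnion([("word", word_tfidf), ("char", char_tfidf)])

# Toy corpus for illustration only.
docs = ["I am so happy today", "This is terrifying", "I miss you so much"]
X = features.fit_transform(docs)
print(X.shape)
```

On the toy corpus the vocabulary is far below both caps; the caps only take effect on a corpus large enough to exceed them.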
## Training
- **Framework:** scikit-learn
- **Hyperparameter tuning:** GridSearchCV with 3-fold cross-validation
- **Class balancing:** `class_weight='balanced'`
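
The setup above can be sketched roughly as follows. The parameter grid, `max_iter`, the single default `TfidfVectorizer`, and the toy data are assumptions for illustration; the card only states that `GridSearchCV` with 3-fold cross-validation and `class_weight='balanced'` were used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data for illustration only: 3 samples per class so that
# 3-fold stratified cross-validation has one sample per class per fold.
texts = [
    "I am so happy", "what a joyful day", "this makes me smile",
    "I feel so sad", "this is heartbreaking", "I want to cry",
]
labels = ["joy", "joy", "joy", "sadness", "sadness", "sadness"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hypothetical grid; the values actually searched are not stated in this card.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# cv=3 gives 3-fold (stratified) cross-validation for a classifier.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```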
## Performance
### Model Metrics
- **Cross-Validation Accuracy:** 0.7163
- **Test Accuracy:** 0.70
- **Training Size:** 41,974
- **Test Size:** 6,067
### Confusion Matrix
![Confusion Matrix](figures/confusionMaxtrixNormalized.png)
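
For readers who want to reproduce this kind of evaluation on their own data, the sketch below computes accuracy and a row-normalized confusion matrix with scikit-learn. The labels are toy values, not this model's predictions, and normalizing by true class (`normalize="true"`) is an assumption about how the figure was produced.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

LABELS = ["anger", "fear", "joy", "love", "sadness", "surprise"]

# Toy labels for illustration only; not the model's actual predictions.
y_true = ["joy", "joy", "sadness", "anger", "fear", "love", "surprise", "sadness"]
y_pred = ["joy", "love", "sadness", "anger", "fear", "love", "surprise", "anger"]

accuracy = accuracy_score(y_true, y_pred)

# normalize="true" divides each row by its true-class count,
# so the diagonal holds per-class recall.
cm = confusion_matrix(y_true, y_pred, labels=LABELS, normalize="true")

print(f"accuracy = {accuracy:.2f}")
for label, row in zip(LABELS, cm):
    print(label, [round(float(v), 2) for v in row])
```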
## Limitations
- Trained on English text; performance on other languages is not guaranteed.
- May not generalize well to formal or technical text.
- Single-label classification (no multi-emotion detection).
- May inherit biases present in the training data sources.
## Usage
```python
import skops.io as sio

MODEL_PATH = "6emotions_model.skops"

# List the types skops does not trust by default and review them before loading
print(sio.get_untrusted_types(file=MODEL_PATH))

# Types reviewed and accepted for this model file
trusted_types = [
    "sklearn.pipeline.Pipeline",
    "sklearn.linear_model._logistic.LogisticRegression",
    "sklearn.feature_extraction.text.TfidfVectorizer",
    "numpy.ndarray",
    "numpy.dtype",
]
model = sio.load(MODEL_PATH, trusted=trusted_types)

# Predict the emotion for a single text
text = "I'm so happy today!"
prediction = model.predict([text])
print(prediction)  # ['joy']
```