haydenpham/6emotions-classifier

	---
	license: mit
	language:
	- en
	library_name: sklearn
	tags:
	- text-classification
	- emotion-detection
	- sklearn
	- skops
	datasets:
	- custom
	metrics:
	- accuracy
	pipeline_tag: text-classification
	---

	# 6 Emotions Text Classification Model

	A logistic regression model for classifying text into 6 emotion categories.

	## Model Description

	- Model type: Logistic Regression with TF-IDF features
	- Language: English
	- Task: Multi-class text classification
	- Labels: anger, fear, joy, love, sadness, surprise

	## Training Data

	This model was trained on a merged dataset from two sources:

	1. GoEmotions (Google): a corpus of 58k Reddit comments labeled with 27 emotion categories
	   - Source: [Kaggle](https://www.kaggle.com/datasets/shivamb/go-emotions-google-emotions-dataset)
	   - Paper: [arXiv:2005.00547](https://arxiv.org/abs/2005.00547)

	2. Emotion Dataset: text samples labeled with basic emotions
	   - Source: [Kaggle](https://www.kaggle.com/datasets/parulpandey/emotion-dataset/data)
	   - Paper: [EMNLP 2018](https://www.aclweb.org/anthology/D18-1404)

	Labels from both sources were mapped to the 6 emotion categories listed under Model Description.
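
The mapping itself is not published in this card. As a rough sketch of what such a step can look like, the snippet below collapses a few GoEmotions labels into the 6 target classes. The dictionary contents, the `map_label` helper, and the choice to drop unmapped labels are all assumptions made for illustration, not the mapping used to train this model.

```python
# Hypothetical mapping from source labels to the 6 target classes.
# Illustrative only: the actual mapping used for this model is not documented.
SOURCE_TO_CORE = {
    "anger": "anger",
    "annoyance": "anger",
    "fear": "fear",
    "nervousness": "fear",
    "joy": "joy",
    "amusement": "joy",
    "love": "love",
    "caring": "love",
    "sadness": "sadness",
    "grief": "sadness",
    "surprise": "surprise",
}


def map_label(label):
    """Return the core emotion for a source label, or None if it has no mapping."""
    return SOURCE_TO_CORE.get(label)


# Toy (text, source_label) pairs standing in for real dataset rows
samples = [
    ("That is so annoying", "annoyance"),
    ("Thanks a lot", "gratitude"),
]

# Assumption: rows whose label has no mapping are dropped
mapped = [(text, map_label(label)) for text, label in samples]
kept = [(text, label) for text, label in mapped if label is not None]
print(kept)
```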

	## Features

	The model uses a combination of:
	- Word-level TF-IDF: unigrams to trigrams (at most 20,000 features)
	- Character-level TF-IDF: character n-grams of length 3 to 5 (at most 15,000 features)
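
One way to combine the two feature sets in scikit-learn is a `FeatureUnion` of two `TfidfVectorizer` instances. The sketch below uses the n-gram ranges and feature caps listed above; the use of `FeatureUnion`, the `analyzer="char"` setting (rather than `"char_wb"`), the step names, and the toy texts are assumptions, since the card does not show the actual pipeline code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Concatenate word-level and character-level TF-IDF features side by side.
# Step names "word" and "char" are arbitrary labels chosen for this sketch.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 3), max_features=20_000)),
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5), max_features=15_000)),
])

# Toy texts, for illustration only
texts = ["I am so happy today", "This is terrifying", "I miss you so much"]

X = features.fit_transform(texts)
print(X.shape)  # (number of texts, word features + char features)
```

On a real corpus the combined matrix has at most 35,000 columns, since each vectorizer keeps only its most frequent n-grams up to `max_features`.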

	## Training

	- Framework: scikit-learn
	- Hyperparameter tuning: GridSearchCV with 3-fold cross-validation
	- Class balancing: `class_weight='balanced'`
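
The training setup above can be sketched roughly as follows. The parameter grid, `max_iter`, the step names, and the toy data are assumptions, and the feature step is simplified to a single default `TfidfVectorizer` to keep the example short; only `GridSearchCV` with 3 folds and `class_weight='balanced'` come from this card.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data standing in for the real training set (hypothetical examples)
texts = [
    "I am so happy today",
    "what a wonderful party",
    "this makes me smile",
    "I feel so alone",
    "I miss her terribly",
    "everything is going wrong",
]
labels = ["joy", "joy", "joy", "sadness", "sadness", "sadness"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # class_weight="balanced" reweights classes inversely to their frequency
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Hypothetical grid: the values actually searched are not documented
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# cv=3 gives 3-fold (stratified) cross-validation for classifiers
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best setting
```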

	## Performance

	### Model Metrics
	- Cross-Validation Accuracy: 0.7163
	- Test Accuracy: 0.70
	- Training Size: 41,974 samples
	- Test Size: 6,067 samples

	### Confusion Matrix
	![Confusion Matrix](figures/confusionMaxtrixNormalized.png)
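
For readers who want to compute the same kind of metrics on their own data, here is a minimal sketch using scikit-learn. The `y_true` and `y_pred` lists are made-up toy values, not results from this model, and row normalization (`normalize="true"`) is an assumption about how the figure was produced.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

LABELS = ["anger", "fear", "joy", "love", "sadness", "surprise"]

# Toy labels for illustration only (not outputs of this model)
y_true = ["anger", "fear", "joy", "joy", "love", "sadness", "sadness", "surprise"]
y_pred = ["anger", "fear", "joy", "love", "love", "sadness", "fear", "surprise"]

accuracy = accuracy_score(y_true, y_pred)

# normalize="true" divides each row by the number of true samples of that class,
# so each row shows how that class's samples were distributed over predictions
cm = confusion_matrix(y_true, y_pred, labels=LABELS, normalize="true")

print(f"accuracy = {accuracy:.2f}")
print(np.round(cm, 2))
```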

	## Limitations
	- Trained on English text only; performance on other languages is not guaranteed.
	- May not generalize well to formal or technical text.
	- Single-label classification: each input gets exactly one emotion, so mixed emotions are not detected.
	- May reflect biases present in the training data sources.

	## Usage

	```python
	import skops.io as sio

	MODEL_PATH = "6emotions_model.skops"

	# List the types in the file that skops does not trust by default,
	# and review them before loading
	print(sio.get_untrusted_types(file=MODEL_PATH))

	# Types explicitly trusted when loading this model
	trusted_types = [
	    "sklearn.pipeline.Pipeline",
	    "sklearn.linear_model._logistic.LogisticRegression",
	    "sklearn.feature_extraction.text.TfidfVectorizer",
	    "numpy.ndarray",
	    "numpy.dtype",
	]

	model = sio.load(MODEL_PATH, trusted=trusted_types)

	# Predict the emotion label for a single text
	text = "I'm so happy today!"
	prediction = model.predict([text])
	print(prediction)  # ['joy']
	```