atdokmeci/sentiment_analysis_twitter_model

# Sentiment Analysis Model

This model is designed for sentiment analysis of English text. It predicts the sentiment of a given text as one of three classes: `positive`, `neutral`, or `negative`. The model was trained on a combination of datasets from Kaggle and Sentiment140.

## Model Description

Two approaches were trained and compared:
1. Baseline Model: A classical machine learning pipeline using TF-IDF vectorization and Logistic Regression.
2. CNN Model: A lightweight Convolutional Neural Network (CNN) implemented in Keras.

The model with the higher macro-F1 score on the validation split is used for inference.
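For reference, macro-F1 computes an F1 score for each class and averages the three scores with equal weight, so a smaller class counts as much as a larger one. This matters here because the Sentiment140 training set contains only positive and negative tweets, so `neutral` examples are likely underrepresented in the combined data. With per-class true positives $TP_k$, false positives $FP_k$, and false negatives $FN_k$:

$$
P_k = \frac{TP_k}{TP_k + FP_k}, \qquad
R_k = \frac{TP_k}{TP_k + FN_k}, \qquad
F1_k = \frac{2\,P_k R_k}{P_k + R_k}
$$

$$
\text{macro-F1} = \frac{1}{3} \sum_{k \in \{\text{positive},\, \text{neutral},\, \text{negative}\}} F1_k
$$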

### Baseline Model
- Vectorizer: TF-IDF over word and character n-grams
- Classifier: Logistic Regression
- Features: maximum of 200,000 features, n-gram range (1, 2)
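The baseline could be built along these lines. This is a minimal sketch that assumes scikit-learn, which the card does not name. Several details are assumptions: the `char_wb` analyzer, the character n-gram range (2, 5), applying the 200,000 cap to each vectorizer separately, and `max_iter=1000`. `build_baseline` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline


def build_baseline(max_features: int = 200_000) -> Pipeline:
    """TF-IDF over word and character n-grams, then Logistic Regression."""
    features = FeatureUnion([
        # Word unigrams + bigrams, matching the (1, 2) range listed above.
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                                 max_features=max_features)),
        # Character n-grams inside word boundaries; (2, 5) is an assumption.
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5),
                                 max_features=max_features)),
    ])
    return Pipeline([
        ("tfidf", features),
        # Extra iterations give the lbfgs solver room to converge.
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Tiny illustrative dataset (not from the card).
texts = ["I love this phone", "worst service ever", "the meeting is at noon",
         "what a great day", "this is awful", "the bus leaves at five"]
labels = ["positive", "negative", "neutral",
          "positive", "negative", "neutral"]

model = build_baseline()
model.fit(texts, labels)
predictions = model.predict(["I love this day", "awful service"])
```

`FeatureUnion` concatenates the two sparse TF-IDF matrices, so the classifier sees word and character features side by side.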

### CNN Model
- Tokenizer: Keras Tokenizer
- Architecture: Embedding layer -> 1D Convolution -> Global Max Pooling -> Dense layers
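A minimal Keras sketch of this architecture follows. It assumes the texts have already been converted to padded integer sequences (for example with the Keras Tokenizer plus padding) and that labels are encoded as integers 0, 1, 2. The vocabulary size, sequence length, embedding size, filter count, kernel width, and hidden Dense width are illustrative assumptions, not values from the card. `build_cnn` is a hypothetical name.

```python
import numpy as np
import keras
from keras import layers


def build_cnn(vocab_size: int = 20_000, max_len: int = 50,
              embed_dim: int = 128, num_classes: int = 3) -> keras.Model:
    """Embedding -> Conv1D -> GlobalMaxPooling1D -> Dense layers."""
    # Input: batches of padded integer token IDs, shape (batch, max_len).
    inputs = keras.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)   # (batch, max_len, embed_dim)
    x = layers.Conv1D(128, 5, activation="relu")(x)       # (batch, max_len - 4, 128)
    x = layers.GlobalMaxPooling1D()(x)                    # (batch, 128)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # class probabilities
    model = keras.Model(inputs, outputs)
    # Integer class labels (0, 1, 2) pair with the sparse categorical loss.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_cnn()
dummy_batch = np.zeros((2, 50), dtype="int32")  # two all-padding sequences
probs = model.predict(dummy_batch, verbose=0)
```

Global max pooling keeps only the strongest response of each filter across the sequence. As a result, the classifier does not depend on where in the tweet a sentiment-bearing n-gram appears.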

## Training Data

The model was trained on a combination of three datasets:
- Kaggle Train: 27,477 samples
- Sentiment140 Train: 300,000 samples (class-balanced subset)
- Sentiment140 Manual Test: 516 samples
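The card does not say how the balanced Sentiment140 subset was drawn. One plausible reading is an equal-sized random sample per class. The hypothetical helper below sketches that approach with the standard library, assuming each row is a dictionary that carries a polarity field.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_key, n_total, seed=42):
    """Hypothetical helper: draw n_total // k rows from each of the k labels."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    per_label = n_total // len(by_label)
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_label:
            raise ValueError(f"only {len(pool)} rows for label {label!r}")
        sample.extend(rng.sample(pool, per_label))
    rng.shuffle(sample)  # avoid label-sorted order in the output
    return sample


# Toy stand-in for Sentiment140 rows (polarity 0 = negative, 4 = positive).
rows = ([{"polarity": 0, "text": f"neg {i}"} for i in range(30)]
        + [{"polarity": 4, "text": f"pos {i}"} for i in range(50)])
subset = balanced_sample(rows, "polarity", n_total=20)
counts = Counter(r["polarity"] for r in subset)
```

Under that assumption, `n_total=300_000` over the two classes in the Sentiment140 training set (negative and positive) would yield 150,000 rows per class.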

The datasets were cleaned and unified into a common schema with `text` and `sentiment` columns.
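The following sketch shows the unification step using only the standard library. The Sentiment140 mapping follows that dataset's polarity codes (0 = negative, 2 = neutral, 4 = positive) and its headerless six-column CSV layout. The card does not describe the other details, so these are assumptions: the Kaggle field names, the cleaning rules in `clean_text`, and all function names.

```python
import csv
import io
import re

# Sentiment140 polarity codes: 0 = negative, 2 = neutral, 4 = positive.
S140_LABELS = {"0": "negative", "2": "neutral", "4": "positive"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_text(text):
    """Hypothetical cleaning: drop URLs and @mentions, lowercase, squeeze spaces."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.lower().split())


def unify_sentiment140(csv_file):
    """Map headerless Sentiment140 CSV rows to {"text", "sentiment"} records."""
    # Column order: polarity, id, date, query, user, text.
    return [{"text": clean_text(text), "sentiment": S140_LABELS[polarity]}
            for polarity, _id, _date, _query, _user, text in csv.reader(csv_file)]


def unify_kaggle(rows):
    """Assumes Kaggle rows already have `text` and `sentiment` fields."""
    return [{"text": clean_text(r["text"]), "sentiment": r["sentiment"]}
            for r in rows if r.get("text")]  # skip rows with empty text


raw = io.StringIO(
    '"4","1","Mon Apr 06 2009","NO_QUERY","user1","Loving it @friend http://t.co/x"\n'
    '"0","2","Mon Apr 06 2009","NO_QUERY","user2","Worst day EVER"\n'
)
data = unify_sentiment140(raw) + unify_kaggle(
    [{"text": "It is fine", "sentiment": "neutral"}, {"text": "", "sentiment": "neutral"}]
)
```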

## Evaluation

Both models were evaluated on a stratified validation split that held out 15% of the training data. The model with the higher macro-F1 score was selected.
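The split and selection could look like this, again assuming scikit-learn. Several pieces are hypothetical: the random seed, the function names, and the convention of wrapping each model as a function from texts to label strings. For the CNN, such a wrapper would tokenize and pad the texts, then take the argmax of the predicted probabilities.

```python
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def split_train_val(texts, labels, seed=42):
    """Stratified split holding out 15% for validation, preserving class ratios."""
    return train_test_split(texts, labels, test_size=0.15,
                            stratify=labels, random_state=seed)


def select_best(candidates, X_val, y_val):
    """Return (name, macro-F1) of the best candidate.

    `candidates` maps a name to a function that turns texts into label strings.
    """
    scores = {name: f1_score(y_val, predict(X_val), average="macro", zero_division=0)
              for name, predict in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data and stand-in predictors, only to exercise the selection logic.
texts = [f"tweet {i}" for i in range(40)]
labels = ["positive", "neutral", "negative", "positive"] * 10
X_train, X_val, y_train, y_val = split_train_val(texts, labels)

lookup = dict(zip(texts, labels))
candidates = {
    "always_positive": lambda X: ["positive"] * len(X),
    "perfect": lambda X: [lookup[x] for x in X],
}
best_name, best_score = select_best(candidates, X_val, y_val)
```

Passing `stratify=labels` keeps the class proportions of the validation split close to those of the full data. This keeps the macro-F1 comparison meaningful for smaller classes such as `neutral`.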