Sentiment Analysis Model
This model is designed for sentiment analysis of English text. It predicts the sentiment of a given text as one of three classes: positive, neutral, or negative. The model was trained on a combination of datasets from Kaggle and Sentiment140.
Model Description
Two approaches were trained and compared:
- Baseline Model: A classical machine learning pipeline using TF-IDF vectorization and Logistic Regression.
- CNN Model: A lightweight Convolutional Neural Network (CNN) implemented in Keras.
The best-performing model (based on validation macro-F1 score) is selected for inference.
Baseline Model
- Vectorizer: TF-IDF (word + character n-grams)
- Classifier: Logistic Regression
- Features: 200,000 max features, n-gram range (1, 2)
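The baseline configuration above could be assembled roughly as follows with scikit-learn. This is a minimal sketch, not the card's actual training code: the character n-gram range `(3, 5)`, the `char_wb` analyzer, `max_iter=1000`, the function name `build_baseline`, and the toy training data are all assumptions made for illustration. Only the 200,000 feature cap, the `(1, 2)` range, and the use of Logistic Regression come from the list above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union


def build_baseline():
    """Hypothetical helper: word + character TF-IDF feeding Logistic Regression."""
    # Word unigrams and bigrams, capped at 200,000 features (from the card).
    word_tfidf = TfidfVectorizer(
        analyzer="word", ngram_range=(1, 2), max_features=200_000
    )
    # Character n-grams: the analyzer and range are assumptions, not from the card.
    char_tfidf = TfidfVectorizer(
        analyzer="char_wb", ngram_range=(3, 5), max_features=200_000
    )
    features = make_union(word_tfidf, char_tfidf)
    return make_pipeline(features, LogisticRegression(max_iter=1000))


# Toy data for illustration only; the real training sets are far larger.
train_texts = [
    "I love this phone",
    "What a wonderful day",
    "The package arrived on Tuesday",
    "The meeting is at noon",
    "I hate waiting in line",
    "This is the worst service ever",
]
train_labels = [
    "positive", "positive", "neutral", "neutral", "negative", "negative",
]

baseline = build_baseline()
baseline.fit(train_texts, train_labels)
predictions = baseline.predict(["what a great day", "this is awful"])
```

Combining the two vectorizers with `make_union` concatenates their sparse feature matrices, so the classifier sees word-level and character-level evidence side by side.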
CNN Model
- Tokenizer: Keras Tokenizer
- Architecture: Embedding layer -> 1D Convolution -> Global Max Pooling -> Dense layers
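The layer sequence above might look like the following in Keras. This is a sketch under stated assumptions: the vocabulary size, sequence length, embedding width, filter count, kernel size, and dense width are placeholder values chosen for illustration, and the card does not state the real ones. Tokenization is assumed to have already produced padded integer ID sequences.

```python
import numpy as np
from tensorflow import keras

# All sizes below are hypothetical; the card does not specify them.
VOCAB_SIZE = 20_000
MAX_LEN = 50
NUM_CLASSES = 3  # positive, neutral, negative


def build_cnn():
    """Embedding -> 1D Convolution -> Global Max Pooling -> Dense layers."""
    return keras.Sequential(
        [
            keras.Input(shape=(MAX_LEN,), dtype="int32"),
            keras.layers.Embedding(VOCAB_SIZE, 64),
            keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
            keras.layers.GlobalMaxPooling1D(),
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(NUM_CLASSES, activation="softmax"),
        ]
    )


cnn = build_cnn()
cnn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random token IDs stand in for tokenized, padded text.
dummy_batch = np.random.randint(1, VOCAB_SIZE, size=(2, MAX_LEN))
probabilities = cnn.predict(dummy_batch, verbose=0)
```

Global max pooling keeps the strongest activation of each filter across the sequence, which lets the dense layers work on a fixed-size vector regardless of where a sentiment-bearing phrase appears.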
Training Data
The following datasets were used:
- Kaggle Train: 27,477 samples
- Sentiment140 Train: 300,000 balanced samples
- Sentiment140 Manual Test: 516 samples
The datasets were cleaned and unified into a common schema with text and sentiment columns.
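One way to unify sources into a `text` / `sentiment` schema is sketched below with pandas. The helper name `unify`, the source column names, and the specific cleaning steps (whitespace stripping, dropping empty rows, de-duplicating) are assumptions for illustration; the card does not say how cleaning was done. The Sentiment140 label encoding (0 = negative, 2 = neutral, 4 = positive) follows that dataset's published format.

```python
import pandas as pd

# Sentiment140 encodes polarity as integers.
SENTIMENT140_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def unify(df, text_col, label_col, label_map=None):
    """Hypothetical helper: return a frame with only `text` and `sentiment`."""
    out = (
        df[[text_col, label_col]]
        .rename(columns={text_col: "text", label_col: "sentiment"})
        .dropna(subset=["text"])
        .copy()
    )
    out["text"] = out["text"].astype(str).str.strip()
    if label_map is not None:
        out["sentiment"] = out["sentiment"].map(label_map)
    # Drop rows with unmapped labels or empty text, then exact duplicates.
    out = out.dropna(subset=["sentiment"])
    out = out[out["text"] != ""]
    return out.drop_duplicates(subset="text").reset_index(drop=True)


# Tiny in-memory stand-ins for the real files (column names are assumed).
kaggle_df = pd.DataFrame(
    {
        "text": [" I love this ", None, "meh"],
        "sentiment": ["positive", "neutral", "neutral"],
    }
)
s140_df = pd.DataFrame({"target": [0, 4], "text": ["worst day", "so happy"]})

combined = pd.concat(
    [
        unify(kaggle_df, "text", "sentiment"),
        unify(s140_df, "text", "target", SENTIMENT140_LABELS),
    ],
    ignore_index=True,
)
```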
Evaluation
Both models were evaluated on a stratified validation split (15% of the training data), and the one with the higher macro-F1 score was selected.
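The evaluation procedure can be illustrated with scikit-learn. The sketch below assumes `random_state=42`, uses synthetic labels, and makes up the scores passed to the selection step; none of these come from the card. It shows a 15% stratified split, a small worked macro-F1 calculation, and how a model would be picked by that score.

```python
from collections import Counter

from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic data: 100 examples with a 40/20/40 class mix.
labels = ["positive"] * 40 + ["neutral"] * 20 + ["negative"] * 40
texts = [f"example {i}" for i in range(len(labels))]

# Stratifying on the labels keeps the class mix the same in both splits.
x_train, x_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.15, stratify=labels, random_state=42
)
val_counts = Counter(y_val)

# Macro-F1 is the unweighted mean of per-class F1 scores.
y_true = ["positive", "neutral", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
# Per class: positive F1 = 1.0, neutral F1 = 0.0, negative F1 = 0.8.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

# Selection step; these validation scores are invented for illustration.
validation_scores = {"baseline": 0.71, "cnn": 0.74}
best_model = max(validation_scores, key=validation_scores.get)
```

Because macro-F1 weights each class equally, a model that ignores the smaller neutral class is penalized more heavily than it would be under plain accuracy.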