# Sentiment Analysis Model

This model is designed for sentiment analysis of English text. It predicts the sentiment of a given text as one of three classes: `positive`, `neutral`, or `negative`. The model was trained on a combination of datasets from Kaggle and Sentiment140.

## Model Description

The model card describes two approaches:

1. **Baseline Model**: A classical machine learning pipeline using TF-IDF vectorization and Logistic Regression.
2. **CNN Model**: A lightweight Convolutional Neural Network (CNN) implemented in Keras.

The best-performing model (based on validation macro-F1 score) is selected for inference.

### Baseline Model

- **Vectorizer**: TF-IDF (word + character n-grams)
- **Classifier**: Logistic Regression
- **Features**: 200,000 max features, n-gram range (1, 2)

### CNN Model

- **Tokenizer**: Keras Tokenizer
- **Architecture**: Embedding layer -> 1D Convolution -> Global Max Pooling -> Dense layers

## Training Data

The model was trained on a combination of datasets:

- **Kaggle Train**: 27,477 samples
- **Sentiment140 Train**: 300,000 balanced samples
- **Sentiment140 Manual Test**: 516 samples

The datasets were cleaned and unified into a common schema with `text` and `sentiment` columns.

## Evaluation

The model was evaluated on a stratified validation split (15% of the training data). The best model was selected based on the macro-F1 score.
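The selection procedure in this card has two steps. First, hold out a stratified 15% validation split. Second, keep whichever candidate scores the higher macro-F1 on it. Both steps can be sketched with the Python standard library alone. This is an illustrative sketch, not the original training code. The function names `stratified_split`, `macro_f1` and `select_best`, the seed, and the model keys are hypothetical. In practice, scikit-learn's `train_test_split(..., stratify=labels)` and `f1_score(..., average="macro")` cover the same ground. Macro-F1 averages the per-class F1 scores without weighting, so a weak `neutral` class pulls the score down even if it is rare.

```python
import random
from collections import defaultdict

# Label set taken from the model card; the helpers below are hypothetical.
LABELS = ("positive", "neutral", "negative")


def stratified_split(texts, labels, val_frac=0.15, seed=42):
    """Split (text, label) pairs so each class keeps ~val_frac of its items in validation."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_class.items():
        items = items[:]  # copy so the caller's data is not reordered
        rng.shuffle(items)
        n_val = round(len(items) * val_frac)
        val.extend((t, label) for t in items[:n_val])
        train.extend((t, label) for t in items[n_val:])
    return train, val


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN); a class with no TP scores 0."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def select_best(val_scores):
    """Return the key (model name) with the highest validation macro-F1."""
    return max(val_scores, key=val_scores.get)


# Usage with toy labels (illustrative only, not model results):
y_true = ["positive", "neutral", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.6  (per-class F1: 1.0, 0.0, 0.8)
```

In the toy example, the only `neutral` item is misclassified. Macro-F1 therefore drops to 0.6, even though accuracy is 0.75. This is why macro-F1 is a reasonable selection criterion for a three-class task with an underrepresented class.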