# Sentiment Analysis Models

This repository contains two logistic regression classifiers, one per expert annotator, trained to predict sentiment levels from text embeddings.

## Model Details

- Base embedding model: mixedbread-ai/mxbai-embed-large-v1
- Architecture: LogisticRegression (scikit-learn)
- Training data: Custom sentiment dataset with dual expert annotations
- Data split: 70% training, 15% development, 15% test
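
The 70/15/15 split listed above can be illustrated with a minimal sketch. The function name `split_70_15_15`, the fixed seed, and the shuffle-then-slice strategy are assumptions made for illustration; the repository does not state how its split was actually produced (for example, whether it was stratified by label).

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them into train/dev/test at 70/15/15.

    Hypothetical helper: the actual split procedure used for these
    models is not documented here.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 70 // 100
    n_dev = len(shuffled) * 15 // 100

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test


train, dev, test = split_70_15_15(range(1000))
print(len(train), len(dev), len(test))
```

Fixing the seed keeps the three partitions stable across runs, which matters when development and test metrics are reported separately.
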

## Performance Metrics

### Development Set

#### Against Expert 1:

- Exact match: 49.27%
- Within 1 level: 96.05%

#### Against Expert 2:

- Exact match: 41.00%
- Within 1 level: 93.05%

### Test Set

#### Against Expert 1:

- Exact match: 49.32%
- Within 1 level: 94.93%

#### Against Expert 2:

- Exact match: 41.44%
- Within 1 level: 91.51%
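
The two metrics reported above could be computed along the following lines. This is a sketch that assumes sentiment levels are encoded as integers on an ordinal scale; the function names and the sample labels are hypothetical and are not taken from the repository's evaluation code.

```python
def exact_match(predicted, reference):
    """Fraction of items where the predicted level equals the reference level."""
    pairs = list(zip(predicted, reference))
    return sum(p == r for p, r in pairs) / len(pairs)


def within_one_level(predicted, reference):
    """Fraction of items where the prediction is at most one level away."""
    pairs = list(zip(predicted, reference))
    return sum(abs(p - r) <= 1 for p, r in pairs) / len(pairs)


# Made-up labels on an assumed integer scale, for illustration only.
predicted = [1, 2, 3, 4, 5]
reference = [1, 3, 5, 4, 2]

print(f"Exact match: {exact_match(predicted, reference):.2%}")
print(f"Within 1 level: {within_one_level(predicted, reference):.2%}")
```

"Within 1 level" is the more forgiving metric: it counts a prediction as correct when it lands on the annotated level or on an adjacent one, so it is always at least as high as exact match.
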

## Usage

See `inference.py` for an example of how to use these models to predict sentiment for new text.
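
For orientation, here is a rough sketch of what inference with these files might look like. It is not the contents of `inference.py`. It assumes that the classifiers were trained directly on embeddings produced by the `sentence-transformers` library with default settings, and the helper names `load_components` and `predict_sentiment` are hypothetical.

```python
def load_components(model_path="model1.joblib"):
    """Load the embedding model and one of the saved classifiers.

    Imports are kept inside the function so that the prediction helper
    below can be used without these packages installed.
    """
    import joblib
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
    classifier = joblib.load(model_path)
    return encoder, classifier


def predict_sentiment(texts, encoder, classifier):
    """Embed each text and return the classifier's predicted level per text."""
    embeddings = encoder.encode(texts)  # one embedding vector per input text
    return list(classifier.predict(embeddings))


def main():
    # Assumption: model1.joblib is in the current working directory.
    encoder, classifier = load_components("model1.joblib")
    texts = ["The service was excellent.", "I would not recommend this."]
    for text, level in zip(texts, predict_sentiment(texts, encoder, classifier)):
        print(level, text)
```

Calling `main()` downloads the embedding model on first use. Swapping in `model2.joblib` would give predictions from the model trained on Expert 2 annotations instead.
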

## Model Files

- `model1.joblib`: Model trained on Expert 1 annotations
- `model2.joblib`: Model trained on Expert 2 annotations

## Data Files

- `dev_results.csv`: Complete predictions on development set
- `test_results.csv`: Complete predictions on test set