metadata
language: en
license: mit
library_name: scikit-learn
tags:
- sentiment-analysis
- text-classification
- scikit-learn
- sentence-transformers
datasets:
- custom_sentiment_dataset
metrics:
- accuracy
Sentiment Analysis Model
This model predicts sentiment scores from text input. It embeds each text with BAAI/bge-large-en-v1.5 and passes the embedding to logistic regression classifiers.
Model Description
This repository contains two logistic regression models trained to predict sentiment scores from text embeddings. They were trained on a custom dataset in which each sample is annotated by two different experts.
Model Architecture
- Base embedding model: BAAI/bge-large-en-v1.5
- Classifier: LogisticRegression (scikit-learn)
- Final prediction: Average of both model predictions, rounded to the nearest integer with np.round (exact halves round to the nearest even integer)
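The averaging step can be sketched without loading any model. This is a minimal illustration, assuming both classifiers output integer scores. The helper name combine_predictions and the score pairs are hypothetical. It uses Python's built-in round(), which resolves exact .5 ties to the nearest even integer, the same tie rule np.round applies.

```python
def combine_predictions(pred1: int, pred2: int) -> int:
    """Average two integer sentiment scores and round to the nearest integer.

    Exact .5 ties go to the nearest even integer (same tie rule as np.round).
    """
    return round((pred1 + pred2) / 2)


# Hypothetical score pairs, chosen only to illustrate the rounding behavior
for a, b in [(2, 2), (1, 2), (2, 3), (3, 4)]:
    print(a, b, combine_predictions(a, b))
```

When the two models disagree by one point, the result is therefore not always the higher score: the pair (1, 2) averages to 1.5 and becomes 2, while (2, 3) averages to 2.5 and also becomes 2.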
Intended Use and Limitations
The model is designed for sentiment analysis and works best on English text similar to the training data.
Training and Evaluation Data
The model was trained on a custom dataset with:
- 70% training data
- 15% development data
- 15% test data
Each sample has annotations from two human experts.
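A 70/15/15 split like the one above could be produced as follows. This is only a sketch: the card does not say how the split was actually made, so the function name split_indices, the seed, and the shuffle-then-slice approach are all assumptions.

```python
import random


def split_indices(n_samples, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into 70/15/15 parts.

    Assumption: a plain random split; the original procedure is not documented.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point surprises in the cut points
    n_train = n_samples * 70 // 100
    n_dev = n_samples * 15 // 100
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # remainder, so no sample is dropped
    return train, dev, test


train_idx, dev_idx, test_idx = split_indices(1000)
print(len(train_idx), len(dev_idx), len(test_idx))
```

Fixing the seed keeps the three subsets reproducible, so development and test metrics always refer to the same samples.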
Evaluation Results
See README.md for detailed performance metrics on both development and test sets.
Using the Models
import joblib
import numpy as np
from sentence_transformers import SentenceTransformer

# Load the two classifiers and the embedding model
model1 = joblib.load('model1.joblib')
model2 = joblib.load('model2.joblib')
embedder = SentenceTransformer('BAAI/bge-large-en-v1.5')

def predict_sentiment(text):
    # encode() on a one-item list returns an array of shape (1, embedding_dim)
    embedding = embedder.encode([text])
    pred1 = model1.predict(embedding)[0]
    pred2 = model2.predict(embedding)[0]
    # Average the two predictions; np.round sends exact halves to the nearest even integer
    final_prediction = np.round((pred1 + pred2) / 2).astype(int)
    return final_prediction
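For several texts at once, the same logic can be applied per item. The sketch below is not part of the repository: predict_sentiment_batch is a hypothetical helper that receives the embedder and both classifiers as arguments, and it assumes the classifiers return integer scores. It uses the built-in round(), which treats .5 ties the same way as np.round.

```python
def predict_sentiment_batch(texts, embedder, model1, model2):
    """Return one averaged, rounded sentiment score per input text.

    Hypothetical helper: `embedder` needs an encode(list_of_str) method and
    each model a predict(embeddings) method, as in the single-text example.
    """
    embeddings = embedder.encode(list(texts))  # one embedding per text
    preds1 = model1.predict(embeddings)
    preds2 = model2.predict(embeddings)
    # Assumes integer labels; exact .5 averages round to the nearest even integer
    return [round((int(p1) + int(p2)) / 2) for p1, p2 in zip(preds1, preds2)]
```

With the objects loaded as above, a call would look like predict_sentiment_batch(["Great service", "Never again"], embedder, model1, model2). Encoding all texts in one call is usually faster than calling predict_sentiment in a loop.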