
Satoken

This is a SetFit model trained on multilingual datasets for sentiment classification.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
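In the first step, SetFit turns a small labelled set into many sentence pairs. Pairs that share a label are positives and pairs with different labels are negatives. The Sentence Transformer is then fine-tuned so that positive pairs get similar embeddings. The sketch below shows only this pair-generation idea, without any library. `make_contrastive_pairs` is a hypothetical helper and not part of the setfit API. The library's own sampling strategy differs in detail.

```python
import itertools
import random


def make_contrastive_pairs(texts, labels, num_pairs, seed=0):
    """Sample (text_a, text_b, similarity) triples for contrastive fine-tuning.

    Hypothetical helper for illustration: same-label pairs get target
    similarity 1.0 (positive pair), different-label pairs get 0.0 (negative).
    """
    rng = random.Random(seed)
    # Every unordered pair of distinct examples is a candidate.
    all_pairs = list(itertools.combinations(range(len(texts)), 2))
    chosen = rng.sample(all_pairs, min(num_pairs, len(all_pairs)))
    return [
        (texts[i], texts[j], 1.0 if labels[i] == labels[j] else 0.0)
        for i, j in chosen
    ]


texts = ["great film", "awful plot", "loved it", "hated it"]
labels = [1, 0, 1, 0]
for a, b, sim in make_contrastive_pairs(texts, labels, num_pairs=4):
    print(f"{sim:.1f}  {a!r} / {b!r}")
```

The second step then encodes each training text with the fine-tuned model and fits a classifier on those embeddings. By default, SetFit uses a scikit-learn logistic regression head for this.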

Germla uses this model in its feedback analysis tool, specifically for the sentiment analysis feature.

For other (language-specific) models, check here.

Usage

To use this model for inference, first install the SetFit library:

python -m pip install setfit

You can then run inference as follows:

from setfit import SetFitModel

# Download the model from the Hugging Face Hub
model = SetFitModel.from_pretrained("germla/satoken")
# Run inference on a batch of texts (one predicted label per text)
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds)

Training Details

Training Data

Training Procedure

We made sure the dataset was balanced. The model was trained on only 35% of the train split of each dataset (50% for Chinese).
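The card does not describe exactly how the subset was drawn. Below is a minimal sketch of one way to combine both constraints. It assumes that "balanced" means an equal number of examples per sentiment class, and that the 35% (or 50%) fraction is taken from the full train split. `balanced_subsample` is a hypothetical helper, not Germla's code.

```python
import random
from collections import defaultdict


def balanced_subsample(examples, fraction, seed=42):
    """Return a class-balanced subset holding at most `fraction` of `examples`.

    Hypothetical helper. `examples` is a list of (text, label) pairs. Every
    class contributes the same number of examples, capped by the size of
    the smallest class, so the subset may be slightly under `fraction`.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    target_total = int(len(examples) * fraction)
    smallest = min(len(items) for items in by_label.values())
    per_class = min(target_total // len(by_label), smallest)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        subset.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(subset)
    return subset


# Illustrative data: 60 positive and 40 negative examples.
train = [(f"pos {i}", 1) for i in range(60)] + [(f"neg {i}", 0) for i in range(40)]
print(len(balanced_subsample(train, 0.35)))  # 17 per class -> 34
print(len(balanced_subsample(train, 0.50)))  # 25 per class -> 50
```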

Preprocessing

Basic cleaning (removal of duplicates, links, mentions, hashtags, etc.)
Removal of stopwords using NLTK
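The two preprocessing steps above could look roughly like the sketch below. The regular expressions, the whitespace tokenisation, and deduplicating after cleaning are all assumptions, not the pipeline Germla used. To keep the example self-contained, the stopword set is passed in as a parameter. The comment shows how NLTK's list is normally loaded.

```python
import re

# Assumed patterns for links, @mentions and #hashtags.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def clean_text(text, stopwords):
    """Strip links, mentions and hashtags, then drop stopword tokens."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    tokens = [t for t in text.split() if t.lower() not in stopwords]
    return " ".join(tokens)


def preprocess(texts, stopwords):
    """Clean every text and drop empty results and exact duplicates."""
    seen, out = set(), []
    for text in texts:
        cleaned = clean_text(text, stopwords)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            out.append(cleaned)
    return out


# With NLTK installed:
#   import nltk; nltk.download("stopwords")
#   from nltk.corpus import stopwords
#   stop = set(stopwords.words("english"))
stop = {"i", "the"}  # tiny stand-in list for this example
print(preprocess(["I loved the movie! https://x.co @bob #fun", "I loved the movie!"], stop))
```

NLTK's English stopword list includes negations such as "not" and "no". For sentiment data, it may be worth checking whether those words should be kept.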

Speeds, Sizes, Times

Training took 6 hours on an NVIDIA T4 GPU.

Evaluation

Testing Data, Factors & Metrics

IMDB test split

Environmental Impact

Hardware Type: NVIDIA T4 GPU
Hours used: 6
Cloud Provider: Amazon Web Services
Compute Region: ap-south-1 (Mumbai)
Carbon Emitted: 0.39 kg CO2 eq.
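The card does not say how the 0.39 kg figure was derived. A common back-of-the-envelope method multiplies hardware power, runtime, and the regional grid's carbon intensity. The sketch below runs that arithmetic in reverse. It assumes the T4 draws its 70 W rated board power for all 6 hours and counts no other hardware. Under those assumptions the reported figure implies a grid intensity of roughly 0.93 kg CO2 eq. per kWh for ap-south-1.

```python
# Back-of-the-envelope carbon estimate: energy (kWh) x grid intensity.
# ASSUMPTIONS (not stated in this card): the GPU draws its 70 W rated
# board power for the whole run, and no other hardware is counted.


def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy consumed at a constant power draw, in kilowatt-hours."""
    return power_watts * hours / 1000


def emissions_kg(power_watts: float, hours: float, kg_per_kwh: float) -> float:
    """Estimated emissions in kg CO2 eq. for a given grid carbon intensity."""
    return energy_kwh(power_watts, hours) * kg_per_kwh


T4_POWER_WATTS = 70  # NVIDIA T4 rated board power
TRAINING_HOURS = 6   # "Hours used" above
REPORTED_KG = 0.39   # "Carbon Emitted" above

energy = energy_kwh(T4_POWER_WATTS, TRAINING_HOURS)
implied_intensity = REPORTED_KG / energy
print(f"{energy:.2f} kWh, implied intensity {implied_intensity:.2f} kg CO2 eq./kWh")
# -> 0.42 kWh, implied intensity 0.93 kg CO2 eq./kWh
```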



Evaluation results

All values are self-reported percentages. No split is given for Russian_sentiment.

Dataset                         Split  Accuracy  F1      Precision  Recall
imdb                            test   73.976    73.167  75.515     70.960
sepidmnorozy/Russian_sentiment  -      75.664    83.642  75.257     94.130
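Each reported F1 score equals the harmonic mean of the reported precision and recall, which is what binary (single positive class) F1 produces. The sketch below computes all four metrics from gold and predicted labels and checks that relationship against the table. Treating label `1` as the positive class is an assumption, because the card does not say how the metrics were averaged.

```python
def harmonic_mean(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class.

    ASSUMPTION: binary averaging with `positive` as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": harmonic_mean(precision, recall),
    }


# The reported F1 values match the harmonic mean of the reported P and R.
print(round(harmonic_mean(75.515, 70.960), 3))  # imdb -> 73.167
print(round(harmonic_mean(75.257, 94.130), 3))  # Russian_sentiment -> 83.642
```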