Lithuanian E-commerce Sentiment Classifier — MNB (TF-IDF, binary)

A Multinomial Naïve Bayes classifier trained from scratch on 47,131 Lithuanian e-commerce reviews. Predicts binary sentiment polarity (positive / negative).

Model details

Developed by: Austėja Rušėnaitė
Algorithm: sklearn.naive_bayes.MultinomialNB, alpha = 1.0 (Laplace smoothing).
Features: TF-IDF (5,000 features, unigrams + bigrams, min_df = 2, sublinear_tf = True) plus four engineered features (exclamation_count, review_word_count, avg_word_length, capital_count) MinMax-scaled to [0, 1].
Labels: binary — positive and negative. The corpus is naturally three-class (positive / neutral / negative) but the neutral class was dropped at training time.
Training data size: 47,131 reviews (positive 37,269; negative 9,862).
Class handling: sample_weight = compute_sample_weight("balanced", y). SMOTE was deliberately not used because interpolating in sparse TF-IDF space produces vectors that do not correspond to plausible Lithuanian sentences.
Language: Lithuanian (lt).
Domain: Lithuanian e-commerce reviews.
Saved with scikit-learn: 1.8.0.
License: Apache 2.0
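The class weighting above can be checked numerically. The sketch below rebuilds a label vector with the reported class counts and shows what `compute_sample_weight("balanced", y)` assigns to each review. The 0 = negative / 1 = positive encoding is an assumption made for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Rebuild a label vector with the reported class counts.
# Assumed encoding (illustration only): 0 = negative, 1 = positive.
n_neg, n_pos = 9_862, 37_269
y = np.array([0] * n_neg + [1] * n_pos)

# "balanced" gives every sample the weight n_samples / (n_classes * n_in_its_class).
sample_weight = compute_sample_weight("balanced", y)

w_neg = sample_weight[y == 0][0]  # weight of a single negative review
w_pos = sample_weight[y == 1][0]  # weight of a single positive review
print(f"negative: {w_neg:.4f}  positive: {w_pos:.4f}")

# Both classes end up with the same total weight (half of n_samples each).
print(sample_weight[y == 0].sum(), sample_weight[y == 1].sum())
```

With these counts a negative review carries roughly 3.8 times the weight of a positive one (37,269 / 9,862).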

Files in this repository

File	Purpose
`mnb_sentiment_binary_model.joblib`	trained MNB classifier
`mnb_tfidf_binary_vectorizer.joblib`	fitted `TfidfVectorizer`
`mnb_features_binary_scaler.joblib`	fitted `MinMaxScaler` for engineered features
`README.md`	this card
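A minimal inference sketch, assuming the three files sit in one local directory. The card does not ship preprocessing code, so the caller must supply two callables that reproduce the training pipeline exactly: `preprocess` (normalisation plus lemmatisation) and `engineered_features` (the four raw-text features). Both names are hypothetical. The sketch also assumes the engineered columns are stacked after the TF-IDF columns, in the order listed under Model details.

```python
import joblib
import numpy as np
from scipy.sparse import csr_matrix, hstack


def load_artifacts(directory="."):
    """Load the three fitted objects listed in the table above."""
    model = joblib.load(f"{directory}/mnb_sentiment_binary_model.joblib")
    vectorizer = joblib.load(f"{directory}/mnb_tfidf_binary_vectorizer.joblib")
    scaler = joblib.load(f"{directory}/mnb_features_binary_scaler.joblib")
    return model, vectorizer, scaler


def predict(raw_texts, model, vectorizer, scaler, preprocess, engineered_features):
    """Return (labels, class probabilities) for a list of raw review strings.

    preprocess: hypothetical callable, raw text -> normalised, lemmatised text.
    engineered_features: hypothetical callable, raw text -> four feature values.
    """
    # TF-IDF columns are built from the cleaned text.
    X_tfidf = vectorizer.transform([preprocess(t) for t in raw_texts])
    # Engineered columns are built from the raw text, then MinMax-scaled.
    F = np.array([engineered_features(t) for t in raw_texts], dtype=float)
    X_eng = csr_matrix(scaler.transform(F))
    # Assumed column order: TF-IDF first, engineered features last.
    X = hstack([X_tfidf, X_eng]).tocsr()
    # Probability columns follow the order of model.classes_.
    return model.predict(X), model.predict_proba(X)
```

Typical use: `model, vectorizer, scaler = load_artifacts(".")`, then `labels, proba = predict(texts, model, vectorizer, scaler, preprocess, engineered_features)`.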

Intended use

Polarity classification of Lithuanian e-commerce reviews — for example, as a feature for downstream aggregation across the set of reviews available for a given merchant.

Three-class sentiment is not supported by this checkpoint. Domains other than Lithuanian e-commerce reviews are out of scope: the vocabulary and engineered features are dataset-specific.
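A sketch of the merchant-level aggregation mentioned above. It assumes per-review predictions have already been collected into a pandas DataFrame with hypothetical columns `merchant_id`, `p_positive` (the positive-class probability) and `label` (the predicted class). The literal `"negative"` is also an assumption, because the actual label values depend on `model.classes_`.

```python
import pandas as pd


def merchant_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-review predictions into one row per merchant.

    Hypothetical input columns: merchant_id, p_positive, label.
    """
    return (
        df.assign(is_negative=df["label"].eq("negative"))  # assumed label value
          .groupby("merchant_id")
          .agg(
              n_reviews=("label", "size"),             # reviews per merchant
              mean_p_positive=("p_positive", "mean"),  # average model confidence
              negative_share=("is_negative", "mean"),  # fraction predicted negative
          )
          .reset_index()
    )
```

Reporting `n_reviews` next to the two averages lets downstream code discount merchants with only a handful of reviews.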

Training procedure

Two-stage preprocessing is applied to the raw review text before vectorisation:

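Lower-casing followed by letters-only normalisation: every character matching [^a-ząčęėįšųūž\s] (anything other than a lower-case a-z letter, a Lithuanian diacritic letter or whitespace) is stripped.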
spaCy lt_core_news_md lemmatisation and stop-word removal.
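The two stages can be sketched as follows. The regex is the one given above. The spaCy step assumes `nlp = spacy.load("lt_core_news_md")`, and it also assumes that whitespace tokens are dropped and lemmas are joined with single spaces. The card does not specify either choice.

```python
import re

# Stage 1 pattern from the card: anything that is not a lower-case a-z letter,
# a Lithuanian diacritic letter, or whitespace.
NON_LITHUANIAN = re.compile(r"[^a-ząčęėįšųūž\s]")


def normalise(text: str) -> str:
    """Lower-case, then strip every character matched by NON_LITHUANIAN."""
    return NON_LITHUANIAN.sub("", text.lower())


def lemmatise(text: str, nlp) -> str:
    """Lemmatise with a spaCy pipeline and drop stop words.

    `nlp` is assumed to be spacy.load("lt_core_news_md"). It is passed in so
    this module imports without spaCy installed. Dropping whitespace tokens
    and joining with single spaces are assumptions, not documented choices.
    """
    return " ".join(
        tok.lemma_ for tok in nlp(text) if not tok.is_stop and not tok.is_space
    )


def preprocess(text: str, nlp) -> str:
    """Stage 1 followed by stage 2."""
    return lemmatise(normalise(text), nlp)
```

For example, `normalise("Labai GERA, ačiū!")` yields `"labai gera ačiū"`, which the spaCy pipeline then lemmatises and filters.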

The four engineered features are computed from the raw (pre-normalisation) text, because normalisation removes the exclamation marks and upper-case letters that exclamation_count and capital_count depend on.
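The card names the four features but not their exact definitions. The sketch below uses plausible definitions, all of them assumptions: whitespace tokenisation for the word-based features and `str.isupper` for capitals. It then applies the MinMax scaling. Run on normalised text, the first and last features would always be zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def engineered_features(raw_text: str) -> list[float]:
    """Hypothetical definitions of the four features, computed on raw text."""
    words = raw_text.split()  # assumed: whitespace tokenisation
    return [
        float(raw_text.count("!")),                                  # exclamation_count
        float(len(words)),                                           # review_word_count
        float(np.mean([len(w) for w in words])) if words else 0.0,   # avg_word_length
        float(sum(ch.isupper() for ch in raw_text)),                 # capital_count
    ]


raw_reviews = ["Puiki prekė!!!", "Blogai supakuota, VĖLAVO.", "Viskas gerai"]
F = np.array([engineered_features(t) for t in raw_reviews])

# Scale each column to [0, 1] using the training minimum and maximum.
scaler = MinMaxScaler(feature_range=(0, 1))
F_scaled = scaler.fit_transform(F)
```

At inference time only `scaler.transform` is called, so values outside the training range can fall slightly outside [0, 1].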

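The TF-IDF matrix is horizontally stacked (hstack) with the scaled engineered features to produce a 5,004-dimensional (5,000 + 4) sparse input matrix, on which MNB is fitted with balanced sample weights. Both blocks are non-negative, as MultinomialNB requires: TF-IDF weights are never negative, and the engineered features are scaled to [0, 1].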
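A minimal training sketch under the settings listed in Model details. It assumes the texts are already preprocessed and the engineered features already MinMax-scaled. The function name `fit_mnb` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils.class_weight import compute_sample_weight


def fit_mnb(clean_texts, engineered_scaled, y):
    """Fit TF-IDF, append the scaled engineered columns, fit weighted MNB."""
    vectorizer = TfidfVectorizer(
        max_features=5000, ngram_range=(1, 2), min_df=2, sublinear_tf=True
    )
    X_tfidf = vectorizer.fit_transform(clean_texts)  # n_docs x (at most 5,000)
    # Stack the four engineered columns after the TF-IDF columns.
    X = hstack([X_tfidf, csr_matrix(engineered_scaled)]).tocsr()
    model = MultinomialNB(alpha=1.0)  # Laplace smoothing
    # MultinomialNB rejects negative inputs; both blocks here are >= 0.
    model.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
    return vectorizer, model, X
```

On the full corpus the vocabulary is capped at 5,000 terms, which gives the 5,004 columns stated above. A small corpus yields fewer TF-IDF columns.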

Evaluation

All metrics below come from 5-fold stratified cross-validation over the full labelled dataset, so every review appears as a test instance exactly once. There is no separate held-out test split.
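A sketch of this evaluation protocol, for readers who want to reproduce it. It rests on two assumptions. The card does not say whether the vectorizer was refitted inside each fold; this sketch refits it, which keeps test-fold text out of the vocabulary. The engineered features are also omitted for brevity. The function name, the shuffle seed and the population standard deviation (`np.std`) are illustrative choices.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.utils.class_weight import compute_sample_weight


def stratified_cv_report(texts, y, n_splits=5, seed=0):
    """Macro-F1 mean/std over folds plus the confusion matrix pooled over folds."""
    texts = np.asarray(texts, dtype=object)
    y = np.asarray(y)
    oof_pred = np.empty_like(y)  # out-of-fold prediction for every review
    fold_f1 = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(np.zeros(len(y)), y):
        # Fresh vectorizer and model per fold: the vocabulary sees training text only.
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2),
                                      min_df=2, sublinear_tf=True)),
            ("mnb", MultinomialNB(alpha=1.0)),
        ])
        # Balanced weights are recomputed from the training fold's labels.
        weights = compute_sample_weight("balanced", y[train_idx])
        pipe.fit(texts[train_idx], y[train_idx], mnb__sample_weight=weights)
        oof_pred[test_idx] = pipe.predict(texts[test_idx])
        fold_f1.append(f1_score(y[test_idx], oof_pred[test_idx], average="macro"))
    # Each review is predicted exactly once, so the pooled matrix covers all data.
    return np.mean(fold_f1), np.std(fold_f1), confusion_matrix(y, oof_pred)
```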

Metric	Value
Accuracy	0.8952 ± 0.0039
macro-F1 (headline)	0.8618 ± 0.0043
weighted-F1	0.9013 ± 0.0034
macro-precision	0.8320 ± 0.0044
macro-recall	0.9208 ± 0.0016
ROC-AUC	0.9743 ± 0.0011
Average precision	0.9933 ± 0.0003
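ROC-AUC and average precision do not depend on a decision threshold. Accuracy and the F1 figures reflect the default rule of `predict`, which picks the class with the higher posterior. As noted under Limitations, a different operating point can be reached without retraining by thresholding the negative-class probability. Pick the threshold on held-out or out-of-fold predictions, not on training data. The helper names and the "minimum precision" criterion below are illustrative assumptions. `p_negative` is the `predict_proba` column at the index of the negative label in `model.classes_`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold(is_negative, p_negative, min_precision):
    """Lowest threshold on P(negative) whose negative-class precision >= min_precision.

    Recall can only fall as the threshold rises, so the lowest qualifying
    threshold keeps the most negative-class recall. Returns None if no
    threshold reaches the requested precision.
    """
    precision, _recall, thresholds = precision_recall_curve(is_negative, p_negative)
    # The last precision/recall pair has no threshold attached; ignore it.
    ok = np.flatnonzero(precision[:-1] >= min_precision)
    return None if ok.size == 0 else float(thresholds[ok[0]])


def predict_negative(p_negative, threshold):
    """Flag a review as negative when P(negative) reaches the chosen threshold."""
    return np.asarray(p_negative) >= threshold
```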

Per-class breakdown (mean ± std across folds):

Class	Precision	Recall	F1-score
Negative	0.6746 ± 0.0095	0.9647 ± 0.0023	0.7939 ± 0.0057
Positive	0.9895 ± 0.0006	0.8768 ± 0.0055	0.9297 ± 0.0028

Aggregated confusion matrix across all five folds:

	Predicted: Negative	Predicted: Positive
Actual: Negative	9,514	348
Actual: Positive	4,592	32,677
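As a consistency check, the per-class figures can be recomputed from this aggregated confusion matrix. The matrix pools all five folds, so the recomputed values are pooled rather than fold-averaged and may differ from the tables in the last decimal place.

```python
import numpy as np

# Aggregated confusion matrix from the table above.
# Rows = actual class, columns = predicted class; order: [negative, positive].
cm = np.array([[9_514,    348],
               [4_592, 32_677]])

correct = np.diag(cm)                  # correctly classified reviews per class
precision = correct / cm.sum(axis=0)   # divide by column totals (predicted counts)
recall = correct / cm.sum(axis=1)      # divide by row totals (actual counts)
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for name, p, r, f in zip(["negative", "positive"], precision, recall, f1):
    print(f"{name:>8}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"accuracy={accuracy:.4f}  macro-F1={f1.mean():.4f}")
```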

Limitations and caveats

Binary scheme. The neutral class was dropped because errors in Lithuanian three-class sentiment classification are dominated by confusion involving the neutral class. Reviews whose sentiment is genuinely mixed or neutral are still forced into whichever of the two binary classes the model scores higher.
Class imbalance. The corpus is ~79 % positive / ~21 % negative, and the model is trained with balanced sample weights. The resulting operating point has high recall on the minority (negative) class (0.9647) at the cost of lower precision (0.6746). Downstream consumers requiring a different operating point can move along the precision–recall curve without retraining, by thresholding the negative-class probability from predict_proba instead of using predict.
Domain specificity. The TF-IDF vocabulary is fitted on Lithuanian e-commerce review text. Application to other text genres is not recommended without retraining.
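No language detection. Inputs in languages other than Lithuanian are passed through the same preprocessing pipeline and produce meaningless predictions; text in a non-Latin script (Cyrillic, for example) is reduced to little more than whitespace by the normalisation regex. Language detection is the responsibility of the caller.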
Bag-of-words limitations. As a bag-of-words model, the classifier does not represent word order beyond the bigram features and is not sensitive to negation scope. Reviews of the form "I expected it to be terrible but it was actually excellent" can be misclassified.
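The word-order limitation can be seen with scikit-learn's `CountVectorizer`. It tokenises n-grams the same way as the `TfidfVectorizer` used here, which subclasses it. English tokens are used for readability; the effect is the same for Lithuanian text.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["good not bad", "bad not good"]

unigrams = CountVectorizer(ngram_range=(1, 1)).fit(docs)
bigrams = CountVectorizer(ngram_range=(1, 2)).fit(docs)

X_uni = unigrams.transform(docs).toarray()
X_bi = bigrams.transform(docs).toarray()

# Unigrams only: the two opposite reviews produce identical vectors.
print((X_uni[0] == X_uni[1]).all())
# Adding bigrams separates them through "not bad" versus "not good" ...
print((X_bi[0] == X_bi[1]).all())
# ... but only for adjacent words: one intervening token hides the negation.
far = CountVectorizer(ngram_range=(1, 2)).fit(["not very good"])
print("not good" in far.vocabulary_)
```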

