marbert-complaint-sentiment

Fine-tuned UBC-NLP/MARBERTv2 for 3-class sentiment on a gold-standard Arabic complaint/review subset curated from the GLARE corpus (e-commerce / user-review domain; balanced classes, manual annotation).
(The Hub may still show "None dataset" in a line auto-generated by Trainer; this description supersedes that line.)

Evaluation set (held-out):

  • Loss: 0.5762
  • Accuracy: 0.76
  • Precision: 0.7625
  • Recall: 0.76
  • F1: 0.7593

Model description

  • Task: Sentiment of short Arabic complaint-style text: NEG (negative), NEU (neutral), POS (positive).
  • Label ids (should match config.json): NEG→0, NEU→1, POS→2.
  • Base model: MARBERTv2 (multi-dialect Arabic BERT); cite Abdul-Mageed et al. (ACL 2021) for ARBERT/MARBERT.
  • Companion paper & code: GitHub YOUSEF-ysfxjo/complaint-xai-fl-research (manuscript: paper/research_v2.tex).
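
As a quick sanity check of the label schema, the mapping above can be mirrored in code and verified for internal consistency (a minimal sketch; the dicts below restate the card's mapping and should match `id2label`/`label2id` in the checkpoint's config.json):

```python
# Expected label mapping from this model card: NEG→0, NEU→1, POS→2.
id2label = {0: "NEG", 1: "NEU", 2: "POS"}
label2id = {"NEG": 0, "NEU": 1, "POS": 2}

# The two dicts in config.json should be exact inverses of each other.
assert all(label2id[name] == idx for idx, name in id2label.items())
print(sorted(id2label.items()))
```

If the checkpoint's config.json disagrees with this mapping, predicted indices will be decoded to the wrong label names, so checking the inverse relationship before deployment is cheap insurance.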

Intended uses & limitations

Uses: Triage or analytics for Arabic e-commerce complaints (Saudi/Gulf-style text, MSA + dialect + light code-mixing).

Limitations: Not for legal/moderation decisions without human review; optimized for this label schema and domain; max length 128 tokens in training (long texts truncated); performance may drop on other genres or dialects.

Training and evaluation data

  • Source: GLARE (large-scale Arabic reviews; see Ghanbari et al., GLARE, arXiv:2412.15259).
  • This checkpoint: Project gold sentiment split — 10,000 samples per class (30,000 total), balanced. Exact CSV column names match the training pipeline in the companion repository.
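
The balanced 10,000-per-class draw can be sketched as follows (a hypothetical illustration only; the actual pipeline and CSV column names live in the companion repository, and the synthetic pool below stands in for the GLARE-derived data):

```python
import random

random.seed(42)

# Hypothetical labeled pool; in practice rows come from the GLARE-derived CSV.
pool = [(f"text {i}", random.choice(["NEG", "NEU", "POS"])) for i in range(100_000)]

per_class = 10_000
balanced = []
for label in ("NEG", "NEU", "POS"):
    rows = [r for r in pool if r[1] == label]
    balanced.extend(random.sample(rows, per_class))  # uniform draw per class

print(len(balanced))  # 30000 rows, 10000 per label
```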

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 5
  • mixed_precision_training: Native AMP
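
With the linear scheduler and 300 warmup steps, the learning rate ramps from 0 to 2e-5 over the first 300 steps and then decays linearly to 0 by the last step (5 epochs × 844 steps/epoch = 4,220 steps, per the training-results table). A minimal sketch of that multiplier, mirroring the behavior of Transformers' `get_linear_schedule_with_warmup`:

```python
def linear_schedule(step, warmup_steps=300, total_steps=4220, base_lr=2e-5):
    """Learning rate at a given step under linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_schedule(150))   # halfway through warmup: 1e-05
print(linear_schedule(300))   # peak learning rate: 2e-05
print(linear_schedule(4220))  # end of training: 0.0
```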

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|------|-----------------|----------|-----------|--------|--------|
| 0.5539        | 1.0   | 844  | 0.6033          | 0.7577   | 0.7601    | 0.7577 | 0.7574 |
| 0.5018        | 2.0   | 1688 | 0.5762          | 0.76     | 0.7625    | 0.76   | 0.7593 |
| 0.4266        | 3.0   | 2532 | 0.6210          | 0.756    | 0.7567    | 0.756  | 0.7551 |
| 0.3449        | 4.0   | 3376 | 0.6901          | 0.75     | 0.7532    | 0.75   | 0.7484 |
| 0.3056        | 5.0   | 4220 | 0.7335          | 0.749    | 0.7516    | 0.749  | 0.7479 |
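
Validation loss bottoms out at epoch 2 (0.5762), matching the held-out metrics reported at the top of this card, while later epochs overfit (training loss keeps falling as validation loss rises). Load-best-checkpoint selection over these numbers reduces to a one-line minimum:

```python
# (validation_loss, f1) per epoch, copied from the training-results table.
history = {1: (0.6033, 0.7574), 2: (0.5762, 0.7593),
           3: (0.6210, 0.7551), 4: (0.6901, 0.7484), 5: (0.7335, 0.7479)}

# Pick the epoch with the lowest validation loss, as a best-model callback would.
best_epoch = min(history, key=lambda e: history[e][0])
print(best_epoch, history[best_epoch])  # 2 (0.5762, 0.7593)
```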

Framework versions

  • Transformers 4.53.3
  • Pytorch 2.6.0+cu124
  • Datasets 4.4.1
  • Tokenizers 0.21.2

Inference example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Ysfxjo/marbert-complaint-sentiment"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example complaint: "Shipping is late and the service is bad."
text = "الشحن متأخر والتعامل سيء"
inputs = tok(text, return_tensors="pt", truncation=True, max_length=128)  # matches training max length
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(model.config.id2label[pred])  # NEG, NEU, or POS