marbert-complaint-sentiment

Fine-tuned UBC-NLP/MARBERTv2 for 3-class sentiment on a gold-standard Arabic complaint/review subset curated from the GLARE corpus (e-commerce / user-review domain; balanced classes, manual annotation).
(The Hub may still show "None dataset" in a line auto-generated by Trainer; this description supersedes that line.)

Evaluation set (held-out):

  • Loss: 0.5762
  • Accuracy: 0.76
  • Precision: 0.7625
  • Recall: 0.76
  • F1: 0.7593

Model description

  • Task: Sentiment of short Arabic complaint-style text: NEG (negative), NEU (neutral), POS (positive).
  • Label ids (should match config.json): NEG→0, NEU→1, POS→2.
  • Base model: MARBERTv2 (multi-dialect Arabic BERT); cite Abdul-Mageed et al. (ACL 2021) for ARBERT/MARBERT.
  • Companion paper & code: GitHub YOUSEF-ysfxjo/complaint-xai-fl-research (manuscript: paper/research_v2.tex).
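
As a quick sanity check of the label schema, the mapping above can be mirrored in code and verified for internal consistency (a minimal sketch; the dicts below restate the card's mapping and should match `id2label`/`label2id` in the checkpoint's config.json):

```python
# Expected label mapping from this model card: NEG→0, NEU→1, POS→2.
id2label = {0: "NEG", 1: "NEU", 2: "POS"}
label2id = {"NEG": 0, "NEU": 1, "POS": 2}

# The two dicts in config.json should be exact inverses of each other.
assert all(label2id[name] == idx for idx, name in id2label.items())
print(sorted(id2label.items()))
```

If the checkpoint's config.json disagrees with this mapping, predicted indices will be decoded to the wrong label names, so checking the inverse relationship before deployment is cheap insurance.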

Intended uses & limitations

Uses: Triage or analytics for Arabic e-commerce complaints (Saudi/Gulf-style text, MSA + dialect + light code-mixing).

Limitations: Not for legal/moderation decisions without human review; optimized for this label schema and domain; max length 128 tokens in training (long texts truncated); performance may drop on other genres or dialects.

Training and evaluation data

  • Source: GLARE (large-scale Arabic reviews; see Ghanbari et al., GLARE, arXiv:2412.15259).
  • This checkpoint: Project gold sentiment split — 10,000 samples per class (30,000 total), balanced. Exact CSV column names match the training pipeline in the companion repository.
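
The balanced 10,000-per-class draw can be sketched as follows (a hypothetical illustration only; the actual pipeline and CSV column names live in the companion repository, and the synthetic pool below stands in for the GLARE-derived data):

```python
import random

random.seed(42)

# Hypothetical labeled pool; in practice rows come from the GLARE-derived CSV.
pool = [(f"text {i}", random.choice(["NEG", "NEU", "POS"])) for i in range(100_000)]

per_class = 10_000
balanced = []
for label in ("NEG", "NEU", "POS"):
    rows = [r for r in pool if r[1] == label]
    balanced.extend(random.sample(rows, per_class))  # uniform draw per class

print(len(balanced))  # 30000 rows, 10000 per label
```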

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 5
  • mixed_precision_training: Native AMP
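
With the linear scheduler and 300 warmup steps, the learning rate ramps from 0 to 2e-5 over the first 300 steps and then decays linearly to 0 by the last step (5 epochs × 844 steps/epoch = 4,220 steps, per the training-results table). A minimal sketch of that multiplier, mirroring the behavior of Transformers' `get_linear_schedule_with_warmup`:

```python
def linear_schedule(step, warmup_steps=300, total_steps=4220, base_lr=2e-5):
    """Learning rate at a given step under linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_schedule(150))   # halfway through warmup: 1e-05
print(linear_schedule(300))   # peak learning rate: 2e-05
print(linear_schedule(4220))  # end of training: 0.0
```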

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|------|-----------------|----------|-----------|--------|--------|
| 0.5539        | 1.0   | 844  | 0.6033          | 0.7577   | 0.7601    | 0.7577 | 0.7574 |
| 0.5018        | 2.0   | 1688 | 0.5762          | 0.76     | 0.7625    | 0.76   | 0.7593 |
| 0.4266        | 3.0   | 2532 | 0.6210          | 0.756    | 0.7567    | 0.756  | 0.7551 |
| 0.3449        | 4.0   | 3376 | 0.6901          | 0.75     | 0.7532    | 0.75   | 0.7484 |
| 0.3056        | 5.0   | 4220 | 0.7335          | 0.749    | 0.7516    | 0.749  | 0.7479 |
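
Validation loss bottoms out at epoch 2 (0.5762), matching the held-out metrics reported at the top of this card, while later epochs overfit (training loss keeps falling as validation loss rises). Load-best-checkpoint selection over these numbers reduces to a one-line minimum:

```python
# (validation_loss, f1) per epoch, copied from the training-results table.
history = {1: (0.6033, 0.7574), 2: (0.5762, 0.7593),
           3: (0.6210, 0.7551), 4: (0.6901, 0.7484), 5: (0.7335, 0.7479)}

# Pick the epoch with the lowest validation loss, as a best-model callback would.
best_epoch = min(history, key=lambda e: history[e][0])
print(best_epoch, history[best_epoch])  # 2 (0.5762, 0.7593)
```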

Framework versions

  • Transformers 4.53.3
  • Pytorch 2.6.0+cu124
  • Datasets 4.4.1
  • Tokenizers 0.21.2

Inference example

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Ysfxjo/marbert-complaint-sentiment"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Example complaint: "Shipping is late and the service is bad."
text = "الشحن متأخر والتعامل سيء"
inputs = tok(text, return_tensors="pt", truncation=True, max_length=128)  # matches training max length
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(model.config.id2label[pred])  # NEG, NEU, or POS