metadata
library_name: transformers
base_model: aubmindlab/bert-base-arabertv02
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - f1
model-index:
  - name: arabic-sentiment-model
    results: []
language:
  - ar
pipeline_tag: text-classification
datasets:
  - ramybaly/arsentd_lev

arabic-sentiment-model

This model is a fine-tuned version of aubmindlab/bert-base-arabertv02 on the ramybaly/arsentd_lev dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1512
  • Accuracy: 0.9454
  • F1: 0.9454
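The card does not say how F1 was averaged. Accuracy and F1 are identical in every reported evaluation, which is what micro-averaged F1 gives in single-label classification: each misclassified example counts as one false positive and one false negative, so micro precision, micro recall and accuracy coincide. The sketch below is an illustration, not the evaluation code used for this card. It computes accuracy together with micro- and macro-averaged F1 from predicted and true label ids so the averaging choices can be compared.

```python
from collections import Counter


def classification_metrics(predictions, labels):
    """Accuracy plus micro- and macro-averaged F1 for single-label predictions."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need equally long, non-empty prediction and label lists")
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

    tp, fp, fn = Counter(), Counter(), Counter()
    for p, y in zip(predictions, labels):
        if p == y:
            tp[y] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[y] += 1  # true class y was missed

    def f1(t, f_pos, f_neg):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when a class never appears.
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of the per-class F1 scores.
    classes = sorted(set(labels) | set(predictions))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return {"accuracy": accuracy, "f1_micro": micro, "f1_macro": macro}
```

In a `Trainer` setup, a `compute_metrics` hook would typically take the argmax of each row of logits and pass the resulting label ids to a function like this one.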

Model description

The model adapts aubmindlab/bert-base-arabertv02 to Arabic sentiment analysis.

The model is trained to classify Arabic text into two sentiment classes (Positive / Negative). It is suitable for analyzing opinions expressed in Modern Standard Arabic (MSA) as well as in dialectal Arabic, such as the text commonly found in social media posts, product reviews, and user feedback.

The model builds on AraBERT's pretrained contextual representations of Arabic morphology and syntax, and reaches 0.9454 accuracy on the evaluation set.

Intended uses & limitations

This model can be used for:

  • Arabic sentiment analysis
  • Social media opinion mining
  • Customer feedback analysis
  • Academic research and NLP experiments
  • Graduation and portfolio projects

It is designed for inference on short to medium-length Arabic texts.
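A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub as `7beshoyarnest/arabic-sentiment-model` (an id inferred from the uploader name, not stated in this card) and that `transformers` and `torch` are installed. The label order is also an assumption: the card does not document `id2label`, so `classify` reads the names stored in the checkpoint's config, and `ASSUMED_ID2LABEL` is only the default for calling `logits_to_prediction` directly. `max_length=128` is an illustrative value, not the training setting.

```python
import math

# Assumed Hub repo id (inferred, not stated in the card); verify before use.
MODEL_ID = "7beshoyarnest/arabic-sentiment-model"

# Assumed label order; the card does not document id2label.
ASSUMED_ID2LABEL = {0: "Negative", 1: "Positive"}


def logits_to_prediction(logits, id2label=ASSUMED_ID2LABEL):
    """Turn one row of raw logits into (label, probability) via a stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract the max to avoid overflow
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts, model_id=MODEL_ID, max_length=128):
    """Score a batch of Arabic texts; needs transformers, torch and Hub access."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    batch = tokenizer(
        texts,
        padding=True,  # pad to the longest text in this batch
        truncation=True,  # cut anything longer than max_length tokens
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    # Use the label names stored in the checkpoint's config (may be LABEL_0/LABEL_1).
    id2label = model.config.id2label
    return [logits_to_prediction(row.tolist(), id2label) for row in logits]


# Example (downloads the model):
# classify(["الخدمة ممتازة جدا", "المنتج سيء للغاية"])
```

If the config only carries generic names such as `LABEL_0` / `LABEL_1`, check on a few known examples which index is positive. `pipeline("text-classification", model=MODEL_ID)` from `transformers` is a shorter alternative that wraps the same steps.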

Limitations

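  • The model performs binary sentiment classification only (there is no neutral class).
  • Performance may degrade on very long documents; the BERT-base encoder accepts at most 512 tokens, so longer inputs must be truncated or split.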

Training and evaluation data

The model was trained and evaluated on the ramybaly/arsentd_lev dataset, which consists of Arabic text labeled for sentiment polarity.

Dataset Characteristics

  • Language: Arabic
  • Labels: Positive, Negative
  • Text type: Short Arabic opinions and statements
  • Domains: General opinionated text

The dataset was split into separate training, evaluation, and test sets.
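The card does not give the split ratios or how the split was made. A minimal sketch of a reproducible three-way split, assuming hypothetical 80/10/10 fractions and reusing the card's training seed (42), which may or may not be what was used for splitting:

```python
import random


def split_dataset(examples, train_frac=0.8, eval_frac=0.1, seed=42):
    """Shuffle once with a fixed seed, then cut into train / eval / test lists."""
    if train_frac <= 0 or eval_frac < 0 or train_frac + eval_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    items = list(examples)
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    rng.shuffle(items)
    n_train = int(len(items) * train_frac)
    n_eval = int(len(items) * eval_frac)
    train = items[:n_train]
    evaluation = items[n_train:n_train + n_eval]
    test = items[n_train + n_eval:]  # the remainder goes to the test set
    return train, evaluation, test
```

With Hugging Face `datasets`, the same idea is usually expressed with two calls to `Dataset.train_test_split(test_size=..., seed=...)`; for a `ClassLabel` column that method also accepts `stratify_by_column`, which keeps the Positive/Negative ratio the same across splits.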

Training procedure

Preprocessing

  • Arabic text normalization handled by the AraBERT tokenizer
  • Tokenization with the AraBERT v02 tokenizer
  • Padding and truncation to a fixed input length
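What padding and truncation to a fixed length produce can be sketched in plain Python. This is an illustration, not the tokenizer's code: it assumes the text has already been converted to token ids, and the special-token ids are parameters because the card does not list them (with a real tokenizer they come from `tokenizer.cls_token_id`, `tokenizer.sep_token_id` and `tokenizer.pad_token_id`). The maximum length used in training is also not stated in the card.

```python
def encode_fixed_length(content_ids, max_length, cls_id, sep_id, pad_id):
    """Wrap token ids as [CLS] ... [SEP], truncate, and pad to exactly max_length."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    # Truncation keeps the first tokens and reserves two slots for special tokens.
    body = list(content_ids)[: max_length - 2]
    input_ids = [cls_id] + body + [sep_id]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    n_pad = max_length - len(input_ids)
    input_ids += [pad_id] * n_pad
    attention_mask += [0] * n_pad  # 0 = padding the model should ignore
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

With the Transformers tokenizer the equivalent call is `tokenizer(text, padding="max_length", truncation=True, max_length=...)`, which for BERT models also returns `token_type_ids`.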

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: ADAMW_TORCH_FUSED (PyTorch fused AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 3
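The hyperparameter list can be restated as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the author's training script: the mapping of each entry to an argument name is a reading of the list, and it is kept as a plain dict so the snippet runs without `transformers` installed. The effective batch size check assumes a single device, which is consistent with the reported total_train_batch_size of 64.

```python
# Hyperparameters from the card as TrainingArguments keyword arguments.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 50,
    "num_train_epochs": 3,
}


def effective_batch_size(kwargs, num_devices=1):
    """Examples per optimizer step = per-device batch x accumulation steps x devices."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
```

To use it with Transformers: `TrainingArguments(output_dir="arabic-sentiment-model", **TRAINING_KWARGS)`, where `output_dir` is a hypothetical path.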

Training results

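Training Loss | Epoch | Step | Validation Loss | Accuracy | F1
------------- | ----- | ---- | --------------- | -------- | ------
0.2134        | 1.0   | 588  | 0.1978          | 0.9274   | 0.9274
0.1571        | 2.0   | 1176 | 0.1482          | 0.9438   | 0.9438
0.1217        | 3.0   | 1764 | 0.1512          | 0.9454   | 0.9454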
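The step counts in the training results (588 optimizer steps per epoch, 1,764 in total) combine with the hyperparameters as follows. At 64 examples per optimizer step, one epoch covers roughly 588 × 64 ≈ 37,600 training examples (the last batch of an epoch may be smaller). The function below sketches a cosine schedule with linear warmup: 50 warmup steps up to 2e-05, then a half-cosine decay to zero at step 1,764. It is written to follow the same formula as Transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`, but it is a standalone re-implementation, not the library code.

```python
import math

STEPS_PER_EPOCH = 588  # from the training results
NUM_EPOCHS = 3
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 1764, the last Step value reported


def cosine_lr_with_warmup(step, base_lr=2e-5, warmup_steps=50, total_steps=TOTAL_STEPS):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Evaluating it at steps 588, 1176 and 1764 gives the learning rate at the end of each epoch; the midpoint of the decay phase (step 907) sits at half the peak rate.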

Framework versions

  • Transformers 4.57.3
  • PyTorch 2.9.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1
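To check a local environment against the versions listed, a small standard-library sketch; `torch` is the pip distribution name for PyTorch. It compares exact version strings, so a different build tag (for example a CPU build instead of `+cu126`) is also reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card, keyed by pip distribution name.
CARD_VERSIONS = {
    "transformers": "4.57.3",
    "torch": "2.9.0+cu126",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches
```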