---
library_name: transformers
base_model: aubmindlab/bert-base-arabertv02
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: arabic-sentiment-model
  results: []
language:
- ar
pipeline_tag: text-classification
datasets:
- ramybaly/arsentd_lev
---

# arabic-sentiment-model

This model is a fine-tuned version of [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) on the [ramybaly/arsentd_lev](https://huggingface.co/datasets/ramybaly/arsentd_lev) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1512
- Accuracy: 0.9454
- F1: 0.9454

## Model description

This model adapts [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) for Arabic sentiment analysis. It classifies Arabic text into one of two sentiment classes (Positive / Negative).

It is intended for opinions expressed in Modern Standard Arabic (MSA) as well as dialectal Arabic, as commonly found in social media posts, product reviews, and user feedback.

The model builds on AraBERT’s contextual modeling of Arabic morphology and syntax and reaches 0.9454 accuracy on the evaluation set.

## Intended uses & limitations

This model can be used for:

- Arabic sentiment analysis
- Social media opinion mining
- Customer feedback analysis
- Academic research and NLP experiments
- Graduation and portfolio projects

It is designed for inference on short to medium-length Arabic texts.

### Limitations

- The model performs binary sentiment classification only (no neutral class).
- Performance may degrade on very long documents.

## Training and evaluation data

The model was trained and evaluated on the [ramybaly/arsentd_lev](https://huggingface.co/datasets/ramybaly/arsentd_lev) dataset, which consists of Arabic text labeled for sentiment polarity.

### Dataset characteristics

- Language: Arabic
- Labels: Positive, Negative
- Text type: Short Arabic opinions and statements
- Domains: General opinionated text

The dataset was split into training, evaluation, and test sets following standard supervised learning practices.

## Training procedure

### Preprocessing

- Arabic text normalization handled by AraBERT tokenizer
- Tokenization using the AraBERT v02 tokenizer
- Padding and truncation applied to ensure fixed input length

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.2134        | 1.0   | 588  | 0.1978          | 0.9274   | 0.9274 |
| 0.1571        | 2.0   | 1176 | 0.1482          | 0.9438   | 0.9438 |
| 0.1217        | 3.0   | 1764 | 0.1512          | 0.9454   | 0.9454 |

### Framework versions

- Transformers 4.57.3
- Pytorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
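
As a quick check on the training hyperparameters listed above, the reported `total_train_batch_size` of 64 follows from `train_batch_size` multiplied by `gradient_accumulation_steps`. The sketch below reproduces that arithmetic and derives a rough training-set size from the 588 optimizer steps per epoch in the results table. The single-device setting (`num_devices = 1`) is an assumption, although it is consistent with the listed values, and the variable names are illustrative. The size estimate is only approximate because the card does not say how the last partial batch was handled.

```python
# Values copied from the "Training hyperparameters" list.
train_batch_size = 16
gradient_accumulation_steps = 4
num_epochs = 3

# Value read from the "Training results" table: optimizer steps after epoch 1.
steps_per_epoch = 588

# Assumption: training ran on a single device (not stated in the card).
num_devices = 1

# Examples consumed per optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Total optimizer updates over the whole run.
total_steps = steps_per_epoch * num_epochs

# Rough training-set size; the true count may differ by up to one effective batch.
approx_train_examples = steps_per_epoch * effective_batch_size

print(f"effective batch size:   {effective_batch_size}")
print(f"total optimizer steps:  {total_steps}")
print(f"approx. train examples: {approx_train_examples}")
```

The computed totals match the card: 64 examples per update and 1764 steps after three epochs.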
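
The `lr_scheduler_type: cosine` and `lr_scheduler_warmup_steps: 50` settings describe a learning rate that ramps up linearly for 50 steps and then decays along a half cosine. The following is a minimal pure-Python sketch of that shape, assuming the decay runs over all 1764 steps from the results table and ends at zero. The function name `cosine_lr_with_warmup` is hypothetical, and this is an illustration of the schedule, not the trainer's own code.

```python
import math


def cosine_lr_with_warmup(step, base_lr=2e-5, warmup_steps=50, total_steps=1764):
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: factor 1.0 at progress=0, falling to 0.0 at progress=1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at the start, mid-warmup, end of warmup, and each epoch boundary.
for step in (0, 25, 50, 588, 1176, 1764):
    print(f"step {step:>4}: lr = {cosine_lr_with_warmup(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 at step 50 and falls monotonically afterwards, so the third epoch trains with a much smaller step size than the first.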