7beshoyarnest/arabic-sentiment-model

	---
	library_name: transformers
	base_model: aubmindlab/bert-base-arabertv02
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- f1
	model-index:
	- name: arabic-sentiment-model
	  results: []
	language:
	- ar
	pipeline_tag: text-classification
	datasets:
	- ramybaly/arsentd_lev
	---

	# arabic-sentiment-model

	This model is a fine-tuned version of [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) on the [ramybaly/arsentd_lev](https://huggingface.co/datasets/ramybaly/arsentd_lev) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.1512
	- Accuracy: 0.9454
	- F1: 0.9454

	## Model description

	The model adapts [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) for Arabic sentiment analysis.

	The model is trained to classify Arabic text into binary sentiment classes (Positive / Negative).
	It is suitable for analyzing opinions expressed in Modern Standard Arabic (MSA) as well as dialectal Arabic, commonly found in social media posts, product reviews, and user feedback.

	The model builds on AraBERT’s contextual representations of Arabic morphology and syntax and reaches 0.9454 accuracy and 0.9454 F1 on the evaluation set.

	## Intended uses & limitations

	This model can be used for:

	- Arabic sentiment analysis
	- Social media opinion mining
	- Customer feedback analysis
	- Academic research and NLP experiments
	- Graduation and portfolio projects

	It is designed for inference on short to medium-length Arabic texts.
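A minimal inference sketch, assuming the model is published under the repository id shown in the page header and that the `transformers` library is installed. The helper names (`load_classifier`, `classify`, `label_counts`) are hypothetical, and the label strings returned depend on the model's configuration, which this card does not list.

```python
from collections import Counter

# Repository id as shown in the page header; treat it as an assumption.
MODEL_ID = "7beshoyarnest/arabic-sentiment-model"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(classifier, texts, max_length=512):
    """Classify a batch of texts, truncating inputs longer than max_length tokens."""
    return classifier(list(texts), truncation=True, max_length=max_length)


def label_counts(predictions):
    """Count how many predictions fall under each label string."""
    return Counter(pred["label"] for pred in predictions)
```

Typical use would be `clf = load_classifier()` followed by `classify(clf, ["الخدمة ممتازة", "المنتج سيئ جدا"])`. Each result is a dictionary with a `label` and a `score`; the labels may appear as generic names such as `LABEL_0` and `LABEL_1` if the configuration does not define readable ones.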

	### Limitations

	- The model performs binary sentiment classification only (no neutral class).
	- Performance may degrade on very long documents, because inputs are truncated to a fixed length.
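One possible workaround for long documents is to split the text into overlapping windows, classify each window, and combine the results. The sketch below is not part of the original training or evaluation setup. The word-based window size (`max_words=200`), the overlap, and the score-weighted vote are all assumptions chosen for illustration, and word counts only approximate the tokenizer's real limit.

```python
from collections import defaultdict


def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def score_weighted_vote(chunk_predictions):
    """Pick the label with the largest summed score across chunks.

    chunk_predictions is a list of {"label": ..., "score": ...} dictionaries,
    one per chunk, in the format returned by a text-classification pipeline.
    """
    if not chunk_predictions:
        raise ValueError("no predictions to aggregate")
    totals = defaultdict(float)
    for pred in chunk_predictions:
        totals[pred["label"]] += pred["score"]
    return max(totals, key=totals.get)
```

Whether this helps depends on the data: sentiment can shift within a long document, so a single aggregated label may hide mixed opinions.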

	## Training and evaluation data

	The model was trained and evaluated on the [ramybaly/arsentd_lev](https://huggingface.co/datasets/ramybaly/arsentd_lev) dataset, which consists of Arabic text labeled for sentiment polarity.

	### Dataset characteristics

	- Language: Arabic
	- Labels: Positive, Negative
	- Text type: Short Arabic opinions and statements
	- Domains: General opinionated text

	The dataset was split into training, evaluation, and test sets following standard supervised learning practices.


	## Training procedure

	### Preprocessing

	- Arabic text normalization is handled by the AraBERT tokenizer
	- Tokenization uses the AraBERT v02 tokenizer
	- Padding and truncation are applied to ensure a fixed input length
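The preprocessing steps above can be sketched as follows. The card does not state the fixed input length, so `max_length=128` is an assumption, and the helper names (`load_tokenizer`, `encode_batch`) are hypothetical.

```python
BASE_MODEL = "aubmindlab/bert-base-arabertv02"


def load_tokenizer(model_id=BASE_MODEL):
    """Load the AraBERT v02 tokenizer (downloads files on first use)."""
    # Imported lazily so encode_batch can be exercised without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id)


def encode_batch(tokenizer, texts, max_length=128):
    """Tokenize texts to a fixed length.

    Shorter inputs are padded up to max_length and longer ones are truncated.
    max_length=128 is an assumed value, not one taken from the card.
    """
    return tokenizer(
        list(texts),
        padding="max_length",
        truncation=True,
        max_length=max_length,
    )
```

With a real tokenizer, `encode_batch(load_tokenizer(), ["نص قصير"])` returns `input_ids` and `attention_mask` lists that each have exactly `max_length` entries per text.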

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 64
	- optimizer: AdamW, fused PyTorch implementation (`adamw_torch_fused`), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 50
	- num_epochs: 3
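As a rough illustration, the list above maps onto `transformers.TrainingArguments` keyword arguments as shown below. The `output_dir` value is a hypothetical path, a single training device is assumed, and the total of 1764 steps is taken from the training results table. The `cosine_lr` function is a plain-Python sketch of linear warmup followed by half-cosine decay, written to show the shape of the schedule rather than to reproduce the library's implementation exactly.

```python
import math

# Values copied from the hyperparameter list; output_dir is hypothetical.
TRAINING_KWARGS = {
    "output_dir": "arabic-sentiment-model",
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 50,
    "num_train_epochs": 3,
}


def effective_batch_size(kwargs, num_devices=1):
    """Sequences consumed per optimizer step (num_devices=1 is an assumption)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def cosine_lr(step, total_steps=1764, base_lr=2e-5, warmup_steps=50):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

The dictionary can be passed as `TrainingArguments(**TRAINING_KWARGS)`. `effective_batch_size(TRAINING_KWARGS)` gives 16 × 4 = 64, which matches the reported `total_train_batch_size`.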

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
	| 0.2134        | 1.0   | 588  | 0.1978          | 0.9274   | 0.9274 |
	| 0.1571        | 2.0   | 1176 | 0.1482          | 0.9438   | 0.9438 |
	| 0.1217        | 3.0   | 1764 | 0.1512          | 0.9454   | 0.9454 |
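The card does not say how F1 was averaged. The identical Accuracy and F1 columns are at least consistent with micro-averaging, which reduces to accuracy for single-label classification; that reading is an assumption. The sketch below computes both quantities in plain Python, alongside the positive-class F1 for contrast. The function names and toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true:
        raise ValueError("empty label list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all classes for single-label predictions."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    # Every wrong prediction is one false positive (for the predicted class)
    # and one false negative (for the true class).
    errors = len(y_true) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + errors)
    recall = tp / (tp + errors)
    return 2 * precision * recall / (precision + recall)


def binary_f1(y_true, y_pred, positive=1):
    """F1 of the positive class only, which generally differs from accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

On a toy example with references `[1, 0, 1, 1]` and predictions `[1, 0, 0, 1]`, accuracy and micro F1 are both 0.75, while the positive-class F1 is 0.8.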


	### Framework versions

	- Transformers 4.57.3
	- PyTorch 2.9.0+cu126
	- Datasets 4.0.0
	- Tokenizers 0.22.1