	---
	language: ar
	license: apache-2.0
	library_name: transformers
	tags:
	- sentiment-analysis
	- arabic
	- marbert
	- twitter
	- text-classification
	datasets:
	- mksaad/arabic-sentiment-twitter-corpus
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	---

	# MARBERT Model for Arabic Sentiment Analysis (Positive/Negative)

	This model is a fine-tuned version of `UBC-NLP/MARBERTv2` for Arabic sentiment analysis.
	It classifies Arabic text, primarily tweets, into two classes: Positive (`LABEL_1`) and Negative (`LABEL_0`).

	## 🚀 Live Demo

	You can test the model live on the Hugging Face Space:
	[https://huggingface.co/spaces/iMeshal/arabic-sentiment-app](https://huggingface.co/spaces/iMeshal/arabic-sentiment-app)

	---

	## 📊 Model Performance

	The training data was split 80/20 into a training set and a validation set. The final evaluation was performed on a separate, unseen test set.

	Final Test Set Results (Accuracy: 94.40%)

	| Metric | Score |
	| :--- | :---: |
	| Accuracy | 94.40% |
	| F1 (Macro) | 94.40% |
	| Precision (Macro) | 94.40% |
	| Recall (Macro) | 94.40% |
	| Loss | 0.1667 |

	The best validation accuracy, 93.4%, was reached at Epoch 2. Because `load_best_model_at_end` was enabled, the best checkpoint was reloaded when training finished.
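The macro-averaged scores in the table above can be reproduced from raw predictions with a few lines of plain Python. The sketch below is illustrative only: the function name `macro_metrics` and the toy labels are assumptions, not the author's evaluation code or data.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(pairs[(other, label)] for other in labels if other != label)
        fn = sum(pairs[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    correct = sum(pairs[(label, label)] for label in labels)
    n = len(labels)
    return {
        "accuracy": correct / len(y_true),
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
    }


# Toy labels (hypothetical): 0 = Negative, 1 = Positive.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
print(macro_metrics(toy_true, toy_pred))
```

In this toy case the classes are balanced and each class has one error, so all four scores come out equal. A balanced test set with similar error counts per class can therefore produce identical accuracy, precision, recall, and F1, as in the table.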

	---

	## 💻 Intended Use (How to use)

	You can use this model directly with the `transformers` pipeline.

	```python
	from transformers import pipeline

	# Load the pipeline
	pipe = pipeline(
	    "sentiment-analysis",
	    model="iMeshal/arabic-sentiment-classifier-marbert"
	)

	# Test with new texts
	texts = [
	    "هذا المنتج رائع جداً أنصح به",
	    "أسوأ خدمة عملاء على الإطلاق",
	    "الجو اليوم جميل"
	]

	results = pipe(texts)
	print(results)
	# Output:
	# [
	#   {'label': 'LABEL_1', 'score': 0.99...},  # Positive
	#   {'label': 'LABEL_0', 'score': 0.99...},  # Negative
	#   {'label': 'LABEL_1', 'score': 0.98...}   # Positive
	# ]
	```
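Since the pipeline returns the raw ids `LABEL_0` and `LABEL_1`, a small wrapper can translate them into readable names. This is a minimal sketch: `ID2NAME`, `to_readable`, and `classify` are hypothetical helper names, and the mapping simply restates the label description given in this card.

```python
# Mapping taken from this card: LABEL_1 is Positive, LABEL_0 is Negative.
ID2NAME = {"LABEL_0": "Negative", "LABEL_1": "Positive"}


def to_readable(results):
    """Replace raw pipeline labels with human-readable sentiment names."""
    return [
        {"sentiment": ID2NAME[r["label"]], "score": r["score"]}
        for r in results
    ]


def classify(texts):
    """Run the model on a list of strings and return readable results."""
    # Imported lazily so to_readable() works without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "sentiment-analysis",
        model="iMeshal/arabic-sentiment-classifier-marbert",
    )
    return to_readable(pipe(texts))
```

Calling `classify(["الجو اليوم جميل"])` downloads the model on first use and returns a list with one dictionary per input text.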

	## 📚 Training Data

	The model was trained on the [Arabic Sentiment Twitter Corpus](https://www.kaggle.com/datasets/mksaad/arabic-sentiment-twitter-corpus) dataset from Kaggle.

	* Preprocessing: Long/concatenated tweets (which appeared to be noise) were cleaned.
	* Training Set: ~24,163 samples.
	* Validation Set: ~6,041 samples.
	* Test Set: ~11,508 samples.
	* Balance: All splits were approximately balanced (about 50% Positive / 50% Negative).
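The training and validation sizes listed above add up to 30,204 samples, which is consistent with an 80/20 split. The card does not say how the split was made, so the following is only a sketch of one common approach, a stratified split that keeps the class ratio the same in both parts. The function name, the seed, and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx), splitting each class by the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy data (hypothetical): 10 Negative (0) and 10 Positive (1) labels.
toy_labels = [0] * 10 + [1] * 10
train_idx, val_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx))  # 16 4
```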

	---

	## ⚙️ Training Procedure

	The model was trained using the `transformers.Trainer` class with the following key hyperparameters:

	* Framework: PyTorch
	* Base Model: `UBC-NLP/MARBERTv2`
	* Epochs: 3 (with early stopping)
	* Early Stopping: Patience of 2. Training ended at Epoch 3, and Epoch 2 was the best epoch.
	* Batch Size: 16
	* Learning Rate: 2e-5
	* Tokenizer: `AutoTokenizer` (with `padding="max_length"`, `truncation=True`, `max_length=512`)
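The hyperparameters listed above could be wired into `transformers.Trainer` roughly as follows. This is a sketch under stated assumptions, not the author's training script: the output directory, the `"text"` column name, the per-epoch evaluation and save strategy, and the choice of accuracy as `metric_for_best_model` are not given in this card.

```python
# Values stated in this card.
HYPERPARAMS = {
    "base_model": "UBC-NLP/MARBERTv2",
    "num_train_epochs": 3,
    "batch_size": 16,
    "learning_rate": 2e-5,
    "early_stopping_patience": 2,
    "max_length": 512,
}


def tokenize_batch(tokenizer, batch, text_column="text"):
    """Tokenize a batch with the settings from the card ("text" is assumed)."""
    return tokenizer(
        batch[text_column],
        padding="max_length",
        truncation=True,
        max_length=HYPERPARAMS["max_length"],
    )


def build_trainer(train_dataset, eval_dataset, compute_metrics,
                  output_dir="marbert-sentiment"):  # output_dir is hypothetical
    """Assemble a Trainer; both datasets must already be tokenized."""
    from transformers import (
        AutoModelForSequenceClassification,
        EarlyStoppingCallback,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained(
        HYPERPARAMS["base_model"], num_labels=2
    )
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=HYPERPARAMS["num_train_epochs"],
        per_device_train_batch_size=HYPERPARAMS["batch_size"],
        per_device_eval_batch_size=HYPERPARAMS["batch_size"],
        learning_rate=HYPERPARAMS["learning_rate"],
        eval_strategy="epoch",  # `evaluation_strategy` in older releases
        save_strategy="epoch",  # must match the evaluation strategy
        load_best_model_at_end=True,
        # Assumption: compute_metrics returns a dict with an "accuracy" key.
        metric_for_best_model="accuracy",
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        callbacks=[
            EarlyStoppingCallback(
                early_stopping_patience=HYPERPARAMS["early_stopping_patience"]
            )
        ],
    )
```

`EarlyStoppingCallback` needs `load_best_model_at_end=True` and a `metric_for_best_model`, which is why both are set here.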

	---

	## 📞 Contact

	* Name: Meshal AL-Qushaym
	* Email: meshalqushim@outlook.com
	* Kaggle: [kaggle.com/meshalfalah](https://www.kaggle.com/meshalfalah)