iMeshal committed on
Commit
ba9e01c
·
verified ·
1 Parent(s): 64fc22a

Update README.md

Files changed (1)
  1. README.md +111 -3
README.md CHANGED
@@ -1,3 +1,111 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language: ar
3
+ license: apache-2.0
4
+ library_name: transformers
5
+ tags:
6
+ - sentiment-analysis
7
+ - arabic
8
+ - marbert
9
+ - twitter
10
+ - text-classification
11
+ datasets:
12
+ - mksaad/arabic-sentiment-twitter-corpus
13
+ metrics:
14
+ - accuracy
15
+ - f1
16
+ - precision
17
+ - recall
18
+ ---
19
+
20
+ # MARBERT Model for Arabic Sentiment Analysis (Positive/Negative)
21
+
22
+ This is a fine-tuned version of `UBC-NLP/MARBERTv2` for Arabic Sentiment Analysis.
23
+ The model is trained to classify Arabic text (specifically tweets) into two categories: **Positive (`LABEL_1`)** or **Negative (`LABEL_0`)**.
24
+
25
+ ## 🚀 Live Demo
26
+
27
+ You can test the model live on the Hugging Face Space:
28
+ **[https://huggingface.co/spaces/iMeshal/arabic-sentiment-app](https://huggingface.co/spaces/iMeshal/arabic-sentiment-app)**
29
+
30
+ ---
31
+
32
+ ## 📊 Model Performance
33
+
34
+ The model was trained on 80% of the training data and validated on the remaining 20%. The final evaluation was performed on a separate, unseen test set.
35
+
36
+ **Final Test Set Results (Accuracy: 94.40%)**
37
+
38
+ | Metric | Score |
39
+ | :--- | :---: |
40
+ | **Accuracy** | **94.40%** |
41
+ | F1 (Macro) | 94.40% |
42
+ | Precision (Macro) | 94.40% |
43
+ | Recall (Macro) | 94.40% |
44
+ | Loss | 0.1667 |
45
+
46
+ The model achieved its best validation accuracy of **93.4%** at Epoch 2, and `load_best_model_at_end` was used.
47
+
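For readers unfamiliar with macro averaging, here is a minimal sketch of how the four scores in the table are defined. It is not the author's evaluation code. The function name `macro_metrics` and the toy labels are hypothetical and only illustrate the arithmetic.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision_macro": sum(precisions) / n,  # unweighted mean over classes
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
    }


# Hypothetical toy labels (0 = Negative, 1 = Positive), not real model output.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the classes are balanced and each class has the same number of errors, so all four scores come out equal. That is one way identical figures like those in the table can arise.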
48
+ ---
49
+
50
+ ## 💻 Intended Use (How to use)
51
+
52
+ You can use this model directly with the `transformers` pipeline.
53
+
54
+ ```python
55
+ from transformers import pipeline
56
+
57
+ # Load the pipeline
58
+ pipe = pipeline(
59
+     "sentiment-analysis",
60
+     model="iMeshal/arabic-sentiment-classifier-marbert",
61
+ )
62
+
63
+ # Test with new texts
64
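+ texts = [
65
+     "هذا المنتج رائع جداً أنصح به",  # "This product is really great, I recommend it"
66
+     "أسوأ خدمة عملاء على الإطلاق",  # "The worst customer service ever"
67
+     "الجو اليوم جميل",  # "The weather is nice today"
68
+ ]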
69
+
70
+ results = pipe(texts)
71
+ print(results)
72
+ # Output:
73
+ # [
74
+ # {'label': 'LABEL_1', 'score': 0.99...}, # Positive
75
+ # {'label': 'LABEL_0', 'score': 0.99...}, # Negative
76
+ # {'label': 'LABEL_1', 'score': 0.98...} # Positive
77
+ # ]
78
+
79
+ ```
80
+
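The pipeline returns the raw labels `LABEL_0` and `LABEL_1`, so a small post-processing step can make the output easier to read. The sketch below uses the mapping stated in this README (`LABEL_1` is Positive, `LABEL_0` is Negative). The helper name `readable` and the sample scores are hypothetical.

```python
# Mapping taken from this README: LABEL_1 is Positive, LABEL_0 is Negative.
LABEL_NAMES = {"LABEL_0": "Negative", "LABEL_1": "Positive"}


def readable(results):
    """Replace raw pipeline labels with human-readable names."""
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in results
    ]


# Hypothetical pipeline output, used only to show the shape of the data.
sample = [
    {"label": "LABEL_1", "score": 0.99},
    {"label": "LABEL_0", "score": 0.97},
]
print(readable(sample))
```

With a loaded pipeline, the same helper could be applied as `readable(pipe(texts))`.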
81
+ ## 📚 Training Data
82
+
83
+ The model was trained on the **[Arabic Sentiment Twitter Corpus](https://www.kaggle.com/datasets/mksaad/arabic-sentiment-twitter-corpus)** dataset from Kaggle.
84
+
85
+ * **Preprocessing:** Long/concatenated tweets (which appeared to be noise) were cleaned.
86
+ * **Training Set:** ~24,163 samples.
87
+ * **Validation Set:** ~6,041 samples.
88
+ * **Test Set:** ~11,508 samples.
89
+ * **Balance:** All three splits were approximately balanced (about 50% Positive / 50% Negative).
90
+
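The 80/20 train/validation split described above could be reproduced along the following lines. This is a sketch only: the README does not say how the split was made, so the stratification by label, the seed, and the function name `stratified_split` are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (text, label) pairs so each label keeps the same share in both parts."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed (assumed) for a repeatable split
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Hypothetical toy data: 10 positive and 10 negative placeholder tweets.
toy = [(f"tweet {i}", i % 2) for i in range(20)]
train, val = stratified_split(toy)
print(len(train), len(val))
```

Splitting within each label keeps the roughly 50/50 class balance in both the training and validation parts.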
91
+ ---
92
+
93
+ ## ⚙️ Training Procedure
94
+
95
+ The model was trained using the `transformers.Trainer` class with the following key hyperparameters:
96
+
97
+ * **Framework:** PyTorch
98
+ * **Base Model:** `UBC-NLP/MARBERTv2`
99
+ * **Epochs:** 3 (maximum, with early stopping enabled)
100
+ * **Early Stopping:** Patience of 2. Training ended at Epoch 3, and the Epoch 2 checkpoint was the best.
101
+ * **Batch Size:** 16
102
+ * **Learning Rate:** 2e-5
103
+ * **Tokenizer:** `AutoTokenizer` (with `padding="max_length"`, `truncation=True`, `max_length=512`)
104
+
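The tokenizer settings listed above can be wrapped in a small function that is applied to batches of text. This is a minimal sketch, not the author's training script. The helper name `make_tokenize_fn` and the column name `text` are assumptions.

```python
def make_tokenize_fn(tokenizer, text_column="text", max_length=512):
    """Build a batch tokenization function using the settings listed above."""

    def tokenize(batch):
        # Pad every sequence to max_length and truncate anything longer.
        return tokenizer(
            batch[text_column],
            padding="max_length",
            truncation=True,
            max_length=max_length,
        )

    return tokenize


# Usage with the real tokenizer (downloads files from the Hugging Face Hub):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERTv2")
# encoded = make_tokenize_fn(tokenizer)({"text": ["الجو اليوم جميل"]})
```

The returned function takes a dictionary with a list of strings under the text column, which is the shape a batched dataset mapping would typically pass in.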
105
+ ---
106
+
107
+ ## 📞 Contact
108
+
109
+ * **Name:** Meshal AL-Qushaym
110
+ * **Email:** meshalqushim@outlook.com
111
+ * **Kaggle:** [kaggle.com/meshalfalah](https://www.kaggle.com/meshalfalah)