# Levantine Arabic Sentiment Classifier (Ordinal MARBERTv2)
This model is a fine-tuned version of MARBERTv2, designed to predict the sentiment of Levantine Arabic tweets (Jordanian, Lebanese, Palestinian, Syrian).
Technical Highlight: This model was trained with an ordinal loss function (mean squared error combined with cross-entropy). This makes the model "distance-aware": extreme mistakes, such as confusing a highly positive tweet with a highly negative one, are penalized far more heavily than near misses, which makes its predictions more reliable in edge cases.
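The exact training code is not published here, but the combined loss can be sketched as follows. This is a minimal illustration of the idea, not the checkpoint's actual implementation; the class name `OrdinalSentimentLoss` and the `alpha` weight are assumptions.

```python
import torch
import torch.nn as nn

class OrdinalSentimentLoss(nn.Module):
    """Distance-aware loss sketch: standard cross-entropy for
    classification, plus MSE between the expected class index
    (under the softmax) and the true class index."""

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha  # assumed weighting between the two terms
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        ce_loss = self.ce(logits, labels)
        probs = torch.softmax(logits, dim=-1)
        idx = torch.arange(logits.size(-1), dtype=probs.dtype, device=logits.device)
        expected = (probs * idx).sum(dim=-1)  # expected class index in [0, 2]
        mse_loss = torch.mean((expected - labels.to(probs.dtype)) ** 2)
        # A Negative<->Positive confusion (distance 2) now incurs roughly
        # 4x the MSE penalty of a one-step miss, on top of the same CE.
        return ce_loss + self.alpha * mse_loss
```

With this term, predicting Neutral for a Positive tweet costs noticeably less than predicting Negative for it, which plain cross-entropy treats identically.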
## Performance
| Metric | Score | Description |
|---|---|---|
| Accuracy | 79.25% | Overall correctness on the test set. |
| F1 (Macro) | 0.7635 | The balanced F1 score across all 3 classes. |
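Both metrics follow the standard scikit-learn definitions; the small helper below (hypothetical, shown for clarity) computes them from test-set predictions.

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    # Accuracy: share of exact matches.
    # Macro F1: unweighted mean of per-class F1 scores, so the
    # harder Neutral class counts as much as the other two.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }
```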
## Labels
| ID | Label | Meaning |
|---|---|---|
| 0 | Negative | Anger, complaints, sadness, or frustration. |
| 1 | Neutral | Objective facts, mixed emotions, or ambiguous statements. |
| 2 | Positive | Joy, praise, excitement, or satisfaction. |
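Depending on whether the checkpoint's `config.id2label` is set, the pipeline may emit these names directly or raw `LABEL_<id>` strings. A small hypothetical helper (the `ID2LABEL` mapping mirrors the table above; verify it against the checkpoint's config) normalizes either form:

```python
# Assumed mapping, mirroring the label table above.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}
LABEL2ID = {v: k for k, v in ID2LABEL.items()}

def normalize_label(raw: str) -> str:
    """Map a raw pipeline label ("LABEL_0") or an already-named
    label ("Negative") to the human-readable name."""
    if raw.startswith("LABEL_"):
        return ID2LABEL[int(raw.split("_")[1])]
    return raw
```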
## How to Use (Python)
Because this is a standard 3-class model, you can easily load it using Hugging Face's built-in pipeline.
```python
from transformers import pipeline

# 1. Load the pipeline
model_id = "amitca71/marabert2-levantine-sentiment"
classifier = pipeline("text-classification", model=model_id)

def predict_sentiment(text):
    # Get the top prediction
    result = classifier(text)[0]
    # Format the output cleanly
    return {"text": text, "label": result["label"], "confidence": round(result["score"], 4)}

# 2. Test examples
print(predict_sentiment("الجو اليوم بيعقد! طالعين مشوار"))       # "The weather today is stunning! We're heading out." -> should be Positive
print(predict_sentiment("والله طلعت روحي من هالزحمة، شو بيقرف"))  # "I swear this traffic wore me out, how annoying." -> should be Negative
print(predict_sentiment("وصلت عالبيت من شوي."))                   # "I got home a little while ago." -> should be Neutral
```
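Because training was distance-aware, the full probability distribution is meaningful, not just the top label. In recent `transformers` versions, `classifier(text, top_k=None)` returns scores for all three classes; the helper below (hypothetical, not part of the checkpoint) collapses them into one continuous score.

```python
def sentiment_score(probs_by_label):
    """Collapse a 3-class distribution into a score in [-1, 1].

    probs_by_label: mapping like {"Negative": 0.1, "Neutral": 0.2,
    "Positive": 0.7}, assumed to sum to 1. The expected class index
    (0..2) is computed and rescaled so -1 is fully negative, 0 is
    neutral, and +1 is fully positive.
    """
    order = ["Negative", "Neutral", "Positive"]  # mirrors the label table
    expected = sum(i * probs_by_label[name] for i, name in enumerate(order))
    return expected - 1.0
```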
## ⚠️ Limitations
- Dialect Focus: Optimized heavily for Levantine Twitter. It may underperform or misunderstand idioms in Egyptian, Gulf, or Maghrebi dialects.
- The "Neutral" Bottleneck: Like most sentiment models, detecting true "Neutral" text is the most challenging, as human annotators often mix objective facts with subtle sarcasm in this category.
- Arabizi: While MARBERTv2 has some exposure to Arabizi (Arabic written in English/Latin letters), this model performs best on native Arabic script.
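One practical mitigation for the script mismatch is to check Arabic-script coverage before trusting a prediction. The sketch below is a rough heuristic; the 0.5 threshold is an arbitrary assumption.

```python
import re

# Matches characters in the basic Arabic Unicode block.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")

def mostly_arabic(text: str, threshold: float = 0.5) -> bool:
    """Return True if at least `threshold` of the letters in `text`
    are Arabic script; use to flag Arabizi or non-Arabic input."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for ch in letters if ARABIC_CHAR.match(ch))
    return arabic / len(letters) >= threshold
```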
## Evaluation results
Self-reported on the ArSenTD-LEV benchmark:
- Accuracy: 79.25%
- F1 (macro): 0.763