🎭 Levantine Arabic Sentiment Classifier (Ordinal MARBERTv2)

This model is a fine-tuned version of MARBERTv2, designed to predict the sentiment of Levantine Arabic tweets (Jordanian, Lebanese, Palestinian, Syrian).

Technical Highlight: This model was trained with an ordinal loss function that combines Mean Squared Error with Cross-Entropy. The loss is "distance-aware": a mistake that lands two classes away from the truth (predicting Negative for a Positive tweet) is penalized more heavily than one that lands on the adjacent Neutral class. This is intended to make extreme misclassifications rarer, particularly on borderline tweets.
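
The card does not give the exact formula, so the following is only a sketch of one common way to combine the two terms: cross-entropy on the true class, plus the squared distance between the expected class index and the true index. The function names and the `mse_weight` parameter are assumptions made for illustration and are not taken from the author's training code.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ordinal_loss(logits, target, mse_weight=1.0):
    """Cross-entropy plus a squared-distance term (hypothetical combination)."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    # Expected class index under the predicted distribution (0=Negative ... 2=Positive).
    expected_index = sum(i * p for i, p in enumerate(probs))
    squared_distance = (expected_index - target) ** 2
    return cross_entropy + mse_weight * squared_distance

# True label is Positive (2). Both predictions give class 2 the same probability,
# so cross-entropy alone cannot tell them apart.
near_miss = [0.0, 2.0, 0.0]  # most mass on Neutral
far_miss = [2.0, 0.0, 0.0]   # most mass on Negative

print("near miss:", ordinal_loss(near_miss, target=2))
print("far miss: ", ordinal_loss(far_miss, target=2))
```

In this sketch the far miss receives the larger loss only because of the squared-distance term, which is the behavior the "distance-aware" description refers to.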

📊 Performance

| Metric | Score | Description |
|---|---|---|
| Accuracy | 79.25% | Share of test-set tweets classified correctly. |
| F1 (Macro) | 0.7635 | Unweighted mean of the per-class F1 scores across all 3 classes. |
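
For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and macro F1 are computed. The toy labels below are invented for illustration and are not the model's test set; the function names are likewise assumptions.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    # Compute F1 per class, then take the unweighted mean so that
    # a rare class counts as much as a frequent one.
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(labels)

# Toy data only: 0 = Negative, 1 = Neutral, 2 = Positive.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Macro F1 can sit below accuracy when one class is harder than the others, because every class contributes equally to the average.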

📖 Labels

| ID | Label | Meaning |
|---|---|---|
| 0 | Negative 😠 | Anger, complaints, sadness, or frustration. |
| 1 | Neutral 😐 | Objective facts, mixed emotions, or ambiguous statements. |
| 2 | Positive 😃 | Joy, praise, excitement, or satisfaction. |
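
The label strings a checkpoint returns depend on the `id2label` mapping stored in its config; when none is set, transformers falls back to generic names such as `LABEL_0`. Whether this checkpoint returns readable names is not stated here, so the helper below is a hypothetical convenience that maps either form (or a raw integer ID) onto the table above.

```python
# Mapping taken from the label table above.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

def normalize_label(raw_label):
    """Return the canonical label for an int ID, 'LABEL_n', or a label name."""
    if isinstance(raw_label, int):
        return ID2LABEL[raw_label]
    text = str(raw_label).strip()
    if text.upper().startswith("LABEL_"):
        # Generic fallback names carry the class ID after the underscore.
        return ID2LABEL[int(text.split("_", 1)[1])]
    by_name = {name.lower(): name for name in ID2LABEL.values()}
    return by_name[text.lower()]

print(normalize_label("LABEL_2"))
print(normalize_label("neutral"))
print(normalize_label(0))
```

An unrecognized label raises `KeyError`, which makes a mismatch between the config and the table visible instead of silently mislabeling output.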

🚀 How to Use (Python)

Because this is a standard 3-class model, you can easily load it using Hugging Face's built-in pipeline.

from transformers import pipeline

# 1. Load Pipeline
model_id = "amitca71/marabert2-levantine-sentiment"
classifier = pipeline("text-classification", model=model_id)

def predict_sentiment(text):
    # Get the top prediction
    result = classifier(text)[0]

    # Format the output cleanly
    return {"text": text, "label": result['label'], "confidence": round(result['score'], 4)}

# 2. Test Examples
# "The weather is amazing today! We're heading out."  (expected: Positive)
print(predict_sentiment("الجو اليوم بيعقد! طالعين مشوار"))
# "I swear I'm fed up with this traffic, it's disgusting."  (expected: Negative)
print(predict_sentiment("والله طقت روحي من هالزحمة، شي بيقرف"))
# "I got home a little while ago."  (expected: Neutral)
print(predict_sentiment("وصلت عالبيت من شوي."))
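
The function above keeps only the top class. As a possible extension, the sketch below asks the pipeline for all three class scores (`top_k=None`) and reports the margin between the two best classes, which can help flag borderline tweets. The helper names `load_classifier` and `predict_with_margin` are illustrative assumptions, and the exact return shape of `pipeline` may vary between transformers versions, so treat this as a starting point.

```python
def load_classifier(model_id="amitca71/marabert2-levantine-sentiment"):
    # Imported lazily so the helper below can be exercised with any callable.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)

def predict_with_margin(classifier, text):
    # A list input plus top_k=None requests the score of every class.
    scores = classifier([text], top_k=None)[0]
    # Sort explicitly instead of relying on the order the pipeline returns.
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    margin = ranked[0]["score"] - ranked[1]["score"]
    return {
        "text": text,
        "label": ranked[0]["label"],
        "confidence": round(ranked[0]["score"], 4),
        "margin": round(margin, 4),  # small margin = the model is torn between two classes
        "scores": {item["label"]: round(item["score"], 4) for item in ranked},
    }
```

A call would look like `predict_with_margin(load_classifier(), "وصلت عالبيت من شوي.")`; what counts as a "small" margin is left to the application.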

⚠️ Limitations

  • Dialect Focus: Optimized heavily for Levantine Twitter. It may underperform or misunderstand idioms in Egyptian, Gulf, or Maghrebi dialects.
  • The "Neutral" Bottleneck: As with most sentiment models, truly Neutral text is the hardest class to detect, because human annotators often place both objective facts and subtly sarcastic statements in this category.
  • Arabizi: While MARBERTv2 has some exposure to Arabizi (Arabic written in English/Latin letters), this model performs best on native Arabic script.
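
Given the Arabizi caveat, one option is to check the script of an input before trusting its prediction. The heuristic below is a hypothetical sketch, not part of the model: it measures the share of letters that fall in the basic Arabic Unicode block (U+0600 to U+06FF), and the 0.5 threshold is an arbitrary assumption.

```python
def arabic_script_ratio(text):
    """Share of alphabetic characters that belong to the Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum("\u0600" <= ch <= "\u06FF" for ch in letters)
    return arabic / len(letters)

def looks_like_arabizi(text, threshold=0.5):
    # Text with no letters at all (digits, emoji) is not flagged.
    if not any(ch.isalpha() for ch in text):
        return False
    return arabic_script_ratio(text) < threshold

print(looks_like_arabizi("وصلت عالبيت من شوي."))   # native Arabic script
print(looks_like_arabizi("kifak, shu akhbarak?"))  # Latin-script example
```

Flagged inputs could be routed to a transliteration step or simply reported with a warning; this check says nothing about dialect, only about script.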