A2NLP at StanceNakba 2026: AraBERT-Based Arabic Stance Detection (Subtask B)

This repository contains the official submission of Team A2NLP to the StanceNakba 2026 Shared Task – Subtask B (Topic-Based Stance Detection), co-located with LREC-COLING 2026.

Base model: aubmindlab/bert-base-arabertv02-twitter
Best Validation Macro-F1: 0.8434

Team A2NLP

A2NLP is a research team focusing on Arabic Natural Language Processing, with interests in stance detection, political discourse analysis, and transformer-based modeling.

A2NLP-StanceNakba2026-SubtaskB-AraBERTv02-Twitter

This model is a fine-tuned version of aubmindlab/bert-base-arabertv02-twitter on the official StanceNakba 2026 Subtask B dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4374
  • Accuracy: 0.8452
  • Macro F1: 0.8434
  • Weighted F1: 0.8443
  • F1 Pro: 0.8870
  • F1 Against: 0.8346
  • F1 Neutral: 0.8085

Model Description

This model performs Arabic stance classification with three labels:

  • pro

  • against

  • neutral

The architecture is based on BERT-base and fine-tuned using a prompt-based input formulation that explicitly conditions stance prediction on the topic.

Input Format

During training and inference, each instance is formatted as:

الهدف: {topic} [SEP] الموقف من: {sentence}

(English gloss: "The topic: {topic} [SEP] The stance on: {sentence}")

This prompt-based concatenation strategy was used to explicitly inject the topic context into the transformer encoder.

The model outputs a probability distribution over the three stance labels using a softmax classification head.
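As a minimal sketch of this input formulation and the softmax head (the helper names `format_input` and `softmax` are ours, not from the training code, and the logits below are illustrative values):

```python
import math

LABELS = ["pro", "against", "neutral"]

def format_input(topic: str, sentence: str) -> str:
    """Build the topic-conditioned prompt fed to the encoder."""
    return f"الهدف: {topic} [SEP] الموقف من: {sentence}"

def softmax(logits: list[float]) -> list[float]:
    """Convert the classification head's logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits from the classification head for one input.
probs = softmax([2.1, -0.3, 0.4])
prediction = LABELS[probs.index(max(probs))]
```

The predicted label is simply the argmax of the three softmax probabilities.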

Intended Uses & Limitations

Intended Use

  • Arabic stance detection in social media text.

  • Topic-conditioned stance classification.

  • Research and shared-task benchmarking.

The model is particularly suited for:

  • Political discourse analysis.

  • Arabic Twitter stance modeling.

  • Experimental NLP research on stance detection.

Limitations

  • The model was trained on Arabic Twitter-style text and may not generalize well to:

    • Formal Arabic prose

    • Long documents

    • Non-political domains

  • Sensitive to distribution shift.

  • Performance may degrade on unseen topics.

  • No external data augmentation was used.

This model should not be used for high-stakes automated decision-making.

Training and Evaluation Data

The model was trained on the official StanceNakba Subtask B train/validation dataset provided by the shared task organizers.

  • Language: Arabic

  • Domain: Social media (Twitter-style text)

  • Labels: pro, against, neutral

No external labeled data were used.

Data Processing

The following preprocessing steps were applied:

  • Emoji normalization using emoji.demojize

  • Removal of non-Arabic characters

  • Removal of URLs, mentions, and hashtags

  • Diacritics removal

  • Arabic normalization:

    • Alef variants (أ, إ, آ) → ا

    • ى → ي

    • ة → ه

    • Removal of tatweel (ـ)

  • Whitespace normalization

  • Duplicate sentence removal

  • Label encoding:

    {"pro": 0, "against": 1, "neutral": 2}
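The steps above can be sketched as a single cleaning function. This is a simplified, stdlib-only reconstruction (the function names are ours; the `emoji.demojize` step and duplicate removal are omitted since they sit outside this per-sentence function):

```python
import re

ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # fathatan .. sukun
TATWEEL = "\u0640"
LABEL2ID = {"pro": 0, "against": 1, "neutral": 2}

def normalize_arabic(text: str) -> str:
    """Apply the character-level normalization rules listed above."""
    text = re.sub(r"[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> ا
    text = text.replace("\u0649", "\u064A")                 # ى -> ي
    text = text.replace("\u0629", "\u0647")                 # ة -> ه
    text = text.replace(TATWEEL, "")                        # remove tatweel
    return ARABIC_DIACRITICS.sub("", text)                  # remove diacritics

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)        # remove URLs
    text = re.sub(r"[@#]\S+", " ", text)             # remove mentions/hashtags
    text = normalize_arabic(text)
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # drop non-Arabic characters
    return re.sub(r"\s+", " ", text).strip()         # whitespace normalization
```

For example, `preprocess("أهلاً https://t.co/abc @user")` strips the URL and mention, normalizes the alef, and removes the diacritic.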

Training procedure

Cross-Validation Strategy

The model was trained using:

  • 5-fold Stratified Cross-Validation

  • StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

This repository corresponds to Fold 4, which achieved the best validation macro-F1 score.
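The fold construction used `StratifiedKFold` as noted above; conceptually, stratified splitting deals each class's (shuffled) examples round-robin across folds so every fold preserves the label ratio. A stdlib-only sketch (the helper name `assign_folds` is ours and is not the shared-task code):

```python
import random
from collections import defaultdict

def assign_folds(labels, n_splits=5, seed=42):
    """Stratified fold assignment: shuffle indices within each class,
    then deal them round-robin so every fold keeps the label ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[idx] = i % n_splits
    return folds

# With 50/30/20 examples per class, every fold gets exactly 10/6/4.
labels = ["pro"] * 50 + ["against"] * 30 + ["neutral"] * 20
folds = assign_folds(labels)
```

Each of the five folds then serves once as the validation set while the other four are used for training.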


Loss Function

To address class imbalance, weighted cross-entropy loss was used.

Class weights were computed with scikit-learn (the `train_labels` array name here is illustrative):

compute_class_weight(class_weight="balanced", classes=np.unique(train_labels), y=train_labels)
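scikit-learn's "balanced" heuristic sets each class weight to n_samples / (n_classes × count_c), so minority classes contribute more to the loss. A stdlib sketch of the same formula (the function name `balanced_weights` is ours):

```python
from collections import Counter

def balanced_weights(labels):
    """Mirror sklearn's class_weight='balanced':
    w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Minority classes get proportionally larger weights.
weights = balanced_weights(["pro"] * 50 + ["against"] * 30 + ["neutral"] * 20)
```

With the 50/30/20 toy split above, the weights come out to 0.667, 1.111, and 1.667 respectively; these are then passed to the cross-entropy loss.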

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
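Under these hyperparameters, the Trainer configuration would look roughly like this. This is a reconstruction from the list above, not the exact training script; `output_dir` and `metric_for_best_model` are assumptions, and `eval_steps=50` is inferred from the evaluation cadence in the results table below:

```python
from transformers import TrainingArguments

# Sketch reconstructed from the hyperparameter list; names marked as
# assumptions are not confirmed by the original training code.
args = TrainingArguments(
    output_dir="arabert-stance-fold4",   # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,                           # "Native AMP"
    eval_strategy="steps",
    eval_steps=50,                       # inferred from the results table
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",    # assumption
)
```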

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Macro F1 | Weighted F1 | F1 Pro | F1 Against | F1 Neutral |
|---------------|--------|------|-----------------|----------|----------|-------------|--------|------------|------------|
| 0.9856        | 1.1628 | 50   | 0.7085          | 0.7321   | 0.7342   | 0.7334      | 0.7963 | 0.6723     | 0.7339     |
| 0.6344        | 2.3256 | 100  | 0.4542          | 0.8333   | 0.8330   | 0.8334      | 0.8571 | 0.8254     | 0.8163     |
| 0.4127        | 3.4884 | 150  | 0.4393          | 0.8452   | 0.8434   | 0.8443      | 0.8870 | 0.8346     | 0.8085     |
| 0.2872        | 4.6512 | 200  | 0.4369          | 0.8274   | 0.8265   | 0.8276      | 0.8727 | 0.8189     | 0.7879     |
| 0.2028        | 5.8140 | 250  | 0.4617          | 0.8333   | 0.8331   | 0.8332      | 0.8522 | 0.8224     | 0.8246     |

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Reproducibility

  • 5-fold Stratified Cross-Validation
  • Seed: 42
  • Weighted Cross-Entropy Loss
  • Early Stopping (patience=2)
  • Best checkpoint selected by Macro-F1