A2NLP at StanceNakba 2026: AraBERT-Based Arabic Stance Detection (Subtask B)

This repository contains the official submission of Team A2NLP to the StanceNakba 2026 Shared Task – Subtask B (Topic-Based Stance Detection), co-located with LREC-COLING 2026.

Base model: aubmindlab/bert-base-arabertv02-twitter
Best Validation Macro-F1: 0.8434

Team A2NLP

A2NLP is a research team focusing on Arabic Natural Language Processing, with interests in stance detection, political discourse analysis, and transformer-based modeling.

A2NLP-StanceNakba2026-SubtaskB-AraBERTv02-Twitter

This model is a fine-tuned version of aubmindlab/bert-base-arabertv02-twitter on the official StanceNakba 2026 Subtask B dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4374
  • Accuracy: 0.8452
  • Macro F1: 0.8434
  • Weighted F1: 0.8443
  • F1 Pro: 0.8870
  • F1 Against: 0.8346
  • F1 Neutral: 0.8085

Model Description

This model performs Arabic stance classification with three labels:

  • pro

  • against

  • neutral

The architecture is based on BERT-base and fine-tuned using a prompt-based input formulation that explicitly conditions stance prediction on the topic.

Input Format

During training and inference, each instance is formatted as:

الهدف: {topic} [SEP] الموقف من: {sentence}

(English gloss: "The topic: {topic} [SEP] The stance on: {sentence}")

This prompt-based concatenation strategy was used to explicitly inject the topic context into the transformer encoder.

The model outputs a probability distribution over the three stance labels using a softmax classification head.
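As a minimal sketch of this input formulation and the softmax head (the helper names `format_input` and `softmax` are ours, not from the training code, and the logits below are illustrative values):

```python
import math

LABELS = ["pro", "against", "neutral"]

def format_input(topic: str, sentence: str) -> str:
    """Build the topic-conditioned prompt fed to the encoder."""
    return f"الهدف: {topic} [SEP] الموقف من: {sentence}"

def softmax(logits: list[float]) -> list[float]:
    """Convert the classification head's logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits from the classification head for one input.
probs = softmax([2.1, -0.3, 0.4])
prediction = LABELS[probs.index(max(probs))]
```

The predicted label is simply the argmax of the three softmax probabilities.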

Intended Uses & Limitations

Intended Use

  • Arabic stance detection in social media text.

  • Topic-conditioned stance classification.

  • Research and shared-task benchmarking.

The model is particularly suited for:

  • Political discourse analysis.

  • Arabic Twitter stance modeling.

  • Experimental NLP research on stance detection.

Limitations

  • The model was trained on Arabic Twitter-style text and may not generalize well to:

    • Formal Arabic prose

    • Long documents

    • Non-political domains

  • Sensitive to distribution shift.

  • Performance may degrade on unseen topics.

  • No external data augmentation was used.

This model should not be used for high-stakes automated decision-making.

Training and Evaluation Data

The model was trained on the official StanceNakba Subtask B train/validation dataset provided by the shared task organizers.

  • Language: Arabic

  • Domain: Social media (Twitter-style text)

  • Labels: pro, against, neutral

No external labeled data were used.

Data Processing

The following preprocessing steps were applied:

  • Emoji normalization using emoji.demojize

  • Removal of non-Arabic characters

  • Removal of URLs, mentions, and hashtags

  • Diacritics removal

  • Arabic normalization:

    • Alef variants (أ, إ, آ) → ا

    • ى → ي

    • ة → ه

    • Removal of tatweel (ـ)

  • Whitespace normalization

  • Duplicate sentence removal

  • Label encoding:

    {"pro": 0, "against": 1, "neutral": 2}
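The steps above can be sketched as a single cleaning function. This is a simplified, stdlib-only reconstruction (the function names are ours; the `emoji.demojize` step and duplicate removal are omitted since they sit outside this per-sentence function):

```python
import re

ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # fathatan .. sukun
TATWEEL = "\u0640"
LABEL2ID = {"pro": 0, "against": 1, "neutral": 2}

def normalize_arabic(text: str) -> str:
    """Apply the character-level normalization rules listed above."""
    text = re.sub(r"[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> ا
    text = text.replace("\u0649", "\u064A")                 # ى -> ي
    text = text.replace("\u0629", "\u0647")                 # ة -> ه
    text = text.replace(TATWEEL, "")                        # remove tatweel
    return ARABIC_DIACRITICS.sub("", text)                  # remove diacritics

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+", " ", text)        # remove URLs
    text = re.sub(r"[@#]\S+", " ", text)             # remove mentions/hashtags
    text = normalize_arabic(text)
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # drop non-Arabic characters
    return re.sub(r"\s+", " ", text).strip()         # whitespace normalization
```

For example, `preprocess("أهلاً https://t.co/abc @user")` strips the URL and mention, normalizes the alef, and removes the diacritic.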

Training procedure

Cross-Validation Strategy

The model was trained using:

  • 5-fold Stratified Cross-Validation

  • StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

This repository corresponds to Fold 4, which achieved the best validation macro-F1 score.
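The fold construction used `StratifiedKFold` as noted above; conceptually, stratified splitting deals each class's (shuffled) examples round-robin across folds so every fold preserves the label ratio. A stdlib-only sketch (the helper name `assign_folds` is ours and is not the shared-task code):

```python
import random
from collections import defaultdict

def assign_folds(labels, n_splits=5, seed=42):
    """Stratified fold assignment: shuffle indices within each class,
    then deal them round-robin so every fold keeps the label ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[idx] = i % n_splits
    return folds

# With 50/30/20 examples per class, every fold gets exactly 10/6/4.
labels = ["pro"] * 50 + ["against"] * 30 + ["neutral"] * 20
folds = assign_folds(labels)
```

Each of the five folds then serves once as the validation set while the other four are used for training.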


Loss Function

To address class imbalance, weighted cross-entropy loss was used.

Class weights were computed with scikit-learn (the `train_labels` array name here is illustrative):

compute_class_weight(class_weight="balanced", classes=np.unique(train_labels), y=train_labels)
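scikit-learn's "balanced" heuristic sets each class weight to n_samples / (n_classes × count_c), so minority classes contribute more to the loss. A stdlib sketch of the same formula (the function name `balanced_weights` is ours):

```python
from collections import Counter

def balanced_weights(labels):
    """Mirror sklearn's class_weight='balanced':
    w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Minority classes get proportionally larger weights.
weights = balanced_weights(["pro"] * 50 + ["against"] * 30 + ["neutral"] * 20)
```

With the 50/30/20 toy split above, the weights come out to 0.667, 1.111, and 1.667 respectively; these are then passed to the cross-entropy loss.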

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
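Under these hyperparameters, the Trainer configuration would look roughly like this. This is a reconstruction from the list above, not the exact training script; `output_dir` and `metric_for_best_model` are assumptions, and `eval_steps=50` is inferred from the evaluation cadence in the results table below:

```python
from transformers import TrainingArguments

# Sketch reconstructed from the hyperparameter list; names marked as
# assumptions are not confirmed by the original training code.
args = TrainingArguments(
    output_dir="arabert-stance-fold4",   # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,                           # "Native AMP"
    eval_strategy="steps",
    eval_steps=50,                       # inferred from the results table
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",    # assumption
)
```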

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Macro F1 | Weighted F1 | F1 Pro | F1 Against | F1 Neutral |
|---------------|--------|------|-----------------|----------|----------|-------------|--------|------------|------------|
| 0.9856        | 1.1628 | 50   | 0.7085          | 0.7321   | 0.7342   | 0.7334      | 0.7963 | 0.6723     | 0.7339     |
| 0.6344        | 2.3256 | 100  | 0.4542          | 0.8333   | 0.8330   | 0.8334      | 0.8571 | 0.8254     | 0.8163     |
| 0.4127        | 3.4884 | 150  | 0.4393          | 0.8452   | 0.8434   | 0.8443      | 0.8870 | 0.8346     | 0.8085     |
| 0.2872        | 4.6512 | 200  | 0.4369          | 0.8274   | 0.8265   | 0.8276      | 0.8727 | 0.8189     | 0.7879     |
| 0.2028        | 5.8140 | 250  | 0.4617          | 0.8333   | 0.8331   | 0.8332      | 0.8522 | 0.8224     | 0.8246     |

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Reproducibility

  • 5-fold Stratified Cross-Validation
  • Seed: 42
  • Weighted Cross-Entropy Loss
  • Early Stopping (patience=2)
  • Best checkpoint selected by Macro-F1