# marbert-isis-detector
A MARBERT-based binary classifier for Arabic ISIS content, fine-tuned on a corpus of 500,000 Arabic tweets labeled by a taxonomy-guided LLM pipeline. The model identifies pro-ISIS (ISIS) versus non-ISIS (NOT-ISIS) tweets at the post level and is intended to serve as an efficient first-pass filter that a more expensive LLM classifier can then verify.
This checkpoint accompanies the paper "Extremism Detection and Counter-Messaging with Large Language Models" (Alfifi, Kaghazgaran, Caverlee). The code for training and evaluation, the 2,000-tweet evaluation set with LLM predictions, and the prompts used to generate the training labels are released in a companion GitHub repository: majidalfifi/extremism-llm.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("majidalfifi/marbert-isis-detector")
model = AutoModelForSequenceClassification.from_pretrained("majidalfifi/marbert-isis-detector")
model.eval()

text = "your Arabic tweet here"
inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
with torch.no_grad():
    logits = model(**inputs).logits
label = model.config.id2label[int(logits.argmax(dim=-1))]
print(label)  # -> "ISIS" or "NOT-ISIS"
```
For batch inference over a CSV or line-delimited file, see `train_marbert.py` in the companion repository, which supports an `--eval-only --checkpoint majidalfifi/marbert-isis-detector` mode.
## Label mapping
| id | label |
|---|---|
| 0 | ISIS |
| 1 | NOT-ISIS |
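The same mapping as it appears in the checkpoint's config (a minimal sketch; `id2label` and `label2id` are the standard transformers config fields):

```python
# Mirrors model.config.id2label / model.config.label2id for this checkpoint.
id2label = {0: "ISIS", 1: "NOT-ISIS"}
label2id = {label: i for i, label in id2label.items()}

print(id2label[0])           # ISIS
print(label2id["NOT-ISIS"])  # 1
```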
## Training data
The model was fine-tuned on a balanced 500,000-tweet corpus (250,000 pro-ISIS + 250,000 NOT-ISIS) constructed by the LLM-labeling pipeline described in the paper:
- An LLM iteratively induces a taxonomy of extremist content from 20,000 seed pro-ISIS + 20,000 seed NOT-ISIS tweets.
- The taxonomy is then used to classify 1,000,000 random Arabic tweets and to refine labels for a larger pool of pro-ISIS-account tweets; the 500,000 final labels feed this model.
The underlying corpus was drawn from a 2015 Arabic Twitter archive and is not redistributed with this model (Twitter/X Terms of Service).
## Training procedure
The classifier was fine-tuned from UBC-NLP/MARBERT using a standard BERT-for-sequence-classification head. Hyperparameters match those reported in the paper:
| Setting | Value |
|---|---|
| Base model | UBC-NLP/MARBERT |
| Max sequence length | 128 |
| Batch size | 64 |
| Epochs | 5 |
| Learning rate | 2e-6 |
| Optimizer | AdamW |
| LR schedule | Linear warmup (10%) → linear decay |
| Gradient clipping | 1.0 (max norm) |
| Hardware | 4× NVIDIA RTX A6000 (DataParallel) |
| Random seed | 42 |
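The LR-schedule row (linear warmup over the first 10% of steps, then linear decay to zero) can be sketched in plain Python. The step counts below are illustrative only, not taken from the training script:

```python
def lr_at(step, total_steps, peak_lr=2e-6, warmup_frac=0.10):
    """Linear warmup to peak_lr over the first warmup_frac of steps,
    then linear decay to zero, as in the hyperparameter table above."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Illustrative 1,000-step run: mid-warmup, at the peak, and at the end.
print(lr_at(50, 1000))    # halfway through warmup -> half of peak_lr
print(lr_at(100, 1000))   # warmup complete -> peak_lr
print(lr_at(1000, 1000))  # 0.0
```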
## Evaluation
On a 10% held-out test split (~50,000 tweets), the paper reports:
| Metric | ISIS class | NOT-ISIS class | Overall |
|---|---|---|---|
| Precision | 0.88 | 0.93 | – |
| Recall | 0.94 | 0.87 | – |
| F1 | 0.91 | 0.90 | – |
| Accuracy | – | – | 0.90 |
See Table 3 of the paper for scaling results at 1K, 10K, 100K, 250K, and 500K training sizes.
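Because the held-out split is class-balanced, the table's F1 and accuracy figures can be cross-checked from the per-class precision and recall alone. A quick sanity check, with values copied from the table above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

p_isis, r_isis = 0.88, 0.94  # ISIS column
p_not, r_not = 0.93, 0.87    # NOT-ISIS column

print(round(f1(p_isis, r_isis), 2))  # 0.91
print(round(f1(p_not, r_not), 2))    # 0.9

# With a 50/50 class balance, accuracy is the mean of the two recalls.
print(f"{(r_isis + r_not) / 2:.3f}")  # 0.905, reported as 0.90
```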
## Intended use and limitations
**Intended use.** This model is intended as an automated first-pass filter for detecting pro-ISIS Arabic social media content in research settings: for example, as a cost-effective precursor to a more expensive taxonomy-guided LLM classifier that verifies flagged posts, or as a baseline for Arabic extremism-detection research. The label ISIS should be read as "appears to endorse, recruit for, or glorify ISIS-affiliated groups" rather than as "mentions ISIS"; the training corpus includes many NOT-ISIS tweets that reference ISIS in neutral or opposing terms.

**Out-of-scope use.** The model has not been validated for:
- Non-Arabic languages (including Arabic-script text in other languages).
- Extremism from other ideological movements (e.g., far-right, other jihadist groups, white-supremacist content). It is trained specifically on ISIS-era material.
- Automated enforcement, account suspension, or any high-stakes moderation decision without human review. False positives on this task have real consequences for individuals.
**Limitations.**
- **Temporal drift.** The underlying Twitter archive is from 2015, when ISIS messaging took specific linguistic forms. Current extremist rhetoric, ISIS-inspired or otherwise, may differ. Performance on recent content is unlikely to match the reported numbers.
- **Dialectal coverage.** Although MARBERT was pretrained on dialectal Arabic, the training labels were generated by an LLM and may underrepresent some dialects and script variants.
- **Label noise.** Training labels come from an LLM (GPT-4o with a taxonomy-guided prompt), not human adjudication. While the paper validates the taxonomy against human judgment on a 2,000-tweet evaluation set, individual training labels may be noisy.
- **Content-type mismatch.** The model was trained on short tweets. Longer documents or multimodal content will be truncated at 128 tokens, and performance there is undefined.
**Ethical considerations.** Extremism classification is sensitive. Users should consult the accompanying paper's Limitations and Ethical Considerations sections before deploying this model in any setting beyond research. The model is released under CC-BY-4.0 to encourage responsible use, but redistribution of model outputs concerning individuals should comply with relevant laws and platform policies.