TruthAR: Transformer-Based Fake News Detection in Arabic Language

Overview

TruthAR is a specialized Arabic pre-trained language model (PLM) fine-tuned to analyze news content and detect misinformation. It targets Modern Standard Arabic (MSA).

The model can be used directly for inference or as a starting point for further fine-tuning.

Model Details:

  • Base Model: aubmindlab/bert-base-arabertv02
  • Language: Arabic
  • Dataset used for fine-tuning: news articles collected from diverse websites
  • License: Apache License 2.0

Model Inference

You can use TruthAR directly to classify news text as real or fake. To use it, follow these steps:

1. Install the required libraries. Make sure the following packages are installed (via pip) before using the model:

pip install arabert transformers torch

2. Load the Model and Tokenizer

# Import required Modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and Tokenizer
model_name = 'hugsanaa/TruthAR'
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)

3. Predict

# Example text (translation: "U.S. President Donald Trump stated in a press interview: 'If Syria succeeds in achieving peace, I will lift the sanctions on it, and that will make a difference,' speaking about the Middle East, sanctions, and the Abraham Accords, on 29 June 2025.")
text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make predictions
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_class = torch.argmax(logits).item()

# Interpret results
labels = ["Real", "Fake"]
print(f"Prediction: {labels[predicted_class]}")
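If you also want a confidence score rather than just the arg-max label, the logits can be passed through a softmax to get class probabilities. A minimal sketch (the logits here are made-up placeholders; in practice they come from the model call above):

```python
import torch

# Hypothetical logits for one input, shaped [batch, num_labels]
logits = torch.tensor([[2.1, -0.8]])

# Softmax converts raw logits into probabilities that sum to 1
probs = torch.softmax(logits, dim=-1)

labels = ["Real", "Fake"]
predicted_class = torch.argmax(probs, dim=-1).item()
confidence = probs[0, predicted_class].item()
print(f"Prediction: {labels[predicted_class]} (confidence: {confidence:.2%})")
```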

Inference using pipeline

import pandas as pd
from transformers import pipeline
import more_itertools
from tqdm.auto import tqdm

model = 'hugsanaa/TruthAR'

# Load the dataset (it must include a 'text' column)
data = pd.read_csv("your_fakenews_data.csv")

# Build the prediction pipeline (top_k=None returns the scores for all classes)
pipe = pipeline("text-classification", model=model, device=0, top_k=None, truncation=True, max_length=512)

preds = []
for batch in tqdm(more_itertools.chunked(list(data['text']), 32)):  # batch the inputs for faster inference
    preds.extend(pipe(batch))

# Generate final predictions by keeping the highest-scoring label per sample
data['preds'] = preds
final_pred = []
for prediction in data['preds']:
    final_pred.append(max(prediction, key=lambda x: x['score'])['label'])

data['Final Prediction'] = final_pred
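The final-prediction step simply selects the highest-scoring entry from the list of {label, score} dictionaries the pipeline returns for each sample. A standalone illustration with made-up scores (note that the actual label strings depend on the model's config and may appear as LABEL_0/LABEL_1 if no id2label mapping is set):

```python
# One pipeline output per sample: a list of {label, score} dicts
sample_output = [
    {"label": "Real", "score": 0.12},
    {"label": "Fake", "score": 0.88},
]

# Pick the label whose score is highest
best = max(sample_output, key=lambda x: x["score"])["label"]
print(best)  # → Fake
```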

Results

Below are the results obtained from evaluating TruthAR on held-out test samples:

| Class          | Precision | Recall | F1-Score | Support |
|----------------|-----------|--------|----------|---------|
| Real           | 0.9879    | 0.3104 | 0.4724   | 789     |
| Fake           | 0.6679    | 0.9973 | 0.8000   | 1093    |
| Overall / Avg. | 0.8017    | 0.7100 | 0.6630   | 1879    |
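Per-class precision, recall, and F1 scores like those in the table can be reproduced from predictions with scikit-learn's classification_report. A sketch with toy labels (not the actual test set):

```python
from sklearn.metrics import classification_report

# Toy ground truth and predictions (0 = Real, 1 = Fake); the real
# evaluation uses the held-out test set, not these values
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

print(classification_report(y_true, y_pred, target_names=["Real", "Fake"], digits=4))
```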
Model size: ~0.1B parameters (Safetensors, F32 tensors)
