---
license: apache-2.0
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---

# TruthAR: Transformer-Based Fake News Detection in Arabic

# Overview
TruthAR is a specialized Arabic pretrained language model (PLM) for analyzing news content and detecting misinformation. It targets Modern Standard Arabic. The model can be used for inference as-is or as a starting point for further fine-tuning.

# Model Details
- **Base Model:** aubmindlab/bert-base-arabertv02
- **Language:** Arabic
- **Fine-tuning data:** news articles collected from diverse websites
- **License:** Apache License 2.0

# Model Inference
You can run TruthAR directly on any Arabic news dataset to detect fake news. To use it, follow these steps:

**1. Install the required libraries**

Install the dependencies with pip before using the model:

```bash
pip install arabert transformers torch
```

**2. Load the model and tokenizer**

```python
# Import the required modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer
model_name = 'hugsanaa/TruthAR'
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

**3. Predict**

```python
# Example text (a news snippet in Modern Standard Arabic)
text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'
# Tokenize the input (truncated to the model's maximum length)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make the prediction
with torch.no_grad():
    logits = model(**inputs)[0]  # first output is the (1, 2) logits tensor
predicted_class = torch.argmax(logits).item()

# Interpret the result
labels = ["Real", "Fake"]
print(f"Prediction: {labels[predicted_class]}")
```

**Inference using a pipeline**

```python
import pandas as pd
import more_itertools
from tqdm.auto import tqdm
from transformers import pipeline

model = 'hugsanaa/TruthAR'

# Load the dataset (it must contain a 'text' column);
# 'your_fakenews_data' is the path to your CSV file
data = pd.read_csv(your_fakenews_data)

# Build the prediction pipeline (device=0 uses the first GPU; use device=-1 for CPU)
pipe = pipeline("text-classification", model=model, device=0,
                return_all_scores=True, max_length=512, truncation=True)

# Run inference in batches of 32 for faster throughput
preds = []
for batch in tqdm(more_itertools.chunked(list(data['text']), 32)):
    preds.extend(pipe(batch))
data['preds'] = preds

# Keep the highest-scoring label for each example
data['Final Prediction'] = [max(p, key=lambda x: x['score'])['label'] for p in data['preds']]
```

# Results
Below are the results obtained from testing TruthAR on held-out test samples:

| Class | Precision | Recall | F1-Score | Support |
|--------------------|-----------|--------|----------|---------|
| Real | 0.9879 | 0.3104 | 0.4724 | 789 |
| Fake | 0.6679 | 0.9973 | 0.8000 | 1093 |
| **Overall / Avg.** | 0.8017 | 0.7100 | 0.6630 | 1879 |
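The per-class precision, recall, and F1 figures above can be recomputed from a list of gold labels together with the model's predictions (e.g. the `Final Prediction` column produced by the pipeline). A minimal pure-Python sketch; the toy `y_true`/`y_pred` lists below are placeholders, not the actual test set:

```python
def per_class_metrics(y_true, y_pred, labels=("Real", "Fake")):
    """Compute precision, recall, F1, and support per class, as in the table above."""
    metrics = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1": f1, "support": sum(1 for t in y_true if t == label)}
    return metrics

# Toy example with three Real and three Fake gold labels
y_true = ["Real", "Real", "Real", "Fake", "Fake", "Fake"]
y_pred = ["Real", "Fake", "Real", "Fake", "Fake", "Real"]
m = per_class_metrics(y_true, y_pred)
print(m["Real"])
print(m["Fake"])
```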
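The single-text prediction step only prints a label; if a confidence score is also useful, the logits can be passed through a softmax before taking the argmax. A minimal sketch, using a hand-made stand-in for the (1, 2) logits tensor that `model(**inputs)` returns:

```python
import torch

# Stand-in logits; in practice this is the tensor returned by the model
logits = torch.tensor([[0.3, 1.7]])

probs = torch.softmax(logits, dim=-1)             # normalize logits into probabilities
predicted_class = int(torch.argmax(probs, dim=-1))
labels = ["Real", "Fake"]
confidence = float(probs[0, predicted_class])
print(f"Prediction: {labels[predicted_class]} (confidence: {confidence:.2f})")
```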