hugsanaa/TruthAR

+---
+license: apache-2.0
+language:
+- ar
+base_model:
+- aubmindlab/bert-base-arabertv02
+---
+# TruthAR: Transformer-Based Fake News Detection in Arabic Language
+# Overview
+TruthAR is a specialized Arabic pre-trained language model (PLM) for analyzing news content and detecting misinformation. It works on Modern Standard Arabic.
+The model can be used as-is for inference and evaluation, or as a starting point for further fine-tuning.
+# Model Details:
+- **Base Model:** aubmindlab/bert-base-arabertv02
+- **Language:** Arabic
+- **Dataset used for fine-tuning:** Data collected from diverse websites
+- **License:** Apache License 2.0
+# Model Inference
+You can apply TruthAR directly to Arabic news text to detect fake news. Follow these steps:
+**1. Install the required libraries**
+Install the required libraries with pip before using the model:
+```bash
+pip install arabert transformers torch
+```
+**2. Load the Model and Tokenizer**
+```python
+# Import the required modules
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+# Load the model and tokenizer
+model_name = 'hugsanaa/TruthAR'
+# Keep the default return_dict=True so that the model output exposes `.logits`
+model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+```
+**3. Predict**
+```python
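+# Example text (Arabic). English gloss: US President Donald Trump said in a press interview,
+# "If Syria succeeds in being peaceful, I will lift the sanctions on it, and that will make a
+# difference," while discussing the Middle East, sanctions, and the Abraham Accords on 29 June 2025.
+# Single quotes are used because the text itself contains double quotes.
+text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'
+# Tokenize the input
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+# Make a prediction
+with torch.no_grad():
+    logits = model(**inputs, return_dict=True).logits
+# Index of the highest-scoring class, as a plain Python int
+predicted_class = torch.argmax(logits, dim=-1).item()
+# Interpret the result
+labels = ["Real", "Fake"]
+print(f"Prediction: {labels[predicted_class]}")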
+```
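The prediction step above reports only the top class. If a confidence score is also useful, the logits can be turned into probabilities with a softmax. The following is a minimal, dependency-free sketch: the helper name `logits_to_probs` and the example logit values are hypothetical, and the `["Real", "Fake"]` label order is assumed to match the one used above.

```python
import math

def logits_to_probs(logits, labels=("Real", "Fake")):
    """Convert a flat list of logits into a {label: probability} mapping via softmax."""
    # Subtract the maximum logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

# Hypothetical logits for a single sample; with the model above,
# a flat list can be obtained with logits.squeeze(0).tolist()
example_logits = [-1.2, 2.3]
probs = logits_to_probs(example_logits)
predicted = max(probs, key=probs.get)
print(predicted, round(probs[predicted], 4))
```

Reporting the probability alongside the label makes it easier to flag borderline predictions for manual review.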
+**Inference using pipeline**
+```python
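+# Extra dependencies for this example: pip install pandas more-itertools tqdm
+import pandas as pd
+from transformers import pipeline
+import more_itertools
+from tqdm.auto import tqdm
+model = 'hugsanaa/TruthAR'
+max_len = 512  # assumed maximum sequence length (the limit for BERT-base models); adjust as needed
+# Load the dataset (it must include a 'text' column); replace the placeholder path with your CSV file
+your_fakenews_data = 'your_fakenews_data.csv'
+data = pd.read_csv(your_fakenews_data)
+# Build the prediction pipeline. device=0 selects the first GPU; use device=-1 for CPU.
+# top_k=None returns scores for all labels (it replaces the deprecated return_all_scores=True).
+pipe = pipeline("text-classification", model=model, device=0, top_k=None, max_length=max_len, truncation=True)
+preds = []
+for s in tqdm(more_itertools.chunked(list(data['text']), 32)):  # batching for faster inference
+    preds.extend(pipe(s))
+# Generate final predictions
+data['preds'] = preds
+# Keep the highest-scoring label for each sample
+data['Final Prediction'] = [max(prediction, key=lambda x: x['score'])['label'] for prediction in data['preds']]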
+```
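The pipeline example relies on `more_itertools.chunked` to split the texts into batches of 32. Where installing `more_itertools` is not an option, the same batching can be sketched with the standard library alone. The helper name `chunked_texts` below is hypothetical and is not part of the original example.

```python
from itertools import islice

def chunked_texts(items, size):
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

# Placeholder strings standing in for news texts
batches = list(chunked_texts(["t1", "t2", "t3", "t4", "t5"], 2))
print(batches)
```

In the loop above, `chunked_texts(list(data['text']), 32)` could then stand in for the `more_itertools.chunked(...)` call.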
+# Results
+Below are the results obtained from evaluating TruthAR on test samples from ArCyC data:
+| Class              | Precision | Recall | F1-Score | Support |
+|--------------------|-----------|--------|----------|---------|
+| Real               | 0.9879    | 0.3104 | 0.4724   | 789     |
+| Fake               | 0.6679    | 0.9973 | 0.8000   | 1093    |
+| **Overall / Avg.** | 0.8017    | 0.7100 | 0.6630   | 1879    |
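
The overall row appears to be a support-weighted average of the per-class scores rather than a plain (macro) mean. This is an inference from the numbers, not something the table states. The sketch below recomputes both averages from the per-class rows as a check; the function and variable names are illustrative.

```python
# Per-class rows copied from the results table: (precision, recall, f1, support)
rows = {
    "Real": (0.9879, 0.3104, 0.4724, 789),
    "Fake": (0.6679, 0.9973, 0.8000, 1093),
}

total_support = sum(r[3] for r in rows.values())

def weighted_avg(index):
    """Average a metric across classes, weighting each class by its support."""
    return sum(r[index] * r[3] for r in rows.values()) / total_support

def macro_avg(index):
    """Average a metric across classes, treating every class equally."""
    return sum(r[index] for r in rows.values()) / len(rows)

for name, index in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(f"{name}: weighted={weighted_avg(index):.4f}, macro={macro_avg(index):.4f}")
```

The weighted values land within about 0.001 of the reported overall row, while the macro values do not. Note that the per-class supports sum to 1882, whereas the table lists 1879.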