---
license: apache-2.0
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---

# TruthAR: Transformer-Based Fake News Detection in Arabic Language

# Overview
TruthAR is an Arabic pretrained language model (PLM) specialized for analyzing news content and detecting misinformation. It targets Modern Standard Arabic.

The model can be used as-is for testing, or as a starting point for additional fine-tuning.

# Model Details
- **Base Model:** aubmindlab/bert-base-arabertv02
- **Language:** Arabic
- **Dataset used for fine-tuning:** Data collected from diverse websites
- **License:** Apache License 2.0

# Model Inference
You can use TruthAR directly on any dataset to detect fake news by following these steps:

**1. Install the required libraries**
Install the required libraries with pip before using the model:
```bash
pip install arabert transformers torch
```

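Before loading the model, it can help to confirm that the installation succeeded. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and the import names are assumed to match the pip package names listed in step 1.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed to match the pip packages from step 1.
required = ["arabert", "transformers", "torch"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```
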
**2. Load the Model and Tokenizer**
```python
# Import required modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer
# (return_dict is left at its default so the model output exposes `.logits`)
model_name = 'hugsanaa/TruthAR'
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

**3. Predict**
```python
# Example text (Arabic). English gloss: US President Donald Trump said in a press
# interview that he would lift sanctions on Syria if it succeeded in keeping the
# peace, as part of remarks on the Middle East dated 29 June 2025.
# Single quotes delimit the string because the text itself contains double quotes.
text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make predictions (return_dict=True guarantees the output has a `.logits` attribute)
with torch.no_grad():
    logits = model(**inputs, return_dict=True).logits
predicted_class = torch.argmax(logits, dim=-1).item()

# Interpret results
labels = ["Real", "Fake"]
print(f"Prediction: {labels[predicted_class]}")
```

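The `argmax` in step 3 yields only a class index. If a confidence score is also useful, the logits can be passed through a softmax first. The sketch below is plain Python and runs on made-up logit values, not real TruthAR output. The helper names are hypothetical, and the `("Real", "Fake")` order follows step 3. With the real model, `logits[0].tolist()` from step 3 could replace the illustrative values.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=("Real", "Fake")):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Illustrative logits only (assumption), not produced by TruthAR.
example_logits = [-1.2, 2.3]
label, confidence = predict_label(example_logits)
print(f"Prediction: {label} (confidence {confidence:.2f})")
```
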
**Inference using a pipeline**
This example additionally requires `pandas`, `more_itertools`, and `tqdm`.
```python
import pandas as pd
import torch
import more_itertools
from tqdm.auto import tqdm
from transformers import pipeline

model = 'hugsanaa/TruthAR'
max_len = 512  # assumed value: the maximum input length of BERT-base models

# Load the dataset (it must include a 'text' column)
your_fakenews_data = "your_fakenews_data.csv"  # placeholder path, replace with your file
data = pd.read_csv(your_fakenews_data)

# Build the prediction pipeline (GPU 0 if available, otherwise CPU)
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text-classification", model=model, device=device, top_k=None, max_length=max_len, truncation=True)

# Run inference in batches of 32 for speed
preds = []
for batch in tqdm(more_itertools.chunked(list(data['text']), 32)):
    preds.extend(pipe(batch))

# Keep the highest-scoring label for each text
data['preds'] = preds
final_pred = []
for prediction in data['preds']:
    final_pred.append(max(prediction, key=lambda x: x['score'])['label'])

data['Final Prediction'] = final_pred
```

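Depending on how the model's `id2label` config is set, a pipeline may return generic labels such as `LABEL_0` and `LABEL_1` rather than class names. The sketch below maps such output to readable names, assuming index 0 is Real and index 1 is Fake as in step 3. The function name and the sample scores are hypothetical.

```python
# Assumed mapping, following the label order used in step 3.
LABEL_NAMES = {"LABEL_0": "Real", "LABEL_1": "Fake"}


def top_label(scores, names=LABEL_NAMES):
    """Pick the highest-scoring entry and return a readable (label, score) pair.

    `scores` is a list of {"label": ..., "score": ...} dicts, the per-text
    format a text-classification pipeline returns when all scores are requested.
    """
    best = max(scores, key=lambda item: item["score"])
    # Fall back to the raw label if it is not one of the generic names.
    return names.get(best["label"], best["label"]), best["score"]


# Made-up scores for two texts, not real model output.
sample_preds = [
    [{"label": "LABEL_0", "score": 0.12}, {"label": "LABEL_1", "score": 0.88}],
    [{"label": "LABEL_0", "score": 0.71}, {"label": "LABEL_1", "score": 0.29}],
]
for scores in sample_preds:
    print(top_label(scores))
```

If the pipeline already returns class names, the fallback leaves them unchanged.
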
# Results
The results below were obtained by testing TruthAR on test samples from the ArCyC data:

| Class              | Precision | Recall | F1-Score | Support |
|--------------------|-----------|--------|----------|---------|
| Real               | 0.9879    | 0.3104 | 0.4724   | 789     |
| Fake               | 0.6679    | 0.9973 | 0.8000   | 1093    |
| **Overall / Avg.** | 0.8017    | 0.7100 | 0.6630   | 1879    |
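
As a quick consistency check on the table, each per-class F1-score is the harmonic mean of that class's precision and recall. The sketch below recomputes it from the reported precision and recall values. The function name is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the results table.
reported = {"Real": (0.9879, 0.3104), "Fake": (0.6679, 0.9973)}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
```

The recomputed values agree with the F1-Score column to four decimal places. The low recall for the Real class is what pulls its F1 down despite the high precision.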