---
license: apache-2.0
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---
# TruthAR: Transformer-Based Fake News Detection in Arabic Language
# Overview
TruthAR is a specialized Arabic pre-trained language model (PLM) for analyzing news content and detecting misinformation. It works on Modern Standard Arabic.
The model can be used as-is for testing (inference) or as a starting point for further fine-tuning.
# Model Details:
- **Base Model:** aubmindlab/bert-base-arabertv02
- **Language:** Arabic
- **Dataset used for fine-tuning:** Data collected from diverse websites
- **License:** Apache License 2.0
# Model Inference
You can use TruthAR directly on any dataset to detect fake news. To use it, follow these steps:
**1. Install the required libraries**
Install the required libraries with pip before using the model (pandas, more-itertools, and tqdm are only needed for the pipeline example further down):
```bash
pip install arabert transformers torch pandas more-itertools tqdm
```
**2. Load the Model and Tokenizer**
```python
# Import required Modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
# Load model and Tokenizer
model_name = 'hugsanaa/TruthAR'
# Keep the default dict-style outputs so that `.logits` is available at prediction time
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
**3. Predict**
```python
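# Example text (single-quoted because the text itself contains double quotes).
# Approximate English gloss: US President Donald Trump said in a press interview: "If Syria
# succeeds in being peaceful, I will lift the sanctions on it, and that will make a difference",
# as part of remarks on the Middle East, sanctions, and the Abraham Accords, on 29 June 2025.
text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'
# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
# Make predictions
model.eval()  # disable dropout for deterministic inference
with torch.no_grad():
    # return_dict=True guarantees an output object with a .logits attribute
    logits = model(**inputs, return_dict=True).logits
predicted_class = torch.argmax(logits, dim=-1).item()
# Interpret results
labels = ["Real", "Fake"]
print(f"Prediction: {labels[predicted_class]}")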
```
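If a confidence score is useful alongside the label, the raw logits can be passed through a softmax first. The sketch below does this in plain Python so it runs without downloading the model; the function name `logits_to_prediction` and the sample logit values are illustrative assumptions, and the label order is taken from the snippet above.

```python
import math

# Label order assumed to match the snippet above (index 0 = Real, index 1 = Fake).
LABELS = ["Real", "Fake"]

def logits_to_prediction(logits, labels=LABELS):
    """Convert one row of raw logits into a (label, probability) pair."""
    # Subtract the largest logit before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (the first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits, e.g. obtained with `logits[0].tolist()` after the model call.
label, prob = logits_to_prediction([-1.2, 2.3])
print(f"Prediction: {label} (confidence {prob:.2f})")
```

With real model output, `torch.softmax(logits, dim=-1)` gives the same probabilities directly on the tensor.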
**Inference using pipeline**
```python
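import pandas as pd
import more_itertools
from tqdm.auto import tqdm  # works in notebooks and terminals; tqdm_notebook is deprecated
from transformers import pipeline

model = 'hugsanaa/TruthAR'
your_fakenews_data = "your_fakenews_data.csv"  # placeholder: path to your own CSV file
max_len = 512  # assumption: the usual BERT-base input limit; adjust as needed
# Load the dataset (the data must include a 'text' column)
data = pd.read_csv(your_fakenews_data)
# Build the prediction pipeline (device=0 selects the first GPU; use device=-1 for CPU).
# top_k=None returns the scores of all labels (replaces the deprecated return_all_scores=True).
pipe = pipeline("text-classification", model=model, device=0, top_k=None, max_length=max_len, truncation=True)
preds = []
for s in tqdm(more_itertools.chunked(list(data['text']), 32)):  # batching for faster inference
    preds.extend(pipe(s))
# Generate final predictions: keep the highest-scoring label for each text
data['preds'] = preds
final_pred = []
for prediction in data['preds']:
    final_pred.append(max(prediction, key=lambda x: x['score'])['label'])
data['Final Prediction'] = final_pred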
```
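The batching and label-selection steps of the loop above can be checked in isolation, without a GPU or a model download. The sketch below uses a standard-library stand-in for `more_itertools.chunked` and hand-written score dictionaries in the shape the pipeline returns when all scores are requested; the helper names, the label names `LABEL_0`/`LABEL_1`, and the score values are placeholders, since the actual label names depend on the model's configuration.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def top_label(scores):
    """Return the label of the highest-scoring entry for one input text."""
    return max(scores, key=lambda entry: entry["score"])["label"]

# Five dummy texts split into batches of two: the last batch is shorter.
batches = list(chunked(["t1", "t2", "t3", "t4", "t5"], 2))
print(batches)

# Hypothetical pipeline output for two texts (one list of score dicts per text).
mock_preds = [
    [{"label": "LABEL_0", "score": 0.91}, {"label": "LABEL_1", "score": 0.09}],
    [{"label": "LABEL_0", "score": 0.20}, {"label": "LABEL_1", "score": 0.80}],
]
final_pred = [top_label(scores) for scores in mock_preds]
print(final_pred)
```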
# Results
Below are the results obtained by evaluating TruthAR on the test samples:
| Class | Precision | Recall | F1-Score | Support |
|--------------------|-----------|--------|----------|---------|
| Real | 0.9879 | 0.3104 | 0.4724 | 789 |
| Fake | 0.6679 | 0.9973 | 0.8000 | 1093 |
| **Overall / Avg.** | 0.8017 | 0.7100 | 0.6630 | 1879 |
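
Per-class scores of this kind follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. The sketch below computes them from lists of true and predicted labels; the function name `per_class_report` and the toy labels are illustrative assumptions and are not the evaluation data behind the table.

```python
def per_class_report(y_true, y_pred, labels=("Real", "Fake")):
    """Compute precision, recall, F1, and support for each class."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": tp + fn}
    return report

# Toy example: one Real item is misclassified as Fake.
y_true = ["Real", "Real", "Fake", "Fake", "Fake"]
y_pred = ["Real", "Fake", "Fake", "Fake", "Fake"]
report = per_class_report(y_true, y_pred)
for name, row in report.items():
    print(name, {k: round(v, 4) for k, v in row.items()})
```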