---
license: apache-2.0
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---

# TruthAR: Transformer-Based Fake News Detection in Arabic Language

# Overview
TruthAR is a specialized Arabic pre-trained language model (PLM) for analyzing news content and detecting misinformation. It targets Modern Standard Arabic.

The model can be used as is for inference and evaluation, or as a starting point for further fine-tuning.

# Model Details:
- Base Model: aubmindlab/bert-base-arabertv02
- Language: Arabic
- Fine-tuning dataset: data collected from diverse websites
- License: Apache License 2.0

# Model Inference
You can use TruthAR directly on any dataset to detect fake news. To use it, follow these steps:

1. Install the required libraries
Make sure the required libraries are installed with pip before using the model:
```bash
pip install arabert transformers torch
```
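A quick way to confirm that the installation succeeded is to check that each package can be found for import. The sketch below uses only the standard library; the helper name `missing_packages` is an illustrative assumption, not part of TruthAR or of any of these libraries.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages installed in step 1
    missing = missing_packages(["arabert", "transformers", "torch"])
    print("Missing packages:", missing or "none")
```

If the list is not empty, rerun the `pip install` command above for the packages it names.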

2. Load the Model and Tokenizer
```python
# Import required modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model_name = 'hugsanaa/TruthAR'
# The default return_dict=True is kept so that the model output exposes `.logits`
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout
```

3. Predict
```python
# Example text (single-quoted because the Arabic text itself contains double quotes)
text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make predictions
with torch.no_grad():
    # Index 0 is the logits tensor whether the model returns a tuple or a ModelOutput
    logits = model(**inputs)[0]
predicted_class = torch.argmax(logits, dim=-1).item()

# Interpret results
labels = ["Real", "Fake"]
print(f"Prediction: {labels[predicted_class]}")
```
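The predicted class alone hides how confident the model is. A softmax over the logits turns them into per-class probabilities. The sketch below is pure Python so that it runs without downloading the model; `logits_to_probs` is a hypothetical helper, the example logits are made up, and the label order is the one used in the Predict step.

```python
import math


def logits_to_probs(logits, labels=("Real", "Fake")):
    """Apply a numerically stable softmax to a list of raw logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Made-up logits for illustration only, not real model output
probs = logits_to_probs([-1.2, 2.3])
print(probs)
```

With the tensor from the Predict step, the call would be `logits_to_probs(logits[0].tolist())`.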

Inference using pipeline (also requires pandas, more-itertools, and tqdm):
```python
import pandas as pd
import torch
from transformers import pipeline
import more_itertools
from tqdm.auto import tqdm

model = 'hugsanaa/TruthAR'
max_len = 512  # assumed limit: the maximum sequence length of BERT-base models

# Load the dataset (the data must include a 'text' column)
data = pd.read_csv('your_fakenews_data.csv')  # placeholder path

# Build the prediction pipeline; top_k=None returns the score of every label
device = 0 if torch.cuda.is_available() else -1  # first GPU if available, else CPU
pipe = pipeline("text-classification", model=model, device=device, top_k=None, max_length=max_len, truncation=True)

preds = []
for s in tqdm(more_itertools.chunked(list(data['text']), 32)):  # process in chunks of 32
    preds.extend(pipe(s))

# Generate final predictions: keep the highest-scoring label for each row
data['preds'] = preds
final_pred = []
for prediction in data['preds']:
    final_pred.append(max(prediction, key=lambda x: x['score'])['label'])

data['Final Prediction'] = final_pred
```
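To make the last step concrete: when all scores are returned, the pipeline yields one list of `{'label', 'score'}` dictionaries per input, and the final prediction is the label with the highest score. A minimal sketch on hand-written output follows. The scores are invented, and the `LABEL_0`/`LABEL_1` names are an assumption, since the actual names depend on the model's configuration.

```python
def top_label(prediction):
    """Return the label of the highest-scoring entry for one input."""
    return max(prediction, key=lambda entry: entry["score"])["label"]


# Hand-written stand-in for pipeline output on two inputs (scores are invented)
mock_preds = [
    [{"label": "LABEL_0", "score": 0.91}, {"label": "LABEL_1", "score": 0.09}],
    [{"label": "LABEL_0", "score": 0.22}, {"label": "LABEL_1", "score": 0.78}],
]

final_pred = [top_label(p) for p in mock_preds]
print(final_pred)
```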

# Results
Below are the results obtained from evaluating TruthAR on its test samples:

| Class          | Precision | Recall | F1-Score | Support |
|----------------|-----------|--------|----------|---------|
| Real           | 0.9879    | 0.3104 | 0.4724   | 789     |
| Fake           | 0.6679    | 0.9973 | 0.8000   | 1093    |
| Overall / Avg. | 0.8017    | 0.7100 | 0.6630   | 1879    |
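
The F1-Score column is the harmonic mean of precision and recall, which is why the Real class scores low despite its near-perfect precision: the recall of 0.3104 dominates. The short check below recomputes the per-class F1 from the precision and recall values in the table; `f1_score` is a local helper defined here, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the results table
per_class = {"Real": (0.9879, 0.3104), "Fake": (0.6679, 0.9973)}

for name, (p, r) in per_class.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
```

Both recomputed values agree with the F1-Score column to four decimal places.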