hugsanaa committed bff3355 (verified; parent: 42ff8ec)

Update README.md

README.md changed (+94 -3)

---
license: apache-2.0
language:
- ar
base_model:
- aubmindlab/bert-base-arabertv02
---

# TruthAR: Transformer-Based Fake News Detection in Arabic

# Overview
TruthAR is an Arabic pretrained language model (PLM) fine-tuned to analyze news content and detect misinformation. It targets Modern Standard Arabic (MSA).

The model can be used directly for inference or as a starting point for further fine-tuning.

# Model Details
- **Base Model:** aubmindlab/bert-base-arabertv02
- **Language:** Arabic (Modern Standard Arabic)
- **Fine-tuning data:** Collected from a variety of websites
- **License:** Apache License 2.0

# Model Inference
TruthAR can be used directly to classify Arabic news text as real or fake. To run inference, follow these steps:

**1. Install the required libraries**
Install the libraries with pip before using the model:
```bash
pip install arabert transformers torch
```

**2. Load the Model and Tokenizer**
```python
# Import the required modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = 'hugsanaa/TruthAR'
# Keep the default return_dict=True so that model outputs expose a `.logits` attribute
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

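**3. Predict**
```python
# Example text (English: US President Donald Trump said during a press interview: "If Syria succeeds
# in embracing peace, I will lift the sanctions on it, and that will make a difference." He made the
# remarks while discussing the Middle East, sanctions, and the Abraham Accords, on 29 June 2025.)
text = 'الرئيس الأميركي دونالد ترامب صرّح خلال مقابلة صحفية: "إذا نجحت سوريا في التحلي بالسلام فسأرفع العقوبات عنها، وسيحدث ذلك فرقاً"، وذلك ضمن حديثه عن الشرق الأوسط والعقوبات واتفاقيات أبراهام، وذلك بتاريخ 29 حزيران/ يونيو 2025.'

# Tokenize the input into PyTorch tensors, truncating to the model's maximum length
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make predictions without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = torch.argmax(logits, dim=-1).item()

# Interpret the result (index 0 = Real, index 1 = Fake)
labels = ["Real", "Fake"]
print(f"Prediction: {labels[predicted_class]}")
```
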
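Step 3 reports only the most likely class. If you also need a confidence score, the logits can be converted to probabilities with a softmax (in PyTorch, `torch.softmax(logits, dim=-1)`). The dependency-free sketch below shows that computation on two made-up logit values; the index order (0 = Real, 1 = Fake) is assumed to match the `labels` list in step 3.

```python
import math

# Index order assumed to match the labels list in step 3 (0 = Real, 1 = Fake)
LABELS = ["Real", "Fake"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1 (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one input (e.g. what logits[0].tolist() might hold in step 3)
label, confidence = interpret([-1.2, 2.3])
print(f"Prediction: {label} ({confidence:.2%})")
```

Because softmax is monotonic, the class it ranks highest is always the same one `torch.argmax` picks; the softmax only adds a calibrated-looking score on top.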
**Inference using a pipeline**
To classify a whole dataset, you can use the `transformers` pipeline API with batching:
```python
import pandas as pd
import torch
import more_itertools
from transformers import pipeline
from tqdm.auto import tqdm

model = 'hugsanaa/TruthAR'
max_len = 512  # BERT-base models accept at most 512 tokens

# Load the dataset (it must contain a 'text' column); replace the placeholder path with your own CSV file
data = pd.read_csv('your_fakenews_data.csv')

# Build the prediction pipeline; top_k=None returns the scores of all classes for each text
pipe = pipeline("text-classification", model=model,
                device=0 if torch.cuda.is_available() else -1,
                top_k=None, max_length=max_len, truncation=True)
preds = []
for batch in tqdm(more_itertools.chunked(list(data['text']), 32)):  # batching for faster inference
    preds.extend(pipe(batch))

# Keep the highest-scoring label for each text as the final prediction
data['preds'] = preds
data['Final Prediction'] = [max(p, key=lambda x: x['score'])['label'] for p in data['preds']]
```

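Depending on the model's `id2label` configuration, the pipeline may return generic names such as `LABEL_0`/`LABEL_1` instead of `Real`/`Fake`. The sketch below isolates the batching and post-processing steps so they can be checked without downloading the model: it runs on hand-written, pipeline-shaped outputs with hypothetical scores, and the `LABEL_MAP` dictionary is a hypothetical mapping that assumes index 0 = Real and index 1 = Fake, as in step 3.

```python
from itertools import islice

# Hypothetical mapping from generic pipeline labels to class names (assumes 0 = Real, 1 = Fake)
LABEL_MAP = {"LABEL_0": "Real", "LABEL_1": "Fake"}


def chunked(items, size):
    """Yield successive lists of at most `size` items (the same idea as more_itertools.chunked)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def final_label(scores):
    """Pick the highest-scoring entry and map it to a readable class name if a mapping exists."""
    best = max(scores, key=lambda x: x["score"])
    return LABEL_MAP.get(best["label"], best["label"])


# Hand-written outputs in the shape a text-classification pipeline returns with top_k=None
example_preds = [
    [{"label": "LABEL_1", "score": 0.91}, {"label": "LABEL_0", "score": 0.09}],
    [{"label": "LABEL_0", "score": 0.77}, {"label": "LABEL_1", "score": 0.23}],
]
print([final_label(p) for p in example_preds])
print(list(chunked(range(5), 2)))
```

Labels that are already readable (for example `Real`/`Fake`) pass through `final_label` unchanged, so the same helper works whichever naming the model config uses.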
# Results
Below are the results obtained from testing TruthAR on test samples from the ArCyC data:

| Class              | Precision | Recall | F1-Score | Support |
|--------------------|-----------|--------|----------|---------|
| Real               | 0.9879    | 0.3104 | 0.4724   | 789     |
| Fake               | 0.6679    | 0.9973 | 0.8000   | 1093    |
| **Overall / Avg.** | 0.8017    | 0.7100 | 0.6630   | 1879    |
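As a sanity check on the table, the sketch below recomputes each class's F1-score as the harmonic mean of its precision and recall, and averages the per-class rows weighted by their support. The weighted averages land within about 0.001 of the Overall / Avg. row, which suggests that row reports support-weighted averages; this is an inference from the numbers rather than something stated alongside the results.

```python
# Per-class rows copied from the results table: (precision, recall, reported F1, support)
ROWS = {
    "Real": (0.9879, 0.3104, 0.4724, 789),
    "Fake": (0.6679, 0.9973, 0.8000, 1093),
}


def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, supports):
    """Average per-class values, weighting each class by its number of test samples."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


supports = [row[3] for row in ROWS.values()]
for name, (p, r, reported_f1, _) in ROWS.items():
    print(f"{name}: F1 from P and R = {f1(p, r):.4f} (reported {reported_f1:.4f})")
for idx, metric in enumerate(["Precision", "Recall", "F1-Score"]):
    values = [row[idx] for row in ROWS.values()]
    print(f"Weighted {metric}: {weighted_average(values, supports):.4f}")
```

The large gap between the Real class's precision (0.9879) and recall (0.3104) is what pulls its F1-score down to 0.4724: the harmonic mean is dominated by the smaller of the two values.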