CyberAraBERT: AraBERT for Arabic Cyberbullying Detection
Overview
CyberAraBERT is a specialized Arabic pre-trained language model (PLM) for analyzing social media content and detecting cyberbullying. It supports multiple Arabic dialects (Egyptian, Gulf, and Levantine).
The model can be used as-is for inference or as a starting point for further fine-tuning.
Model Details:
- Base Model: aubmindlab/bert-base-arabertv02-twitter
- Language: Arabic
- Dataset used for fine-tuning: ArCyC
- License: Apache License 2.0
Model Inference
You can use CyberAraBERT directly on any dataset to detect cyberbullying. To use it, follow these steps:
1. Install the required libraries

Make sure the required libraries are installed before using the model:

```shell
pip install arabert transformers torch
```
2. Load the Model and Tokenizer

```python
# Import required modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer (return_dict must stay enabled so that
# model outputs expose the .logits attribute used below)
model_name = "hugsanaa/CyberAraBERT"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
3. Predict

```python
# Example of cyberbullying text (illustrative placeholder, meaning "you are a failure")
text = "أنت فاشل"

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make predictions
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = torch.argmax(logits, dim=-1).item()

# Interpret results
labels = ["Cyberbullying", "Not Cyberbullying"]
print(f"Prediction: {labels[predicted_class]}")
```
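If you also want a confidence score alongside the predicted label, the raw logits can be normalized with a softmax. Below is a minimal pure-Python sketch; the `softmax` and `classify` helpers are illustrative additions, not part of the model's API, and the example logits are made up:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels=("Cyberbullying", "Not Cyberbullying")):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]

# Example: logits as returned by model(**inputs).logits[0].tolist()
label, prob = classify([2.1, -0.7])
print(label, round(prob, 3))  # → Cyberbullying 0.943
```

The same helper works unchanged for any number of classes, should the model be fine-tuned with finer-grained labels.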
Inference using pipeline

```python
import pandas as pd
from transformers import pipeline
import more_itertools
from tqdm.auto import tqdm

model = "hugsanaa/CyberAraBERT"

# Load the dataset (the data must include a "text" column)
data = pd.read_csv(your_cyberbullying_data)

# Build the prediction pipeline
# device=0 uses the first GPU; use device=-1 to run on CPU
max_len = 128  # maximum sequence length; adjust to your data
pipe = pipeline("text-classification", model=model, device=0,
                top_k=None, max_length=max_len, truncation=True)

preds = []
for batch in tqdm(more_itertools.chunked(list(data["text"]), 32)):  # batching for faster inference
    preds.extend(pipe(batch))

# Generate final predictions: keep the highest-scoring label per text
data["preds"] = preds
final_pred = []
for prediction in data["preds"]:
    final_pred.append(max(prediction, key=lambda x: x["score"])["label"])
data["Final Prediction"] = final_pred
```
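If the checkpoint does not ship human-readable label names, the pipeline returns generic `LABEL_0`/`LABEL_1` strings. A small helper can convert each row of scores into a readable prediction; the `LABEL_0` → "Cyberbullying" mapping below follows the order of the `labels` list in the previous section and should be verified against the model's `config.id2label`:

```python
# Assumed mapping from generic pipeline labels to readable names;
# check model.config.id2label to confirm the order for this checkpoint.
ID2LABEL = {"LABEL_0": "Cyberbullying", "LABEL_1": "Not Cyberbullying"}

def best_label(scores, id2label=ID2LABEL):
    """Pick the highest-scoring entry from one pipeline output row.

    `scores` is a list of {"label": ..., "score": ...} dicts, as produced
    by a text-classification pipeline that returns all class scores.
    """
    raw = max(scores, key=lambda s: s["score"])["label"]
    return id2label.get(raw, raw)

# Example row (scores are illustrative, not real model output)
row = [{"label": "LABEL_0", "score": 0.91}, {"label": "LABEL_1", "score": 0.09}]
print(best_label(row))  # → Cyberbullying
```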
Results
Below are the results of evaluating CyberAraBERT on the ArCyC test set:
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Not Cyberbullying | 0.9256 | 0.9043 | 0.9148 | 564 |
| Cyberbullying | 0.8453 | 0.8780 | 0.8613 | 336 |
| Overall / Avg. | 0.8956 | 0.8944 | 0.8948 | 900 |
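The Overall / Avg. row is consistent with a support-weighted average of the two per-class rows; a quick arithmetic check (the `weighted_avg` helper is only for this sanity check):

```python
def weighted_avg(values, supports):
    """Support-weighted average of per-class metrics."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total

supports = [564, 336]  # Not Cyberbullying, Cyberbullying
print(round(weighted_avg([0.9256, 0.8453], supports), 4))  # precision → 0.8956
print(round(weighted_avg([0.9043, 0.8780], supports), 4))  # recall → 0.8945 (table rounds to 0.8944)
print(round(weighted_avg([0.9148, 0.8613], supports), 4))  # F1 → 0.8948
```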