CyberAraBERT: AraBERT for Arabic Cyberbullying Detection
Overview
CyberAraBERT is a specialized Arabic pre-trained language model (PLM) for analyzing social media content and detecting cyberbullying. It supports multiple Arabic dialects (Egyptian, Gulf, and Levantine).
The model can be used as-is for inference or as a starting point for further fine-tuning.
Model Details:
- Base Model: aubmindlab/bert-base-arabertv02-twitter
- Language: Arabic
- Dataset used for fine-tuning: ArCyC
- License: Apache License 2.0
Model Inference
You can use CyberAraBERT directly on any dataset to detect cyberbullying. To use it, follow these steps:
1. Install the required libraries

Make sure the required libraries are installed before using the model:

```shell
pip install arabert transformers torch
```
2. Load the Model and Tokenizer

```python
# Import required modules
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer (return_dict must stay enabled so that
# model outputs expose the .logits attribute used below)
model_name = "hugsanaa/CyberAraBERT"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
3. Predict

```python
# Example of cyberbullying text (illustrative placeholder, meaning "you are a failure")
text = "أنت فاشل"

# Tokenize input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Make predictions
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = torch.argmax(logits, dim=-1).item()

# Interpret results
labels = ["Cyberbullying", "Not Cyberbullying"]
print(f"Prediction: {labels[predicted_class]}")
```
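If you also want a confidence score alongside the predicted label, the raw logits can be normalized with a softmax. Below is a minimal pure-Python sketch; the `softmax` and `classify` helpers are illustrative additions, not part of the model's API, and the example logits are made up:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels=("Cyberbullying", "Not Cyberbullying")):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]

# Example: logits as returned by model(**inputs).logits[0].tolist()
label, prob = classify([2.1, -0.7])
print(label, round(prob, 3))  # → Cyberbullying 0.943
```

The same helper works unchanged for any number of classes, should the model be fine-tuned with finer-grained labels.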
Inference using pipeline

```python
import pandas as pd
from transformers import pipeline
import more_itertools
from tqdm.auto import tqdm

model = "hugsanaa/CyberAraBERT"

# Load the dataset (the data must include a "text" column)
data = pd.read_csv(your_cyberbullying_data)

# Build the prediction pipeline
# device=0 uses the first GPU; use device=-1 to run on CPU
max_len = 128  # maximum sequence length; adjust to your data
pipe = pipeline("text-classification", model=model, device=0,
                top_k=None, max_length=max_len, truncation=True)

preds = []
for batch in tqdm(more_itertools.chunked(list(data["text"]), 32)):  # batching for faster inference
    preds.extend(pipe(batch))

# Generate final predictions: keep the highest-scoring label per text
data["preds"] = preds
final_pred = []
for prediction in data["preds"]:
    final_pred.append(max(prediction, key=lambda x: x["score"])["label"])
data["Final Prediction"] = final_pred
```
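If the checkpoint does not ship human-readable label names, the pipeline returns generic `LABEL_0`/`LABEL_1` strings. A small helper can convert each row of scores into a readable prediction; the `LABEL_0` → "Cyberbullying" mapping below follows the order of the `labels` list in the previous section and should be verified against the model's `config.id2label`:

```python
# Assumed mapping from generic pipeline labels to readable names;
# check model.config.id2label to confirm the order for this checkpoint.
ID2LABEL = {"LABEL_0": "Cyberbullying", "LABEL_1": "Not Cyberbullying"}

def best_label(scores, id2label=ID2LABEL):
    """Pick the highest-scoring entry from one pipeline output row.

    `scores` is a list of {"label": ..., "score": ...} dicts, as produced
    by a text-classification pipeline that returns all class scores.
    """
    raw = max(scores, key=lambda s: s["score"])["label"]
    return id2label.get(raw, raw)

# Example row (scores are illustrative, not real model output)
row = [{"label": "LABEL_0", "score": 0.91}, {"label": "LABEL_1", "score": 0.09}]
print(best_label(row))  # → Cyberbullying
```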
Results
Below are the results of evaluating CyberAraBERT on the ArCyC test set:
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Not Cyberbullying | 0.9256 | 0.9043 | 0.9148 | 564 |
| Cyberbullying | 0.8453 | 0.8780 | 0.8613 | 336 |
| Overall / Avg. | 0.8956 | 0.8944 | 0.8948 | 900 |
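The Overall / Avg. row is consistent with a support-weighted average of the two per-class rows; a quick arithmetic check (the `weighted_avg` helper is only for this sanity check):

```python
def weighted_avg(values, supports):
    """Support-weighted average of per-class metrics."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total

supports = [564, 336]  # Not Cyberbullying, Cyberbullying
print(round(weighted_avg([0.9256, 0.8453], supports), 4))  # precision → 0.8956
print(round(weighted_avg([0.9043, 0.8780], supports), 4))  # recall → 0.8945 (table rounds to 0.8944)
print(round(weighted_avg([0.9148, 0.8613], supports), 4))  # F1 → 0.8948
```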