Offensive Hebrew
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("SinaLab/Offensive-Hebrew")
model = AutoModel.from_pretrained("SinaLab/Offensive-Hebrew")

This corpus contains manually annotated offensive language in Hebrew. The data includes 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or non-offensive). The corpus was annotated manually by Arabic-Hebrew bilingual speakers.
https://arxiv.org/abs/2309.02724
AlephBERT (https://huggingface.co/imvladikon/sentence-transformers-alephbert)
You can download the data from the following GitHub link:
git clone https://github.com/SinaLab/OffensiveHebrew
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="SinaLab/Offensive-Hebrew")
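
A text-classification pipeline returns dictionaries with `label` and `score` keys. Since tweets in this corpus can carry more than one class, it may help to keep every label above a cutoff rather than only the top one. The following is a minimal sketch of that post-processing step. The helper `labels_above`, the 0.5 threshold, and the label strings and scores in `example_scores` are all assumptions for illustration; the model's actual label names are not stated here.

```python
from typing import Any, Dict, List


def labels_above(scores: List[Dict[str, Any]], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score meets the threshold, highest score first.

    Hypothetical helper: it only assumes the pipeline-style output format,
    a list of {"label": str, "score": float} dictionaries.
    """
    kept = [s for s in scores if s["score"] >= threshold]
    kept.sort(key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in kept]


# Illustrative, made-up scores for a single tweet (not real model output).
example_scores = [
    {"label": "hate", "score": 0.62},
    {"label": "abusive", "score": 0.81},
    {"label": "non-offensive", "score": 0.07},
]

selected = labels_above(example_scores)
print(selected)
```

With the real model, the scores for all labels can usually be requested by calling the pipeline with `top_k=None`, for example `pipe(text, top_k=None)`, and the result passed to the helper. The exact nesting of the returned list can differ between transformers versions, so check the output shape before relying on it.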