---
license: cc-by-nc-sa-4.0
language:
- he
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- code
---

## Hebrew Corpus

This corpus contains manually annotated offensive language in Hebrew. It comprises 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or non-offensive). The corpus was annotated manually by Arabic-Hebrew bilingual speakers.
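
Because a tweet may carry more than one of the five classes, a classifier typically needs a multi-hot target rather than a single class index. The following is a minimal sketch of such an encoding, not the authors' preprocessing: the label order, the `LABELS` constant, and the `to_multi_hot` helper are assumptions made for illustration, and the actual column names in the released data may differ.

```python
# Label names come from the corpus description; their order here is an assumption.
LABELS = ["abusive", "hate", "violence", "pornographic", "non-offensive"]


def to_multi_hot(tags):
    """Encode an iterable of label names as a 0/1 list aligned with LABELS."""
    tags = set(tags)
    unknown = tags - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in tags else 0 for name in LABELS]


# Hypothetical annotation for a single tweet that carries two labels.
print(to_multi_hot(["abusive", "hate"]))
```

A multi-hot vector of this kind pairs naturally with a sigmoid output per class, so each label is predicted independently.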

Paper: https://arxiv.org/abs/2309.02724

## Models

AlephBERT (https://huggingface.co/imvladikon/sentence-transformers-alephbert)

## GitHub Repository

git clone https://github.com/SinaLab/OffensiveHebrew

You can download the data from the following GitHub link:

https://github.com/SinaLab/OffensiveHebrew/tree/main/data