# HeBERT: Pre-trained BERT for Polarity Analysis and Emotion Recognition
<img align="right" src="https://github.com/avichaychriqui/HeBERT/blob/main/data/heBERT_logo.png?raw=true" width="250">

HeBERT is a Hebrew pre-trained language model based on [Google's BERT](https://arxiv.org/abs/1810.04805) architecture, using the BERT-Base configuration. <br>

HeBERT was trained on three datasets:
1. A Hebrew version of [OSCAR](https://oscar-corpus.com/): ~9.8 GB of data, including 1 billion words and over 20.8 million sentences.
2. A Hebrew dump of [Wikipedia](https://dumps.wikimedia.org/): ~650 MB of data, including over 63 million words and 3.8 million sentences.
3. Emotion User Generated Content (UGC) data that was collected for the purpose of this study (described in the paper cited below).
## Named-entity recognition (NER)
The model's ability to classify named entities in text (such as persons' names, organizations, and locations) was tested on a labeled dataset from [Ben Mordecai and M. Elhadad (2005)](https://www.cs.bgu.ac.il/~elhadad/nlpproj/naama/) and evaluated with the F1-score.
### How to use
```
from transformers import pipeline

# Load a token-classification (NER) pipeline backed by the pre-trained HeBERT NER model
NER = pipeline(
    "token-classification",
    model="avichr/heBERT_NER",
    tokenizer="avichr/heBERT_NER",
)

# "David studies at the Hebrew University in Jerusalem"
NER('דויד לומד באוניברסיטה העברית שבירושלים')
```
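
By default the pipeline above returns one prediction per sub-word token. As a minimal follow-up sketch (assuming a recent version of `transformers` that supports the `aggregation_strategy` argument), the per-token predictions can be merged into whole entity spans:

```
from transformers import pipeline

# Same NER model, with per-token predictions merged into whole entity spans
NER = pipeline(
    "token-classification",
    model="avichr/heBERT_NER",
    tokenizer="avichr/heBERT_NER",
    aggregation_strategy="simple",
)
NER('דויד לומד באוניברסיטה העברית שבירושלים')
```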
## Other tasks
[**Emotion Recognition Model**](https://huggingface.co/avichr/hebEMO_trust).
An online demo is available at [Hugging Face Spaces](https://huggingface.co/spaces/avichr/HebEMO_demo) or as a [Colab notebook](https://colab.research.google.com/drive/1Jw3gOWjwVMcZslu-ttXoNeD17lms1-ff?usp=sharing).
<br>
[**Sentiment Analysis**](https://huggingface.co/avichr/heBERT_sentiment_analysis).
<br>
[**masked-LM model**](https://huggingface.co/avichr/heBERT) (can be fine-tuned for any downstream task).
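
A minimal usage sketch for the sentiment-analysis and masked-LM models linked above, assuming the standard `transformers` pipeline API (the example sentence is the same one used in the NER example; the exact labels returned depend on each model's configuration):

```
from transformers import pipeline

# Sentiment (polarity) analysis with the pre-trained HeBERT sentiment model
sentiment = pipeline(
    "text-classification",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
)
print(sentiment('דויד לומד באוניברסיטה העברית שבירושלים'))

# Masked language modeling with the base HeBERT model
fill_mask = pipeline(
    "fill-mask",
    model="avichr/heBERT",
    tokenizer="avichr/heBERT",
)
# The same sentence with one word masked out
print(fill_mask('דויד לומד באוניברסיטה [MASK] שבירושלים'))
```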
## Contact us
[Avichay Chriqui](mailto:avichayc@mail.tau.ac.il) <br>
[Inbal Yahav](mailto:inbalyahav@tauex.tau.ac.il) <br>
The Coller Semitic Languages AI Lab <br>
Thank you, תודה, شكرا <br>
## If you use this model, please cite us as:
Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.
```
@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={arXiv preprint arXiv:2102.01909},
  year={2021}
}
```
[GitHub repository](https://github.com/avichaychriqui/HeBERT)