avichr committed
Commit 382503c
Parent: 840f64c

Update README.md

Files changed (1): README.md (+50 −1)
README.md CHANGED
@@ -11,6 +11,55 @@ We evaluated the model on emotion recognition and sentiment analysis, for a down
  Our User-Generated Content (UGC) consists of comments written on articles collected from 3 major news sites between January 2020 and August 2020. The total data size is ~150 MB, including over 7 million words and 350K sentences.
  4000 sentences were annotated by crowd members (3-10 annotators per sentence) for 8 emotions (anger, disgust, expectation, fear, happy, sadness, surprise, and trust) and for overall sentiment/polarity.<br>
  To validate the annotation, we measured agreement between raters on the emotion in each sentence using Krippendorff's alpha [(Krippendorff, 1970)](https://journals.sagepub.com/doi/pdf/10.1177/001316447003000105), and kept the sentences with alpha > 0.7. Note that while we found general agreement between raters for emotions such as happy, trust, and disgust, a few emotions show general disagreement, apparently owing to the difficulty of identifying them in text (e.g., expectation and surprise).
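For illustration only (this is not the authors' code), nominal-data Krippendorff's alpha over a table of sentences, each rated by several annotators, can be sketched in pure Python; the function name and data layout here are our own:

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Nominal-data Krippendorff's alpha.

    units: list of sentences, each a list of category labels
    (one per annotator; leave out missing ratings).
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for unit in units:
        m = len(unit)
        if m < 2:
            continue  # a single rating carries no agreement information
        counts = Counter(unit)
        for c in counts:
            for k in counts:
                pairs = counts[c] * (counts[k] - (1 if c == k else 0))
                o[(c, k)] += pairs / (m - 1)
    n_c = Counter()  # marginal totals per category
    for (c, _), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    d_o = sum(v for (c, k), v in o.items() if c != k)  # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)  # expected
    return 1.0 if d_e == 0 else 1.0 - d_o / d_e

# Perfect agreement across two sentences yields alpha == 1.0
print(krippendorff_alpha_nominal([["happy", "happy"], ["anger", "anger"]]))
```

Sentences scoring above the 0.7 threshold would then be retained, as described above; an off-the-shelf implementation is also available on PyPI (the `krippendorff` package).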
+ ## How to use
+ ### For the masked-LM model (can be fine-tuned to any downstream task)
+ ```
+ from transformers import AutoTokenizer, AutoModel
+ tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT")
+ model = AutoModel.from_pretrained("avichr/heBERT")
+ 
+ from transformers import pipeline
+ fill_mask = pipeline(
+     "fill-mask",
+     model="avichr/heBERT",
+     tokenizer="avichr/heBERT"
+ )
+ fill_mask("讛拽讜专讜谞讛 诇拽讞讛 讗转 [MASK] 讜诇谞讜 诇讗 谞砖讗专 讚讘专.")  # 'The corona took the [MASK] and we have nothing left.'
+ ```
+ 
+ ### For the sentiment classification model (polarity ONLY):
+ ```
+ from transformers import AutoTokenizer, AutoModel, pipeline
+ tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")  # same tokenizer as 'avichr/heBERT'
+ model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")
+ 
+ # how to use:
+ sentiment_analysis = pipeline(
+     "sentiment-analysis",
+     model="avichr/heBERT_sentiment_analysis",
+     tokenizer="avichr/heBERT_sentiment_analysis",
+     return_all_scores=True
+ )
+ 
+ >>> sentiment_analysis('讗谞讬 诪转诇讘讟 诪讛 诇讗讻讜诇 诇讗专讜讞转 爪讛专讬讬诐')  # 'I am debating what to eat for lunch'
+ [[{'label': 'natural', 'score': 0.9978172183036804},
+   {'label': 'positive', 'score': 0.0014792329166084528},
+   {'label': 'negative', 'score': 0.0007035882445052266}]]
+ 
+ >>> sentiment_analysis('拽驻讛 讝讛 讟注讬诐')  # 'coffee is tasty'
+ [[{'label': 'natural', 'score': 0.00047328314394690096},
+   {'label': 'possitive', 'score': 0.9994067549705505},
+   {'label': 'negetive', 'score': 0.00011996887042187154}]]
+ 
+ >>> sentiment_analysis('讗谞讬 诇讗 讗讜讛讘 讗转 讛注讜诇诐')  # 'I do not like the world'
+ [[{'label': 'natural', 'score': 9.214012970915064e-05},
+   {'label': 'possitive', 'score': 8.876807987689972e-05},
+   {'label': 'negetive', 'score': 0.9998190999031067}]]
+ ```
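Because `return_all_scores=True` yields one score per class, a caller typically reduces each result to its arg-max label. A minimal helper (our own sketch, not part of the package; the label spellings are taken verbatim from the sample output above):

```python
def top_label(all_scores):
    """Return (label, score) with the highest score from one pipeline
    result: a list of {'label': ..., 'score': ...} dicts."""
    best = max(all_scores, key=lambda d: d["score"])
    return best["label"], best["score"]

# Scores for '拽驻讛 讝讛 讟注讬诐' ('coffee is tasty'), copied from the sample output:
scores = [{'label': 'natural', 'score': 0.00047328314394690096},
          {'label': 'possitive', 'score': 0.9994067549705505},
          {'label': 'negetive', 'score': 0.00011996887042187154}]
print(top_label(scores))  # ('possitive', 0.9994067549705505)
```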
+ Our model is also available on AWS! For more information, visit [AWS' git](https://github.com/aws-samples/aws-lambda-docker-serverless-inference/tree/main/hebert-sentiment-analysis-inference-docker-lambda)
+ 
 
  ## Stay tuned!
  We are still working on our model and will edit this page as we progress.<br>
 
@@ -22,7 +71,7 @@ our git: https://github.com/avichaychriqui/HeBERT
  Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.
  ```
  @article{chriqui2021hebert,
- title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
+ title={HeBERT \\& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={arXiv preprint arXiv:2102.01909},
  year={2021}