jenslemmens committed · Commit 2981984 · 1 Parent(s): 4611e5d

Update README.md

Files changed (1)
  1. README.md +4 -7
README.md CHANGED
@@ -18,7 +18,7 @@ widget:
18
  # RePublic
19
 
20
  ### Model description
21
- RePublic (<u>re</u>putation analyzer for <u>public</u> agencies) is a Dutch BERT model based on BERTje (De Vries, 2019). The model was designed to predict the sentiment in Dutch-language news article text about public agencies. RePublic was developed in collaboration with [Jan Boon](https://www.uantwerpen.be/en/staff/jan-boon/).
22
 
23
  ### How to use
24
  The model can be loaded and used to make predictions as follows:
@@ -33,13 +33,10 @@ output = pipe(text)
33
  prediction = output[0]['label'] # 0="neutral"; 1="positive"; 2="negative"
34
  ```
35
 
36
- ### Training and data procedure
37
- RePublic was domain-adapted on 91 661 Flemish news articles from three popular Flemish news providers (“Het Laatste Nieuws”, “Het Nieuwsblad” and “De Morgen”) that mention public agencies. This was done by performing BERT’s language modeling tasks (masked language modeling & next sentence prediction).
38
 
39
- The model was then fine-tuned on a sentiment classification task (“positive”, “negative”, “neutral”). The supervised data consisted of 4 404 annotated sentences mentioning Flemish public agencies of which 1 257 sentences were positive, 1 485 sentences were negative and 1 662 sentences were neutral. Fine-tuning was performed for 4 epochs using a batch size of 8 and a learning rate of 5e-5.
40
-
41
- ### Evaluation
42
- The model was evaluated by performing 10-fold cross validation on the annotated data described above. During cross validation, the optimal number of epochs (4), batch size (8), and learning rate (5e-5) were determined. The standard deviation of the macro-averaged F1-scores of the cross validation experiments amounts to 1.5%. The detailed results of predictions in the cross validation experiments can be found below:
43
 
44
  | **Class** | **Precision (%)** | **Recall (%)** | **F1-score (%)** |
45
  |:---:|:---:|:---:|:---:|
 
18
  # RePublic
19
 
20
  ### Model description
21
+ RePublic (<u>re</u>putation analyzer for <u>public</u> service organizations) is a Dutch BERT model based on BERTje (De Vries, 2019). The model was designed to predict the sentiment in Dutch-language news article text about public agencies. RePublic was developed in collaboration with [Jan Boon](https://www.uantwerpen.be/en/staff/jan-boon/).
22
 
23
  ### How to use
24
  The model can be loaded and used to make predictions as follows:
 
33
  prediction = output[0]['label'] # 0="neutral"; 1="positive"; 2="negative"
34
  ```
35
 
36
+ ### Training data and procedure
37
+ RePublic was domain-adapted on 91 661 Flemish news articles published between 2000 and 2020 by three popular Flemish news providers (“Het Laatste Nieuws”, “Het Nieuwsblad” and “De Morgen”). Each article mentions at least one organization from a predefined list of 24 public service organizations, which includes, among others, De Lijn (public transport organization), VDAB (Flemish job placement service), and Agentschap Zorg en Gezondheid (healthcare service). The domain adaptation was achieved by performing BERT’s language modeling tasks (masked language modeling and next sentence prediction).
38
 
39
+ The model was then fine-tuned on a sentiment classification task (“positive”, “negative”, “neutral”). The supervised data consisted of 4404 annotated sentences mentioning Flemish public agencies, of which 1257 were positive, 1485 were negative, and 1662 were neutral. Fine-tuning was performed for 4 epochs using a batch size of 8 and a learning rate of 5e-5.
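
As a rough illustration of the figures in the fine-tuning paragraph, the sketch below derives the class balance and an approximate number of optimizer steps from the stated counts and hyperparameters. It is not the authors' training code: `FineTuneConfig`, `class_shares`, and `optimizer_steps` are hypothetical names introduced here, and the step count assumes that all 4404 sentences are used in a single training run with the final partial batch kept.

```python
import math
from dataclasses import dataclass

# Class counts as stated in the model card.
CLASS_COUNTS = {"positive": 1257, "negative": 1485, "neutral": 1662}


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters stated in the model card."""
    epochs: int = 4
    batch_size: int = 8
    learning_rate: float = 5e-5


def class_shares(counts):
    """Return each class's fraction of the annotated sentences."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def optimizer_steps(n_examples, config):
    """Approximate optimizer steps, assuming every example is used for
    training and the last (partial) batch of each epoch is kept."""
    steps_per_epoch = math.ceil(n_examples / config.batch_size)
    return steps_per_epoch * config.epochs


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)
steps = optimizer_steps(total, FineTuneConfig())

print(f"annotated sentences: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"approximate optimizer steps: {steps}")
```

The shares show that the three classes are fairly balanced, with "neutral" the largest class, which is useful context when reading per-class precision and recall.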
 
 
 
40
 
41
  | **Class** | **Precision (%)** | **Recall (%)** | **F1-score (%)** |
42
  |:---:|:---:|:---:|:---:|