jorisvanzundert committed on
Commit
3fe3997
·
verified ·
1 Parent(s): 6499cea

Update README.md

Files changed (1)
  1. README.md +84 -1
README.md CHANGED
@@ -9,4 +9,87 @@ tags:
9
  - Robbert
10
  - Angry
11
  - finetune
12
- ---
9
  - Robbert
10
  - Angry
11
  - finetune
12
+ ---
13
+
14
+ # Model Card for {{ model_id | default("Model ID", true) }}
15
+
16
+ <!-- Provide a quick summary of what the model is/does. -->
17
+
18
+ This model is a fine-tuned version of [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base) for classifying text as angry or non-angry.
19
+
20
+ ## Model Details
21
+
22
+ ### Model Description
23
+
24
+ This model is a fine-tuned version of [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base), trained on a selection of paragraphs from the Dutch novel "Ik ga leven" by Lale Gül (Lale Gül, *Ik ga leven*. 2021. Amsterdam: Prometheus. ISBN 978-9044646870. An English translation of the novel exists: Lale Gül, *I Will Live*. 2023. London: Little, Brown Book Group. ISBN 978-1408716809). The model is intended to classify sentences and paragraphs of the book as angry or non-angry. A selection of 55 paragraphs was annotated for anger by two annotators working individually (Cohen's kappa of 0.48).
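For readers unfamiliar with the agreement measure mentioned above, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. The function name `cohens_kappa` and the toy label lists are illustrative assumptions; they are not the annotations or the code used for this model.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both annotators must label the same, non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical, NOT the real annotations): 1 = angry, 0 = non-angry.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # approximately 0.6 for this toy data
```

In the toy data the annotators agree on 8 of 10 items (0.8) while chance agreement is 0.5, which gives a kappa of about 0.6. A kappa of 0.48, as reported for this model's annotations, is commonly read as moderate agreement.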
25
+
26
+ <!-- Provide a longer summary of what this model is. -->
27
+
28
+ - **Developed by:** Joris J. van Zundert and Julia Neugarten
29
+ - **Funded by [optional]:** Huygens Institute
30
+ - **Shared by [optional]:** {{ shared_by | default("[More Information Needed]", true)}}
31
+ - **Model type:** text classification
32
+ - **Language(s) (NLP):** Dutch
33
+ - **License:** MIT
34
+ - **Finetuned from model [optional]:** robbert-v2-dutch-base
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+ This model should **only** be used for research on the full text of the Dutch edition of Lale Gül's "Ik ga leven". Any other application is discouraged, as the model has only been fine-tuned on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
40
+
41
+ ## Bias, Risks, and Limitations
42
+
43
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
44
+ The model is biased towards the language Lale Gül uses in her novel "Ik ga leven". This may include a skew towards explicit and aggressive language.
45
+
46
+ ### Recommendations
47
+
48
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
49
+ This model should **only** be used for research on the full text of the Dutch edition of Lale Gül's "Ik ga leven". Any other application is discouraged, as the model has only been fine-tuned on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
50
+
51
+ ## How to Get Started with the Model
52
+
53
+ Use the code below to get started with the model.
54
+
55
+ ```
56
+ from transformers import RobertaTokenizer, RobertaForSequenceClassification
57
+ from transformers import TextClassificationPipeline
58
+
59
+ model = RobertaForSequenceClassification.from_pretrained( "./model/angryBERT-v1" )
60
+ tokenizer = RobertaTokenizer.from_pretrained( "./model/angryBERT-v1" )
61
+
62
+ # Just checking that the model works
63
+ # LABEL_1 means angry
64
+ # LABEL_0 means non-angry
65
+
66
+ input_text = "Ik was kwaad." # en.: "I was angry."
67
+
68
+ pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
69
+ pipe( input_text )
70
+
71
+ # =>
72
+ # [[{'label': 'LABEL_0', 'score': 0.026506226509809494},
73
+ # {'label': 'LABEL_1', 'score': 0.9734938144683838}]]
74
+ ```
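The raw pipeline output above can be turned into a readable label. The following is a minimal sketch, assuming the output shape shown in the example (a list of `{'label', 'score'}` dicts per input). The helper `interpret_scores` and the `threshold` parameter are hypothetical and not part of the model or of `transformers`.

```python
def interpret_scores(scores, threshold=0.5):
    """Map one input's score list to ("angry" | "non-angry", angry_score).

    `scores` is the list of {'label': ..., 'score': ...} dicts returned for a
    single input. LABEL_1 means angry and LABEL_0 means non-angry.
    `threshold` is an illustrative assumption, not a tuned value.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    angry_score = by_label["LABEL_1"]
    label = "angry" if angry_score >= threshold else "non-angry"
    return label, angry_score


# Output copied from the example above for the input "Ik was kwaad."
example_output = [[{"label": "LABEL_0", "score": 0.026506226509809494},
                   {"label": "LABEL_1", "score": 0.9734938144683838}]]

label, angry_score = interpret_scores(example_output[0])
print(label)  # angry
```

Looking up scores by label name rather than by list position keeps the helper correct even if the pipeline returns the labels in a different order.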
75
+
76
+
77
+ ## Training Details
78
+
79
+ ### Training Data
80
+
81
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
82
+ All paragraphs of the Dutch edition of Lale Gül's novel *Ik ga leven*. Paratext (copyright page, title page, etc.) was removed, as was the section of poems at the back of the book.
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+ Trained on 55 paragraphs labeled as either angry (1) or non-angry (0).
88
+
89
+ ## Model Card Authors [optional]
90
+
91
+ Joris J. van Zundert, Julia Neugarten
92
+
93
+ ## Model Card Contact
94
+
95
+ [Joris J. van Zundert](https://huggingface.co/jorisvanzundert)