Commit 448083e by ankekat1000 · 1 Parent(s): 85fbfef

Update README.md

Files changed (1)
  1. README.md +53 -0
README.md CHANGED
@@ -1,3 +1,56 @@
  ---
  license: cc-by-nc-sa-4.0
+ language:
+ - de
  ---
+
+ ## Model description
+ This model is a fine-tuned version of the [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased) that classifies German-language user comments as deliberative or not deliberative.
+
+ ## How to use
+
+ You can use the model with the following code.
+
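+ ```python
+ # AutoModelForSequenceClassification needs PyTorch in addition to transformers
+ #!pip install transformers torch
+
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline
+
+ # Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
+ model_path = "ankekat1000/deliberative-bert-german"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(model_path)
+
+ # Example input (German): "Great idea. I think this project should become part of
+ # the city forum so that we can keep thinking about it!"
+ # The pipeline returns a list with one {label, score} dictionary for the comment.
+ pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
+ print(pipeline('Tolle Idee. Ich denke, dass dieses Projekt Teil des Stadtforums werden sollte, damit wir darüber weiter nachdenken können!'))
+ ```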
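
The snippet above classifies one comment. As a minimal sketch of classifying several comments at once, the helper below forwards `truncation` and `max_length` to the tokenizer so that over-long comments are cut off instead of raising an error. The names `load_classifier`, `classify_comments`, and `MAX_TOKENS` are hypothetical and not part of this README, and applying the 512-token limit from the training details at inference time is an assumption.

```python
from typing import Callable, Dict, List

MODEL_PATH = "ankekat1000/deliberative-bert-german"  # model id from the snippet above
MAX_TOKENS = 512  # max. token length from the training details (assumed to apply at inference)


def load_classifier():
    """Build the same pipeline as in the snippet above (downloads the model on first use)."""
    # Imported here so the helper below can be used and tested without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        TextClassificationPipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
    return TextClassificationPipeline(model=model, tokenizer=tokenizer)


def classify_comments(classifier: Callable, comments: List[str]) -> List[Dict]:
    """Classify a list of comments and pair each comment with its prediction."""
    # Extra keyword arguments are passed through to the tokenizer by the pipeline.
    predictions = classifier(comments, truncation=True, max_length=MAX_TOKENS)
    return [
        {"comment": comment, "label": pred["label"], "score": pred["score"]}
        for comment, pred in zip(comments, predictions)
    ]
```

With the real model, `classify_comments(load_classifier(), ["...", "..."])` should return one dictionary per comment. The label names depend on the model's configuration, which this README does not state.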
+
+ ## Training
+
+ The pre-trained [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased) was fine-tuned on a crowd-annotated data set of about 14,000 user comments labeled for deliberation in a binary classification task.
+
+ We defined as deliberative those comments that are enriching and valuable to a deliberative discussion, in whole or in part. These include comments that add arguments, suggestions, or new perspectives to the discussion, and comments that users otherwise find stimulating or appreciative.
+
+ **Language model:** bert-base-german-cased (~ 12GB)
+ **Language:** German
+ **Labels:** Deliberative (binary classification)
+ **Training data:** User comments posted to websites and Facebook pages of German news media, and user comments posted to online participation platforms (~ 14,000)
+ **Labeling procedure:** Crowd annotation
+ **Batch size:** 32
+ **Epochs:** 4
+ **Max. token length:** 512
+ **Infrastructure:** 1x Quadro RTX 8000
+ **Published:** Oct 24th, 2023
+
+ ## Evaluation results
+
+ **Accuracy:** 86%
+ **Macro avg. F1:** 86%
+
+ | Label | Precision | Recall | F1 | No. of comments in test set |
+ | ----------- | ----------- | ----------- | ----------- | ----------- |
+ | not deliberative | 0.87 | 0.84 | 0.86 | 701 |
+ | deliberative | 0.84 | 0.87 | 0.85 | 667 |
+
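
As a rough cross-check of how the summary figures relate to the table, the sketch below recomputes per-class F1, macro F1, and accuracy from the rounded precision and recall values. It assumes that the last table column is the number of test comments whose true label is the given class. Because the inputs are rounded to two decimals, the recomputed values can differ from the reported ones in the last digit.

```python
# Precision, recall, and test-set counts copied from the evaluation table.
table = {
    "not deliberative": {"precision": 0.87, "recall": 0.84, "n": 701},
    "deliberative": {"precision": 0.84, "recall": 0.87, "n": 667},
}


def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {label: f1(row["precision"], row["recall"]) for label, row in table.items()}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Accuracy is the share of correctly classified comments, which equals
# the recall of each class weighted by its number of test comments.
n_total = sum(row["n"] for row in table.values())
accuracy = sum(row["recall"] * row["n"] for row in table.values()) / n_total

# Reference point: a classifier that always predicts the larger class.
majority_baseline = max(row["n"] for row in table.values()) / n_total

print(f"macro F1: {macro_f1:.3f}, accuracy: {accuracy:.3f}, baseline: {majority_baseline:.3f}")
```

The majority-class baseline is included only as a point of comparison: the test set is nearly balanced, so accuracy and macro F1 are expected to lie close together.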