---
license: cc-by-nc-sa-4.0
language:
- de
---

## Model description
This model is a fine-tuned version of the [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased). It classifies German-language user comments as deliberative or not deliberative.

## How to use

You can use the model with the following code.

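```python
# pip install transformers torch

from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_path = "ankekat1000/deliberative-bert-german"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Wrap model and tokenizer in a text-classification pipeline
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# Example comment (English: "Great idea. I think this project should become part
# of the city forum so that we can keep thinking about it!")
print(classifier('Tolle Idee. Ich denke, dass dieses Projekt Teil des Stadtforums werden sollte, damit wir darüber weiter nachdenken können!'))
```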


## Training

The pre-trained model [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased) was fine-tuned on a crowd-annotated data set of about 14,000 user comments. Each comment was labeled as deliberative or not deliberative, making this a binary classification task.

We defined comments as deliberative if they enrich a deliberative discussion and are valuable to it, in whole or in part. Examples include comments that add arguments, suggestions, or new perspectives to the discussion, and comments that are otherwise stimulating or appreciative.

**Language model:** bert-base-german-cased (pre-trained on ~12 GB of German text)  
**Language:** German  
**Labels:** deliberative, not deliberative (binary classification)  
**Training data:** User comments posted to websites and Facebook pages of German news media, and to online participation platforms (~14,000)  
**Labeling procedure:** Crowd annotation  
**Batch size:** 32  
**Epochs:** 4  
**Max. token length:** 512  
**Infrastructure:** 1x Quadro RTX 8000  
**Published:** Oct 24th, 2023  

## Evaluation results

**Accuracy:** 86%  
**Macro avg. F1:** 86%  



|  Label      | Precision | Recall | F1 | No. of comments in test set |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| not deliberative | 0.87       | 0.84       | 0.86       | 701       |
| deliberative | 0.84       | 0.87        | 0.85        | 667       |
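
The macro-averaged F1 is the unweighted mean of the per-class F1 scores. The support-weighted average instead weights each class by its number of test comments. The sketch below recomputes both from the rounded values in the evaluation table. It is only an illustration: the original evaluation presumably used unrounded scores, so small rounding differences are expected.

```python
# Per-class F1 scores and test-set support, copied from the evaluation table.
f1_scores = {"not deliberative": 0.86, "deliberative": 0.85}
support = {"not deliberative": 701, "deliberative": 667}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: per-class F1 weighted by the number of test comments.
total = sum(support.values())
weighted_f1 = sum(f1_scores[c] * support[c] for c in f1_scores) / total

print(f"test comments: {total}")
print(f"macro F1:    {macro_f1:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
```

Both averages come out at about 0.855, which agrees with the reported macro average of 86% after rounding. Because the two classes are nearly balanced (701 vs. 667 comments), the weighted and unweighted averages barely differ.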