ankekat1000/deliberative-bert-german

Text Classification


ankekat1000 committed on Nov 9, 2023

Commit 448083e · 1 Parent(s): 85fbfef

Update README.md

Files changed (1)

README.md +53 -0


@@ -1,3 +1,56 @@
 ---
 license: cc-by-nc-sa-4.0
+language:
+- de
 ---
+## Model description
+This model is a fine-tuned version of the [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased) that classifies German-language user comments as deliberative or not deliberative.
+## How to use
+You can use the model with the following code.
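+```python
+# Uncomment in a notebook to install the dependency:
+# !pip install transformers
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline
+
+model_path = "ankekat1000/deliberative-bert-german"
+
+# Load the tokenizer and the fine-tuned classification model from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model = AutoModelForSequenceClassification.from_pretrained(model_path)
+
+# Wrap both in a text-classification pipeline and classify a sample comment
+# (English: "Great idea. I think this project should become part of the city forum so that we can keep thinking about it!")
+classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
+print(classifier('Tolle Idee. Ich denke, dass dieses Projekt Teil des Stadtforums werden sollte, damit wir darüber weiter nachdenken können!'))
+```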
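For more than one comment, a text-classification pipeline can be called on a list of strings. The sketch below is illustrative and not part of the model card: `summarize_predictions` and `stub_classifier` are hypothetical names, the `LABEL_0`/`LABEL_1` strings are placeholders (the actual label names come from the model's configuration), and the stub stands in for the real pipeline so that the example runs without downloading the model. Passing `truncation=True` and `max_length=512` assumes the pipeline forwards these tokenizer arguments at call time, as recent `transformers` releases do.

```python
from typing import Callable, Dict, List


def summarize_predictions(classifier: Callable[..., List[Dict]], comments: List[str]) -> List[Dict]:
    """Classify a batch of comments and pair each text with its label and score."""
    # Truncate long comments to the 512-token limit stated in the model card.
    outputs = classifier(comments, truncation=True, max_length=512)
    return [
        {"text": text, "label": out["label"], "score": round(out["score"], 3)}
        for text, out in zip(comments, outputs)
    ]


def stub_classifier(texts: List[str], **tokenizer_kwargs) -> List[Dict]:
    """Hypothetical stand-in for the real pipeline; labels and scores are placeholders."""
    return [
        {"label": "LABEL_1" if len(text.split()) > 3 else "LABEL_0", "score": 0.5}
        for text in texts
    ]


comments = [
    "Tolle Idee.",
    "Ich denke, dass dieses Projekt Teil des Stadtforums werden sollte.",
]
for row in summarize_predictions(stub_classifier, comments):
    print(row)
```

With the real model, pass the pipeline object created in the snippet above in place of `stub_classifier`.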
+## Training
+The pre-trained [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased) was fine-tuned on a crowd-annotated data set of 14,000 user comments that were labeled for deliberation in a binary classification task.
+We defined as deliberative those comments that are enriching and valuable to a deliberative discussion, in whole or in part. Examples are comments that add arguments, suggestions, or new perspectives to the discussion, or that other users otherwise find stimulating or appreciative.
+**Language model:** bert-base-german-cased (~ 12GB)
+**Language:** German
+**Labels:** Deliberative (binary classification)
+**Training data:** User comments posted to websites and Facebook pages of German news media, and user comments posted to online participation platforms (~ 14,000 comments)
+**Labeling procedure:** Crowd annotation
+**Batch size:** 32
+**Epochs:** 4
+**Max. token length:** 512
+**Infrastructure:** 1x Quadro RTX 8000
+**Published:** Oct 24th, 2023
+## Evaluation results
+**Accuracy:** 86%
+**Macro avg. F1:** 86%
+| Label            | Precision | Recall | F1   | No. of comments in test set |
+| ---------------- | --------- | ------ | ---- | --------------------------- |
+| not deliberative | 0.87      | 0.84   | 0.86 | 701                         |
+| deliberative     | 0.84      | 0.87   | 0.85 | 667                         |
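
The summary figures can be roughly cross-checked against the per-class rows of the table. The following is a minimal sketch, not the authors' evaluation code: it recomputes F1, macro-averaged F1, and accuracy from the precision, recall, and support values printed above. Because those inputs are already rounded to two decimals, the recomputed values (about 0.85) can differ from the reported ones by roughly 0.01.

```python
# Per-class values copied from the evaluation table (rounded to two decimals).
report = {
    "not deliberative": {"precision": 0.87, "recall": 0.84, "support": 701},
    "deliberative": {"precision": 0.84, "recall": 0.87, "support": 667},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_per_class = {
    label: f1_score(row["precision"], row["recall"]) for label, row in report.items()
}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

# recall * support approximates the number of correctly classified comments per class,
# so summing over both classes and dividing by the test-set size approximates accuracy.
total = sum(row["support"] for row in report.values())
accuracy = sum(row["recall"] * row["support"] for row in report.values()) / total

print(f"test set size: {total}")
print(f"macro F1: {macro_f1:.3f}")
print(f"accuracy: {accuracy:.3f}")
```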