---
language:
- de
metrics:
- f1
- precision
- recall
pipeline_tag: text-classification
tags:
- fearspeech
- classification
- social science
- communication
- hatespeech
---
This model card provides details about the DistilBERT-based classifier designed to detect fear speech (FS) in German language Telegram posts. The classifier was developed to study the prevalence and dynamics of FS in the communication of radical and extremist actors on Telegram.

**Model Details**  
**Model Description**

The distilbert_fearspeech_classifier is a fine-tuned DistilBERT model aimed at identifying and classifying fear speech in social media posts, particularly those from radical and extremist groups. It has been trained on a dataset of annotated Telegram posts from far-right, COVID-19 protest, and conspiracy-focused actors.

    Developed by: Simon Greipl, Julian Hohner, Heidi Schulze, Patrick Schwabl, Diana Rieger
    Model type: DistilBERT for text classification
    Language(s) (NLP): German
    License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License

**Model Sources**

Paper: ["You are doomed!" Crisis-specific and Dynamic Use of Fear Speech in Protest and Extremist Radical Social Movements](https://doi.org/10.51685/jqd.2024.icwsm.8)


**Direct Uses**
  
  The model is used directly to classify Telegram posts into fear speech (FS) and non-fear speech (no FS) categories. This is particularly useful for researchers studying online radicalization and the dynamics of fear speech in social media.

**Downstream Use**
  
  The model can be fine-tuned for specific tasks related to hate speech detection, communication studies, and social media analysis.

**Out-of-Scope Use**
  
  This model should not be used for labeling and penalizing individuals or groups without context or understanding of the nuances in their communication. Misuse could lead to unjust outcomes.

**Bias, Risks, and Limitations**
  
  The model was trained on data from specific Telegram channels and groups known for their extremist content, so inherent biases in the training data may affect the model's predictions.
  Users should be aware of the risks, biases, and limitations of the model. Further research and contextual understanding are recommended before using the model for critical decision-making.

**How to Get Started with the Model**

Use the following code to get started with the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("PatrickSchwabl/distilbert_fearspeech_classifier")
model = AutoModelForSequenceClassification.from_pretrained("PatrickSchwabl/distilbert_fearspeech_classifier")
model.eval()

# Tokenize input text
inputs = tokenizer("Your text here", return_tensors="pt")

# Get model predictions (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Print class probabilities and the predicted class index;
# check model.config.id2label for the mapping to FS / no FS
print(predictions)
print(predictions.argmax(dim=-1).item())
```

**Training Details**  
**Training Data**

The model was trained on a dataset of manually annotated Telegram posts from radical and extremist groups. The dataset includes posts related to six crisis-specific topics: COVID-19, Conspiracy Narratives, Russian Invasion of Ukraine (RioU), Energy Crisis, Inflation, and Migration.

**Training Procedure**  
**Preprocessing**

    Data cleaning involved removing emojis, numbers, and hyperlinks.
    Posts shorter than ten characters and longer than 1000 characters were excluded.
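
The cleaning steps above can be sketched as a small helper function. This is an illustrative sketch only, not the authors' exact preprocessing code; the regex patterns and the emoji character ranges are assumptions:

```python
import re
from typing import Optional

def clean_post(text: str) -> Optional[str]:
    """Clean a Telegram post as described above; return None if the
    cleaned post falls outside the 10-1000 character range."""
    # Remove hyperlinks
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Remove numbers
    text = re.sub(r"\d+", "", text)
    # Remove emojis and common pictographic symbols (rough approximation)
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)
    text = text.strip()
    # Exclude posts shorter than ten or longer than 1000 characters
    if len(text) < 10 or len(text) > 1000:
        return None
    return text
```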

The evaluation metrics are precision, recall, and F1-score, which together capture the model's performance in classifying fear speech.
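
For reference, these metrics can be computed from binary predictions as follows. This is a self-contained sketch equivalent to standard binary-average precision/recall/F1; treating label `1` as the fear-speech class is an assumption that should be checked against `model.config.id2label`:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (fear speech) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```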

**Results**

    Validation Set Precision: 0.82
    Validation Set Recall: 0.82
    Validation Set F1-Score: 0.82
    Test Set Precision: 0.79
    Test Set Recall: 0.79
    Test Set F1-Score: 0.79

**Summary**

The model demonstrated robust performance, with balanced precision, recall, and F1 scores of 0.82 on the validation set and 0.79 on the test set.


**BibTeX**  
```
@article{Greipl2024,
  title={"You are doomed!" Crisis-specific and Dynamic Use of Fear Speech in Protest and Extremist Radical Social Movements},
  author={Greipl, Simon and Hohner, Julian and Schulze, Heidi and Schwabl, Patrick and Rieger, Diana},
  journal={Journal of Quantitative Description: Digital Media},
  volume={4},
  year={2024},
  doi={10.51685/jqd.2024.icwsm.8}
}
```


**Model Card Contact**

Simon Greipl - simon.greipl@ifkw.lmu.de