Sami92 committed on
Commit 8e67a70 · verified · 1 Parent(s): dbcf9e8

Update README.md

Files changed (1)
  1. README.md +3 -7
README.md CHANGED
@@ -10,7 +10,7 @@ This model is a fine-tuned version of [XLM-R Large](https://huggingface.co/Faceb
 
 
 ## Model Details
-
+Disinformation often follows a set of common narratives, such as "All immigrants are criminals". Fact-checkers from several German organizations provided a list of narratives that they commonly encounter. The model is trained to recognize whether a given text supports one of these narratives.
 
 
 ## Bias, Risks, and Limitations
@@ -29,10 +29,7 @@ Use the code below to get started with the model.
 
 ### Training Data
 
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
-[More Information Needed]
-
+The training dataset combines two datasets. The first is a [synthetically generated dataset](https://huggingface.co/datasets/Sami92/german-disinformation-narratives-synthetic) for the given disinformation narratives. The second is a weakly annotated dataset of Telegram posts, sampled from about 200 channels that have been subject to a fact-check by Correctiv, dpa, Faktenfuchs, or AFP. GPT-4o was prompted to identify which narrative, if any, each post supports. The exact prompt can be found [here](https://huggingface.co/Sami92/XLM-R-Large-Disinfo-Narrative-Classifier/blob/main/GPT-4o-Prompt.txt).
 
 #### Training Hyperparameters
 
@@ -47,8 +44,7 @@ Use the code below to get started with the model.
 
 ### Testing Data
 
-
-[More Information Needed]
+The test data consists of texts from social media posts that were linked in articles by German fact-checking organizations. In other words, the texts are claims that have been fact-checked. Posts were selected if they supported one of the disinformation narratives. Because the aim was to select naturally occurring claims, the dataset is imbalanced and not all classes are represented.
 
 
 ### Results
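
Because the test set described in this change is imbalanced and omits some classes, aggregate scores depend on which labels are averaged over. The following is a minimal sketch, not the author's evaluation code: the label names and toy predictions are hypothetical and do not come from the model card. It computes per-class F1 and a macro average restricted to the classes that actually occur in the gold labels.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label that occurs in the gold labels."""
    scores = {}
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1_present(gold, pred):
    """Macro-averaged F1 over the classes present in the gold labels only."""
    scores = per_class_f1(gold, pred)
    if not scores:
        raise ValueError("gold labels must not be empty")
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: label names are placeholders, not the real narratives.
gold = ["migration", "migration", "migration", "climate", "none"]
pred = ["migration", "migration", "climate", "climate", "none"]

print(Counter(gold))               # class distribution of the gold labels
print(per_class_f1(gold, pred))    # F1 per class that occurs in gold
print(macro_f1_present(gold, pred))
```

Averaging only over classes that occur in the gold labels avoids assigning an undefined or zero score to narratives that the test set never contains. The trade-off is that the resulting number says nothing about those missing narratives.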