Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ This model is a fine-tuned version of [XLM-R Large](https://huggingface.co/Faceb
## Model Details
-
## Bias, Risks, and Limitations
@@ -29,10 +29,7 @@ Use the code below to get started with the model.
### Training Data
-
-
-[More Information Needed]
-
#### Training Hyperparameters
@@ -47,8 +44,7 @@ Use the code below to get started with the model.
### Testing Data
-
-[More Information Needed]
### Results

## Model Details
+Disinformation often follows a set of common narratives, such as "All immigrants are criminals". Fact-checkers from several German organizations provided a list of narratives that they commonly encounter. The model recognizes whether a given text supports one of these narratives.
## Bias, Risks, and Limitations

### Training Data
+The training dataset combines two datasets. The first is a [synthetically generated dataset](https://huggingface.co/datasets/Sami92/german-disinformation-narratives-synthetic) covering the given disinformation narratives. The second is a weakly annotated dataset of Telegram posts, sampled from about 200 channels that have been the subject of a fact-check by Correctiv, dpa, Faktenfuchs, or AFP. GPT-4o was prompted to identify which narrative, if any, each post supports. The exact prompt can be found [here](https://huggingface.co/Sami92/XLM-R-Large-Disinfo-Narrative-Classifier/blob/main/GPT-4o-Prompt.txt).
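
As a rough illustration of how a synthetic dataset and a weakly annotated dataset could be merged into one training set, here is a minimal sketch. It is not the author's pipeline: the function `combine_datasets`, the row format, the `NO_NARRATIVE` label, and the toy rows are all assumptions made for this example, and the model card does not state how posts without a supported narrative were treated.

```python
from collections import Counter

# Hypothetical label for posts where no narrative was found; the model card
# does not say how such posts were handled during training.
NO_NARRATIVE = "no_narrative"


def combine_datasets(synthetic, weak, keep_unlabeled=True):
    """Merge synthetic and weakly annotated rows into one training list.

    Each row is a dict with "text" and "label" keys. A label of None means
    the annotator found no supported narrative for that text.
    """
    combined = []
    seen_texts = set()
    for source, rows in (("synthetic", synthetic), ("weak", weak)):
        for row in rows:
            text = row["text"].strip()
            label = row["label"]
            if label is None:
                if not keep_unlabeled:
                    continue  # drop posts without a supported narrative
                label = NO_NARRATIVE
            if not text or text in seen_texts:
                continue  # skip empty and duplicate texts
            seen_texts.add(text)
            combined.append({"text": text, "label": label, "source": source})
    return combined


# Toy placeholder rows, not real data.
synthetic_rows = [
    {"text": "synthetic example A", "label": "narrative_1"},
    {"text": "synthetic example B", "label": "narrative_2"},
]
weak_rows = [
    {"text": "telegram post C", "label": "narrative_1"},
    {"text": "telegram post D", "label": None},
    {"text": "synthetic example A", "label": "narrative_1"},  # duplicate text
]

train_rows = combine_datasets(synthetic_rows, weak_rows)
label_counts = Counter(row["label"] for row in train_rows)
print(label_counts)
```

Keeping a `source` field on every row makes it possible to check later whether errors concentrate in the synthetic or the weakly annotated portion.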
#### Training Hyperparameters

### Testing Data
+The test data consists of texts from social media posts that were linked in articles by German fact-checking organizations. In other words, the texts are claims that have been fact-checked. Posts were selected if they supported one of the disinformation narratives. Because the aim was to select naturally occurring claims, the dataset is imbalanced and not all classes are represented.
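
Because the test set is imbalanced and some classes are missing, an averaged score can be misleading if absent classes are counted. The sketch below shows one possible way to handle this: compute per-class precision, recall, and F1 only for labels that occur in the gold data, then average them. The function names and toy labels are assumptions for illustration, and nothing here is confirmed to be how the reported results were computed.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for labels present in y_true."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1_present(y_true, y_pred):
    """Macro-averaged F1 over the classes that actually occur in y_true."""
    scores = per_class_scores(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels: class "c" is predicted once but never occurs in the gold data,
# so it is not averaged in as a class of its own.
gold = ["a", "a", "a", "b"]
pred = ["a", "a", "c", "b"]

class_scores = per_class_scores(gold, pred)
macro = macro_f1_present(gold, pred)
print(class_scores)
print(round(macro, 3))
```

scikit-learn's `f1_score` accepts `labels` and `average` arguments that can express a similar restriction. The hand-written version is used here only to keep the example free of dependencies.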
### Results