Sami92/XLM-R-Large-Disinfo-Narrative-Classifier

Text Classification


Sami92 committed on Aug 15, 2024

Commit 8e67a70 (verified) · Parent: dbcf9e8

Update README.md

Files changed (1)

README.md +3 -7

README.md CHANGED

@@ -10,7 +10,7 @@ This model is a fine-tuned version of [XLM-R Large](https://huggingface.co/Faceb
 ## Model Details
+Disinformation often follows a set of common narratives, such as "All immigrants are criminals". Fact-checkers from different German organizations provided a list of narratives that they commonly encounter. The trained model recognizes whether a given text supports one of these narratives.
 ## Bias, Risks, and Limitations
@@ -29,10 +29,7 @@ Use the code below to get started with the model.
 ### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
+The training dataset combines two datasets. The first is a [synthetically generated dataset](https://huggingface.co/datasets/Sami92/german-disinformation-narratives-synthetic) for the given disinformation narratives. The second is a weakly annotated dataset of Telegram posts, sampled from about 200 channels that have been subject to a fact-check by Correctiv, dpa, Faktenfuchs, or AFP. For the weak annotation, GPT-4o was prompted to identify which of the narratives, if any, a post supports. The exact prompt can be found [here](https://huggingface.co/Sami92/XLM-R-Large-Disinfo-Narrative-Classifier/blob/main/GPT-4o-Prompt.txt).
 #### Training Hyperparameters
@@ -47,8 +44,7 @@ Use the code below to get started with the model.
 ### Testing Data
-[More Information Needed]
+The test data consists of texts from social media posts that were linked in articles from German fact-checking organizations; in other words, the texts are claims that have been fact-checked. Posts were selected if they supported one of the disinformation narratives. Because the aim was to collect naturally occurring claims, the dataset is imbalanced and does not include all classes.
 ### Results