Sami92 committed · Commit 49e9598 · verified · Parent(s): 22b1557

Update README.md

Files changed (1): README.md (+45 −142)
# Model Card for XLM-R-Large-ClaimDetection

Fine-tuned [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large) for the task of classifying sentences as factual or non-factual. The taxonomy for factual claims follows Wilms et al. (2021). The model was first trained on a Telegram dataset annotated by GPT-4o with this [prompt](https://huggingface.co/Sami92/XLM-R-Large-ClaimDetection/blob/main/FactualityPrompt_GPT.txt); in a second step it was trained on the data from Risch et al. (2021). It was tested on a sample of Telegram posts annotated by four trained coders.

### Model Description

This model is a fine-tuned version of [XLM-R Large](https://huggingface.co/FacebookAI/xlm-roberta-large), trained to detect factual claims, a core task in automated fact-checking. Training proceeded in a weakly supervised fashion: first on a Telegram dataset weakly annotated with GPT-4o, then on the manually annotated dataset from Risch et al. (2021). The training data is German; the underlying model is multilingual, but its performance in other languages has not been tested. For evaluation, a sample of Telegram posts was annotated by four trained coders and the majority label was taken; the model achieves an accuracy of 0.90 on this set. On the test split of Risch et al. (2021), which is drawn from Facebook comments, it achieves an accuracy of 0.79.

## Bias, Risks, and Limitations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

## How to Get Started with the Model

```python
from transformers import pipeline

# Three example Telegram posts (German)
texts = [
    'WTH Riesige giftige Flugspinnen mit 4-Zoll-Beinen auf dem Weg in die Gegend von New York, während sie sich über die Ostküste ausbreiten. Zuerst kamen die gefleckten Laternenfliegen, dann die Zikaden und jetzt die Spinnen. Der Nordosten der USA bereitet sich auf eine Invasion riesiger giftiger Spinnen vor, deren Beine nur einen halben Zoll lang sind und mit dem Fallschirm durch die Luft fliegen können. cbsnews.com/news/joro-spid…',
    'Es ist Ihnen halt nicht genug was zerstört wurde, Ermittlungen eingestellt und dann kommt die nächste Katastrophe... Wer hier an Zufälle glaubt hat nichts verstanden... <URL>',
    'IMPFUNG MACHT FREI!!! Schickt das Video an alle eure Kontakte! Abonniert bitte unseren Kanal: <URL> Folgt unserem Chat: <URL> Verbreitet unsere Inhalte und Wissen für den Frieden',
]

checkpoint = "Sami92/XLM-R-Large-ClaimDetection"
# Pad and truncate inputs to the model's 512-token limit
tokenizer_kwargs = {"padding": True, "truncation": True, "max_length": 512}
claimdetection = pipeline(
    "text-classification",
    model=checkpoint,
    tokenizer=checkpoint,
    device="cuda",  # omit or use device=-1 to run on CPU
    **tokenizer_kwargs,
)
claimdetection(texts)
# [{'label': 'factual', 'score': 0.9999344348907471},
#  {'label': 'non-factual', 'score': 0.9990422129631042},
#  {'label': 'non-factual', 'score': 0.9990965127944946}]
```

## Training Details

### Training Data

The training proceeded in two steps: first on a weakly annotated Telegram dataset, then on the dataset published by Risch et al. (2021); for details on the latter, see the publication.

The weak annotation was performed with GPT-4o; the labeling prompt can be found [here](https://huggingface.co/Sami92/XLM-R-Large-ClaimDetection/blob/main/FactualityPrompt_GPT.txt). The data was taken from Telegram, specifically from a set of about 200 channels that have been subject to a fact-check by Correctiv, dpa, Faktenfuchs, or AFP.
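As an illustration, here is a minimal sketch of how such GPT-4o weak labeling can be scripted with the OpenAI Python client. The local prompt path, the `label_post` helper, and the request parameters are assumptions for the sketch, not the authors' actual annotation pipeline.

```python
# Hypothetical weak-labeling sketch; not the authors' actual annotation script.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# The factuality prompt linked above, saved locally (path is an assumption)
with open("FactualityPrompt_GPT.txt", encoding="utf-8") as f:
    system_prompt = f.read()

def label_post(post: str) -> str:
    """Ask GPT-4o whether a Telegram post contains a factual claim."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic labels for annotation
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": post},
        ],
    )
    return response.choices[0].message.content  # e.g. "factual" / "non-factual"

label = label_post("Der Nordosten der USA bereitet sich auf eine Invasion riesiger Spinnen vor.")
```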
 
 
The test data consists of 149 Telegram posts; performance on it is as follows.

|                  | precision | recall | f1-score | support |
|------------------|-----------|--------|----------|---------|
| **factual**      | 0.88      | 0.92   | 0.90     | 71      |
| **non-factual**  | 0.92      | 0.88   | 0.90     | 78      |
| **accuracy**     |           |        | 0.90     | 149     |
| **macro avg**    | 0.90      | 0.90   | 0.90     | 149     |
| **weighted avg** | 0.90      | 0.90   | 0.90     | 149     |
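The table follows the layout of scikit-learn's `classification_report`. Below is a sketch of how such an evaluation can be reproduced with the released model; the two-item test set is a placeholder, since the 149 annotated posts are not published here.

```python
from sklearn.metrics import classification_report
from transformers import pipeline

claimdetection = pipeline(
    "text-classification",
    model="Sami92/XLM-R-Large-ClaimDetection",
    truncation=True,
    max_length=512,
)

# Placeholder gold data; replace with the real annotated test set.
test_texts = ["Die Inflation lag 2022 bei 7,9 Prozent.", "Was für ein Unsinn!"]
gold_labels = ["factual", "non-factual"]

predicted = [result["label"] for result in claimdetection(test_texts)]
print(classification_report(gold_labels, predicted, digits=2))
```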

#### Training Hyperparameters

Weakly supervised training on Telegram data:

- Epochs: 10
- Batch size: 16
- learning_rate: 2e-5
- weight_decay: 0.01
- fp16: True

Supervised training on the Risch et al. (2021) data:

- Epochs: 10
- Batch size: 16
- learning_rate: 2e-5
- weight_decay: 0.01
- fp16: True
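Both stages use the same settings. A minimal sketch of one fine-tuning stage with these hyperparameters follows; the toy dataset, label mapping, and output directory are placeholders rather than the authors' training script.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

base = "FacebookAI/xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Toy stand-in for the Telegram / Risch et al. (2021) data (label mapping assumed: 1 = factual).
train_dataset = Dataset.from_dict(
    {"text": ["Die Inflation lag 2022 bei 7,9 Prozent.", "Was für ein Unsinn!"], "label": [1, 0]}
).map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512), batched=True)

# Hyperparameters as reported above; everything else is left at defaults.
args = TrainingArguments(
    output_dir="xlmr-claim-detection",  # placeholder
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    fp16=True,  # requires a GPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```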
 
 
 
 
 
 
 
 
 

## Citation

**BibTeX:**

@misc{wilms_annotation_2021,
  title = {Annotation {Guidelines} for {GermEval} 2021 {Shared} {Task} on the {Identification} of {Toxic}, {Engaging}, and {Fact}-{Claiming} {Comments}. {Excerpt} of an unpublished codebook of the {DEDIS} research group at {Heinrich}-{Heine}-{University} {Düsseldorf} (full version available on request)},
  author = {Wilms, L. and Heinbach, D. and Ziegele, M.},
  year = {2021},
}

@inproceedings{risch_germeval_2021,
  author = {Risch, Julian and Stoll, Anke and Wilms, Lena and Wiegand, Michael},
  year = {2021},
}