```python
from transformers import pipeline

checkpoint = "Sami92/XLM-R-Large-Polarization-Classifier"
tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
polarization_classifier = pipeline("text-classification", model=checkpoint, tokenizer=checkpoint, device="cuda", **tokenizer_kwargs)
polarization_classifier(texts)  # texts: a list of strings to classify
```

## Training Details

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The training data for the weakly supervised training was taken from Telegram, more specifically from a set of about 200 channels that have been subject to a fact check by Correctiv, dpa, Faktenfuchs, or AFP. A sample of 5,000 posts was drawn from these channels.

In a second step, the model was fine-tuned on the train split from Ashraf et al. 2024.

#### Training Hyperparameters

Supervised training on Ashraf et al. 2024.

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on the test split from Ashraf et al. 2024.
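For classification tasks like this one, a common summary metric is macro-averaged F1 (the unweighted mean of per-class F1 scores). As an illustration only — the label names and the `macro_f1` helper below are hypothetical and not part of this model card — it could be computed from gold labels and pipeline predictions like this:

```python
from collections import defaultdict

def macro_f1(gold, pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but gold was g
            fn[g] += 1  # gold class g was missed
    f1s = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical labels for illustration; the model's actual label set
# is defined by its config, not by this sketch.
gold = ["polarizing", "neutral", "polarizing", "neutral"]
pred = ["polarizing", "neutral", "neutral", "neutral"]
print(macro_f1(gold, pred))
```

In practice one would extract the predicted label strings from the pipeline output (each prediction is a dict with a `label` key) before passing them to such a function, or use an existing implementation such as scikit-learn's `f1_score` with `average="macro"`.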

### Results