```python
from transformers import pipeline

checkpoint = "Sami92/XLM-R-Large-Polarization-Classifier"
tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
polarization_classifier = pipeline("text-classification", model=checkpoint, tokenizer=checkpoint, device="cuda", **tokenizer_kwargs)
polarization_classifier(texts)  # texts: a list of strings to classify
```

## Training Details

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

The training data for the weakly supervised training was taken from Telegram, more specifically from a set of about 200 channels that have been subject to a fact check by Correctiv, dpa, Faktenfuchs, or AFP. A sample of 5,000 posts was drawn from these channels.

In a second step, the model was fine-tuned on the train split from Ashraf et al. 2024.

#### Training Hyperparameters

Supervised training on Ashraf et al. 2024.

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

Evaluation was performed on the test split from Ashraf et al. 2024.
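For classification tasks like this one, a common summary metric is macro-averaged F1 (the unweighted mean of per-class F1 scores). As an illustration only — the label names and the `macro_f1` helper below are hypothetical and not part of this model card — it could be computed from gold labels and pipeline predictions like this:

```python
from collections import defaultdict

def macro_f1(gold, pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p, but gold was g
            fn[g] += 1  # gold class g was missed
    f1s = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical labels for illustration; the model's actual label set
# is defined by its config, not by this sketch.
gold = ["polarizing", "neutral", "polarizing", "neutral"]
pred = ["polarizing", "neutral", "neutral", "neutral"]
print(macro_f1(gold, pred))
```

In practice one would extract the predicted label strings from the pipeline output (each prediction is a dict with a `label` key) before passing them to such a function, or use an existing implementation such as scikit-learn's `f1_score` with `average="macro"`.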

### Results