Update README.md
README.md
This model is based on FacebookAI/xlm-roberta-large and was trained in a two-step …
### Training Data
The model was trained on two datasets, both derived from the data in partypress/partypress-multilingual. The first dataset was weakly labeled using GPT-4o: the [prompt](https://huggingface.co/Sami92/XLM-R-Large-PartyPress/blob/main/FinalPromptPartyPress.txt) contained the label descriptions taken from [Erfort et al. (2023)](https://journals.sagepub.com/doi/10.1177/20531680231183512). This weakly labeled dataset contains 32,060 press releases.
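The weak-labeling step can be sketched roughly as follows, assuming the OpenAI Python client. The function names and message layout here are illustrative assumptions, not taken from this model card; the actual prompt text lives in FinalPromptPartyPress.txt.

```python
# Illustrative sketch of weak labeling with GPT-4o. The helper names and
# the system/user message split are assumptions, not the card's method.

def build_messages(label_descriptions: str, press_release: str) -> list[dict]:
    """Combine the category descriptions (the prompt) with one press release."""
    return [
        {"role": "system", "content": label_descriptions},
        {"role": "user", "content": press_release},
    ]

def weakly_label(label_descriptions: str, press_release: str,
                 model: str = "gpt-4o") -> str:
    """Ask GPT-4o for an issue category; requires an OPENAI_API_KEY."""
    from openai import OpenAI  # imported lazily so build_messages works without it
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=build_messages(label_descriptions, press_release),
        temperature=0,  # deterministic labels for a reproducible dataset
    )
    return resp.choices[0].message.content
```

Running `weakly_label` over each press release would yield one GPT-4o label per example, giving the 32,060-example weakly labeled set.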
The second dataset is the human-annotated dataset used to train partypress/partypress-multilingual. For training, only the single-coded examples were used (24,117). Evaluation was performed on the examples annotated by two human coders each (3,121).
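The train/eval split described above can be sketched as below; the record layout and the `n_coders` field are hypothetical, since the card does not state the actual column names of partypress/partypress-multilingual.

```python
# Hypothetical split of the human-annotated data: single-coded examples go to
# training, examples coded by two annotators go to evaluation. The "n_coders"
# field name is an assumption, not the dataset's real schema.

def split_by_coders(examples: list[dict]) -> tuple[list[dict], list[dict]]:
    train = [ex for ex in examples if ex["n_coders"] == 1]
    evaluation = [ex for ex in examples if ex["n_coders"] >= 2]
    return train, evaluation
```

Applied to the full annotated corpus, this kind of filter would reproduce the 24,117/3,121 train/evaluation split reported here.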