---
license: apache-2.0
---

This model is a binary classifier developed to analyze comment authorship patterns on Korean news articles.
For further details, refer to our paper in the journal *Journalism*: [News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments](https://journals.sagepub.com/doi/full/10.1177/14648849211069241)

* This model is a BERT-based classification model that assigns Korean user-generated comments one of two labels: liberal or conservative.
* The model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. Because the dataset was collected in 2019, comments on more recent political topics may not be classified correctly.
* The model is fine-tuned from ETRI's KorBERT.

### How to use

* The model requires an edited version of the `transformers` class `BertTokenizer`, which can be found in the file `KorBertTokenizer.py`.
* Usage example:

~~~python
from KorBertTokenizer import KorBertTokenizer
from transformers import BertForSequenceClassification
import torch

tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

def classify(text):
    inputs = tokenizer(text, padding='max_length', max_length=70, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return model.config.id2label[predicted_class_id]

input_strings = ['좌파가 나라 경제 안보 말아먹는다',
                 '수꼴들은 나라 일본한테 팔아먹었나']

for input_string in input_strings:
    print('===\n입력 텍스트: {}\n분류 결과: {}\n==='.format(input_string, classify(input_string)))
~~~
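The `classify` function above returns only the top label. If a confidence score is also wanted, the raw logits can be converted to probabilities with a softmax. A minimal sketch with made-up logit values (in practice the values come from `model(**inputs).logits`):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for one comment; real values would come from the model.
logits = [2.1, -0.4]
probs = softmax(logits)
predicted_class_id = probs.index(max(probs))
# The label name would then be model.config.id2label[predicted_class_id],
# and probs[predicted_class_id] its confidence.
```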
### Model performance

* Accuracy: 0.8322
* F1-score: 0.8322
* For further technical details on the model, refer to our paper from the W-NUT workshop at EMNLP-IJCNLP 2019: [The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media](https://aclanthology.org/D19-5548/)
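Both metrics compare the model's predicted labels against gold labels over an evaluation set. A minimal sketch of how they are computed, using toy labels (the paper's evaluation data is not distributed with this model, and the 0/1 label meanings here are illustrative):

```python
def accuracy(y_true, y_pred):
    # Fraction of comments whose predicted label matches the gold label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    # Standard binary F1: harmonic mean of precision and recall.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy labels, purely illustrative (say 0 = liberal, 1 = conservative).
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
```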