---
license: apache-2.0
---
This model is a binary classifier developed to analyze the political leaning of user comments on Korean news articles.
For further details, refer to our paper in *Journalism*: [News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments](https://journals.sagepub.com/doi/full/10.1177/14648849211069241)
* This model is a BERT classification model that classifies Korean user-generated comments into one of two labels: liberal or conservative.
* This model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. The dataset was collected in 2019; as such, note that comments related to recent political topics might not be classified correctly.
* This model is fine-tuned from ETRI's KorBERT.
### How to use
* The model requires a modified version of the `transformers` class `BertTokenizer`, provided in the file `KorBertTokenizer.py`.
* Usage example:
~~~python
from KorBertTokenizer import KorBertTokenizer
from transformers import BertForSequenceClassification
import torch

tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

def classify(text):
    inputs = tokenizer(text, padding='max_length', max_length=70, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return model.config.id2label[predicted_class_id]

input_strings = [
    '좌파가 나라 경제 안보 말아먹는다',    # "The left is ruining the country's economy and security"
    '저꼴들은 나라 일본한테 팔아먹을놈',   # "Those people would sell the country off to Japan"
]
for input_string in input_strings:
    print('===\n입력 텍스트: {}\n분류 결과: {}\n==='.format(input_string, classify(input_string)))
~~~
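If you also want a confidence score rather than only the argmax label, you can apply a softmax over the two class logits. A minimal sketch in plain Python (no model download required): `label_with_confidence` is a hypothetical helper, and the `id2label` mapping below is an assumption standing in for the one stored in the model's config.

~~~python
import math

# Assumption: binary label mapping mirroring model.config.id2label
id2label = {0: 'liberal', 1: 'conservative'}

def label_with_confidence(logits):
    # Numerically stable softmax over the two class logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return id2label[idx], probs[idx]

# Example with made-up logits (in practice, pass logits.squeeze().tolist())
label, prob = label_with_confidence([0.3, 1.7])
~~~

In the real pipeline, the two logits come from `model(**inputs).logits`; `torch.softmax(logits, dim=-1)` computes the same probabilities on the tensor directly.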
### Model performance
* Accuracy: 0.8322
* F1-Score: 0.8322
* For further technical details on the model, refer to our paper from the W-NUT workshop (EMNLP-IJCNLP 2019), [The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media](https://aclanthology.org/D19-5548/).
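For reference, the F1 score reported above is the standard binary F1 (harmonic mean of precision and recall). A short sketch with toy labels, not the paper's evaluation data; `binary_f1` is an illustrative helper:

~~~python
def binary_f1(y_true, y_pred, positive=1):
    # Count true positives, false positives, and false negatives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: 2 TP, 1 FP, 1 FN -> precision = recall = 2/3
score = binary_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
~~~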