---
license: apache-2.0
---

This model is a binary classifier developed to analyze comment authorship patterns on Korean news articles.
For further details, refer to our paper in the journal *Journalism*: [News comment sections and online echo chambers: The ideological alignment between partisan news stories and their user comments](https://journals.sagepub.com/doi/full/10.1177/14648849211069241)

* This model is a BERT-based classification model that assigns Korean user-generated comments one of two labels: liberal or conservative.
* The model was trained on approximately 37,000 user-generated comments collected from NAVER's news portal. Because the dataset was collected in 2019, comments on more recent political topics may not be classified correctly.
* The model is fine-tuned from ETRI's KorBERT.

### How to use

* The model requires an edited version of the `transformers` class `BertTokenizer`, which can be found in the file `KorBertTokenizer.py`.
* Usage example:

~~~python
from KorBertTokenizer import KorBertTokenizer
from transformers import BertForSequenceClassification
import torch

tokenizer = KorBertTokenizer.from_pretrained('conviette/korPolBERT')
model = BertForSequenceClassification.from_pretrained('conviette/korPolBERT')

def classify(text):
    inputs = tokenizer(text, padding='max_length', max_length=70, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return model.config.id2label[predicted_class_id]

input_strings = ['좌파가 나라 경제 안보 말아먹는다',
                 '수꼴들은 나라 일본한테 팔아먹었나']

for input_string in input_strings:
    print('===\n입력 텍스트: {}\n분류 결과: {}\n==='.format(input_string, classify(input_string)))
~~~
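The `classify` function above returns only the top label. If a confidence score is also wanted, the raw logits can be converted to probabilities with a softmax. A minimal sketch with made-up logit values (in practice the values come from `model(**inputs).logits`):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for one comment; real values would come from the model.
logits = [2.1, -0.4]
probs = softmax(logits)
predicted_class_id = probs.index(max(probs))
# The label name would then be model.config.id2label[predicted_class_id],
# and probs[predicted_class_id] its confidence.
```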
### Model performance

* Accuracy: 0.8322
* F1-score: 0.8322
* For further technical details on the model, refer to our paper from the W-NUT workshop at EMNLP-IJCNLP 2019: [The Fallacy of Echo Chambers: Analyzing the Political Slants of User-Generated News Comments in Korean Media](https://aclanthology.org/D19-5548/)
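Both metrics compare the model's predicted labels against gold labels over an evaluation set. A minimal sketch of how they are computed, using toy labels (the paper's evaluation data is not distributed with this model, and the 0/1 label meanings here are illustrative):

```python
def accuracy(y_true, y_pred):
    # Fraction of comments whose predicted label matches the gold label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_binary(y_true, y_pred, positive=1):
    # Standard binary F1: harmonic mean of precision and recall.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy labels, purely illustrative (say 0 = liberal, 1 = conservative).
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
```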