| Multi-label model trained on K-MHaS, based on "koelectra-v3". | |
| Use the tokenizer from "monologg/koelectra-base-v3-discriminator" with this model. | |
| dataset : https://huggingface.co/datasets/jeanlee/kmhas_korean_hate_speech | |
| pretrained_model : https://huggingface.co/monologg/koelectra-base-v3-discriminator | |
| The label map is as follows: | |
| >>> | |
| {'origin': 0, | |
| 'physical': 1, | |
| 'politics': 2, | |
| 'profanity': 3, | |
| 'age': 4, | |
| 'gender': 5, | |
| 'race': 6, | |
| 'religion': 7, | |
| 'not_hate_speech': 8} | |
| You can load the label map with the code below. | |
| > | |
| import pickle | |
| from huggingface_hub import hf_hub_download | |
| repo_id = "JunHwi/kmhas_multilabel" | |
| filename = "kmhas_dict.pickle" # name of the file uploaded to the repo_id above | |
| label_dict = hf_hub_download(repo_id, filename) | |
| with open(label_dict, "rb") as f: | |
|     label2num = pickle.load(f) | |
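
As a rough sketch of how the label map might be used, the snippet below inverts it and turns a vector of nine logits into label names. It assumes the model emits one logit per label and that a sigmoid with a 0.5 threshold is a reasonable cutoff; neither is confirmed by this card. `decode_labels`, `threshold`, and the example logits are hypothetical and only serve as illustration.

```python
import math

# Label map copied from this model card (label name -> class index).
label2num = {
    "origin": 0, "physical": 1, "politics": 2, "profanity": 3, "age": 4,
    "gender": 5, "race": 6, "religion": 7, "not_hate_speech": 8,
}

# Invert the map so class indices can be turned back into label names.
num2label = {idx: name for name, idx in label2num.items()}


def decode_labels(logits, threshold=0.5):
    """Return the names of labels whose sigmoid probability exceeds threshold.

    Hypothetical helper: the name and the default threshold are assumptions,
    not part of the model repository.
    """
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [num2label[i] for i, p in enumerate(probs) if p > threshold]


# Made-up logits for illustration only (not real model output).
example_logits = [-3.2, -4.0, 1.7, 2.4, -2.9, -3.5, -4.1, -3.8, -2.0]
print(decode_labels(example_logits))  # ['politics', 'profanity']
```

Because the task is multi-label, each class is thresholded independently, so a text can receive several labels or none at all.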