---
license: apache-2.0
base_model: bartpho
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: bartpho-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.3361
    - type: precision
      value: 0.5521
    - type: recall
      value: 0.5095
    - type: exact_match
      value: 0.0226
---

# bartpho-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of [bartpho](https://huggingface.co/bartpho) for Vietnamese **Hate Speech Span Detection**: given a sentence, it labels the tokens that belong to hateful or offensive spans.

## Model Details

- Base Model: `bartpho`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`

## Results

Evaluated on `visolex/ViHOS`:

- F1: `0.3361`
- Precision: `0.5521`
- Recall: `0.5095`
- Exact Match: `0.0226`

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "bartpho-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Vietnamese input: "Example Vietnamese sentence with hateful content ..."
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# max_length=64 matches the max sequence length used during fine-tuning
enc = tok(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    logits = model(**enc).logits

# One predicted label id per token of the single input sentence
pred_ids = logits.argmax(-1)[0].tolist()
# TODO: convert pred_ids -> spans according to your label scheme (BIO/BILOU/char-offset)
```

## License

Apache-2.0

## Acknowledgments

- Base model: [bartpho](https://huggingface.co/bartpho)
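
## Decoding Predictions into Spans (Sketch)

The usage snippet stops at `pred_ids` and leaves the conversion to spans open. The following is a minimal sketch of one way to do it, not the method used to produce the reported metrics. It assumes a BIO scheme with a hypothetical label map (`O`, `B-T`, `I-T`); check `model.config.id2label` for the labels this checkpoint actually uses. It also assumes you have a `(start, end)` character offset for every token, for example from `return_offsets_mapping=True` if a fast tokenizer is available for this model. The helper name `bio_to_char_spans` is illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label map (assumption): replace with model.config.id2label.
ID2LABEL: Dict[int, str] = {0: "O", 1: "B-T", 2: "I-T"}


def bio_to_char_spans(
    pred_ids: List[int],
    offsets: List[Tuple[int, int]],
    id2label: Dict[int, str] = ID2LABEL,
) -> List[Tuple[int, int]]:
    """Merge token-level BIO predictions into (start, end) character spans."""
    spans: List[Tuple[int, int]] = []
    current: Optional[List[int]] = None  # [start, end] of the span being built

    for label_id, (start, end) in zip(pred_ids, offsets):
        if start == end:
            # Tokens with empty offsets (typically special tokens) are skipped.
            continue
        label = id2label.get(label_id, "O")
        if label.startswith("B-"):
            # A new span begins; close any span that was still open.
            if current is not None:
                spans.append((current[0], current[1]))
            current = [start, end]
        elif label.startswith("I-"):
            if current is not None:
                current[1] = end  # extend the open span
            else:
                current = [start, end]  # lenient: a stray I- tag starts a span
        else:
            # "O" (or an unknown id) closes any open span.
            if current is not None:
                spans.append((current[0], current[1]))
                current = None

    if current is not None:
        spans.append((current[0], current[1]))
    return spans


# Toy example with placeholder text and hand-written offsets.
text = "abc def ghi"
offsets = [(0, 0), (0, 3), (4, 7), (8, 11), (0, 0)]
pred_ids = [0, 0, 1, 2, 0]
spans = bio_to_char_spans(pred_ids, offsets)
print([text[s:e] for s, e in spans])  # prints ['def ghi']
```

Treating a stray `I-` tag as a span start is a lenient choice that favors recall; a stricter decoder could drop such tokens instead.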