---
license: apache-2.0
base_model: bartpho
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: bartpho-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.3361
    - type: precision
      value: 0.5521
    - type: recall
      value: 0.5095
    - type: exact_match
      value: 0.0226
---

# bartpho-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of [bartpho](https://huggingface.co/bartpho) for Vietnamese **Hate Speech Span Detection**, using the `visolex/ViHOS` dataset. Given a Vietnamese text, it predicts which tokens belong to hate speech spans.

## Model Details

- Base Model: `bartpho`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`

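The combination of `100` epochs and a patience of `5` means training usually ends well before the epoch limit. The following is a minimal sketch of that stopping rule, not the training script used for this model. The `run_epoch` callback and the choice of a higher-is-better validation score are assumptions; this card does not say which metric was monitored.

```python
from typing import Callable, Tuple

# Values taken from the hyperparameter list in this card.
MAX_EPOCHS = 100
PATIENCE = 5


def train_with_early_stopping(
    run_epoch: Callable[[int], float],
    max_epochs: int = MAX_EPOCHS,
    patience: int = PATIENCE,
) -> Tuple[int, float]:
    """Run up to `max_epochs` epochs and stop once the validation score
    has not improved for `patience` consecutive epochs.

    `run_epoch` is a hypothetical callback that trains one epoch and
    returns a validation score where higher is better (an assumption).
    Returns (best_epoch, best_score), with epochs counted from 1.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        score = run_epoch(epoch)
        if score > best_score:
            # New best: remember it and reset the patience counter.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_score
```

With Hugging Face Transformers, the same behaviour is typically obtained with `EarlyStoppingCallback(early_stopping_patience=5)` passed to a `Trainer`.
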
## Results

- F1: `0.3361`
- Precision: `0.5521`
- Recall: `0.5095`
- Exact Match: `0.0226`

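This card does not define how the four scores are computed. As an illustration only, the sketch below shows one common way to score span detection: compare the set of predicted character indices with the gold set for each sample. The function names and the per-sample, character-level formulation are assumptions, not the evaluation code behind the numbers above.

```python
from typing import Set, Tuple


def char_prf(pred: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over character indices for one sample."""
    if not pred and not gold:
        # Nothing to find and nothing predicted: count as a perfect match.
        return 1.0, 1.0, 1.0
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def exact_match(pred: Set[int], gold: Set[int]) -> bool:
    """True only when the predicted characters equal the gold characters."""
    return pred == gold
```

When scores like these are averaged per sample, the mean F1 need not equal the harmonic mean of the mean precision and mean recall.
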
## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "bartpho-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example Vietnamese sentence containing hateful content
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."

# Note: the model was fine-tuned with a max sequence length of 64
enc = tok(text, return_tensors="pt", truncation=True, max_length=256, is_split_into_words=False)

with torch.no_grad():
    logits = model(**enc).logits

# One predicted label id per token of the (single) input sequence
pred_ids = logits.argmax(-1)[0].tolist()

# TODO: convert pred_ids -> spans according to your label scheme (BIO/BILOU/char-offset)
```

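The final step left as a TODO in the snippet above, turning per-token predictions into character spans, could look like the sketch below. It assumes a simple scheme in which label id `0` means "outside" and any other id means "inside a hate speech span"; check `model.config.id2label` for the real mapping. It also assumes character offsets are available for each token, for example from a fast tokenizer called with `return_offsets_mapping=True`; slow tokenizers do not provide them. The function name is hypothetical.

```python
from typing import List, Tuple


def token_preds_to_char_spans(
    pred_ids: List[int],
    offsets: List[Tuple[int, int]],
    outside_id: int = 0,  # assumed id of the "outside" label
) -> List[Tuple[int, int]]:
    """Merge consecutive flagged tokens into (start, end) character spans.

    `offsets` holds one (start, end) pair per token; special tokens are
    expected to carry an empty offset such as (0, 0).
    """
    spans: List[Tuple[int, int]] = []
    prev_flagged = False
    for label, (start, end) in zip(pred_ids, offsets):
        # Empty offsets (special tokens) never belong to a span.
        flagged = label != outside_id and start != end
        if flagged:
            if prev_flagged:
                # Extend the span opened by the previous flagged token.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
        prev_flagged = flagged
    return spans
```

The flagged text can then be recovered with `[text[s:e] for s, e in spans]`.
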
## License

Apache-2.0

## Acknowledgments

- Base model: [bartpho](https://huggingface.co/bartpho)