---
license: apache-2.0
base_model: phobert-v2
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: phobert-v2-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.6326
    - type: precision
      value: 0.6494
    - type: recall
      value: 0.6305
    - type: exact_match
      value: 0.0000
---

# phobert-v2-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is [phobert-v2](https://huggingface.co/phobert-v2) fine-tuned on the ViHOS dataset (`visolex/ViHOS`) for Vietnamese **Hate Speech Span Detection**, that is, locating the spans of an input text that carry hate speech.

## Model Details

- Base Model: `phobert-v2`
- Dataset: `visolex/ViHOS`
- Description: Vietnamese Hate Speech Span Detection
- Framework: Hugging Face Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`
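
With early stopping, the `100` epochs listed above act as an upper bound rather than a fixed training length. The card does not say which validation metric was monitored or how often it was evaluated, so the sketch below assumes one evaluation per epoch on a higher-is-better score such as span F1. `stopping_epoch` is a hypothetical helper that illustrates how a patience of 5 behaves; it is not the training code used for this model.

```python
from typing import Sequence, Tuple


def stopping_epoch(
    val_scores: Sequence[float], patience: int = 5, max_epochs: int = 100
) -> Tuple[int, int]:
    """Return (last_epoch_run, best_epoch), both 1-indexed.

    Assumption: one validation score per epoch, and higher is better.
    Training stops after `patience` consecutive epochs without improvement,
    or once `max_epochs` is reached.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        last_epoch = epoch
        if score > best_score:
            # New best score: remember it and reset the patience counter.
            best_score = score
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return last_epoch, best_epoch


# Hypothetical validation F1 curve: peaks at epoch 3, then plateaus.
scores = [0.50, 0.58, 0.61, 0.60, 0.61, 0.59, 0.60, 0.60, 0.62]
print(stopping_epoch(scores))  # → (8, 3)
```

In this example the higher score at epoch 9 is never reached: early stopping trades the chance of a late improvement for a much shorter run.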

## Results

- F1: `0.6326`
- Precision: `0.6494`
- Recall: `0.6305`
- Exact Match: `0.0000`
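
The card does not define how these span metrics were computed. One widely used convention for span detection, taken from the SemEval-2021 Toxic Spans Detection task, treats each set of spans as the set of character offsets it covers, computes precision, recall and F1 per example, and averages them over the corpus; exact match then counts the examples whose predicted offsets equal the gold offsets. The sketch below implements that convention as an assumption only: it is not confirmed to be the evaluation code behind the numbers above, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def spans_to_offsets(spans: List[Span]) -> Set[int]:
    """Expand (start, end) spans into the set of character offsets they cover."""
    return {i for start, end in spans for i in range(start, end)}


def example_scores(pred: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Per-example precision, recall and F1 over character offsets.

    Assumed convention (Toxic Spans style): if both sets are empty the
    prediction counts as perfect; if only one is empty all scores are 0.
    """
    if not pred and not gold:
        return 1.0, 1.0, 1.0
    if not pred or not gold:
        return 0.0, 0.0, 0.0
    overlap = len(pred & gold)
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def corpus_scores(preds: List[List[Span]], golds: List[List[Span]]) -> Dict[str, float]:
    """Average the per-example scores; exact match compares offset sets."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0, "exact_match": 0.0}
    for pred_spans, gold_spans in zip(preds, golds):
        pred, gold = spans_to_offsets(pred_spans), spans_to_offsets(gold_spans)
        p, r, f = example_scores(pred, gold)
        totals["precision"] += p
        totals["recall"] += r
        totals["f1"] += f
        totals["exact_match"] += float(pred == gold)
    n = max(len(preds), 1)
    return {name: value / n for name, value in totals.items()}


# Hypothetical predictions and gold spans for three short texts.
preds = [[(4, 7)], [(0, 3)], []]
golds = [[(4, 9)], [], []]
print(corpus_scores(preds, golds))
```

Because the scores are averaged per example, the corpus F1 need not equal the harmonic mean of the corpus precision and recall. For the results above, the harmonic mean of 0.6494 and 0.6305 is about 0.640 rather than 0.6326, which is consistent with, though not proof of, some form of per-example averaging.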

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "phobert-v2-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example Vietnamese input ("An example Vietnamese sentence with hateful content ...")
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# Fine-tuning used a max sequence length of 64; PhoBERT itself accepts up to 256 tokens.
enc = tok(text, return_tensors="pt", truncation=True, max_length=256, is_split_into_words=False)

with torch.no_grad():
    logits = model(**enc).logits

# Predicted label id for every token, including the special tokens <s> and </s>
pred_ids = logits.argmax(-1)[0].tolist()
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
labels = [model.config.id2label[i] for i in pred_ids]

# TODO: convert the per-token labels into spans according to your label scheme (BIO/BILOU/char-offset)
```
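
The snippet stops at per-token labels. If the checkpoint uses a BIO scheme, which this card does not confirm (check `model.config.id2label`), the labels can be grouped into spans roughly as sketched below. The label names `B-T`, `I-T` and `O` and the helpers `merge_subwords` and `bio_to_spans` are hypothetical. The sketch assumes PhoBERT-style BPE tokens, where a trailing `@@` marks a piece that continues into the next token, and it keeps the label of each word's first piece.

```python
from typing import List, Optional, Tuple

# Assumption: the special tokens emitted by a PhoBERT (RoBERTa-style) tokenizer.
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


def merge_subwords(tokens: List[str], labels: List[str]) -> Tuple[List[str], List[str]]:
    """Join BPE pieces ("ab@@" + "c" -> "abc") and keep each word's first label."""
    words: List[str] = []
    word_labels: List[str] = []
    continuing = False  # True when the previous piece ended with "@@"
    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continuing = False
            continue
        piece = token[:-2] if token.endswith("@@") else token
        if continuing:
            words[-1] += piece
        else:
            words.append(piece)
            word_labels.append(label)
        continuing = token.endswith("@@")
    return words, word_labels


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Group BIO labels into (start, end) word-index spans, end exclusive.

    Lenient decoding: an I- label with no open span starts a new span.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label.startswith("B-") or (label.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif not label.startswith("I-"):  # "O" (or any other label) closes a span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hypothetical model output with placeholder words.
example_tokens = ["<s>", "w1", "w2@@", "x", "w3", "w4", "</s>"]
example_labels = ["O", "O", "B-T", "I-T", "I-T", "O", "O"]
words, word_labels = merge_subwords(example_tokens, example_labels)
spans = bio_to_spans(word_labels)
print([" ".join(words[s:e]) for s, e in spans])  # → ['w2x w3']
```

These are word-level spans. Turning them into character offsets in the original string means aligning the words to the text yourself, since `return_offsets_mapping` is only available with fast tokenizers in Transformers.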

## License

Apache-2.0

## Acknowledgments

- Base model: [phobert-v2](https://huggingface.co/phobert-v2)