---
language:
- vi
tags:
- hate-speech-detection
- vietnamese-nlp
- text-classification
- offensive-language-detection
license: mit
datasets:
- vihsd
base_model: vinai/bartpho-syllable-base
---
# BARTpho

BARTpho fine-tuned for Vietnamese hate speech classification.

## Model Details

### Model Type

BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese)

### Base Model

This model is fine-tuned from vinai/bartpho-syllable-base.

### Training Info

- Task: Hate Speech Classification
- Language: Vietnamese
- Labels:
  - 0: CLEAN (normal content)
  - 1: OFFENSIVE (mildly offensive content)
  - 2: HATE (hate speech)
## 📊 Model Performance

| Metric | Score |
|---|---|
| Accuracy | 0.8985 |
| F1 Macro | 0.6791 |
| F1 Weighted | 0.8886 |
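For reference, metrics of this kind are typically computed with scikit-learn from the predicted and gold label ids. The labels below are toy values for illustration only, not the model's actual test-set outputs:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy label ids (0=CLEAN, 1=OFFENSIVE, 2=HATE); illustrative only,
# not the actual ViHSD test-set predictions.
y_true = [0, 0, 0, 1, 2]
y_pred = [0, 0, 1, 1, 2]

accuracy = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by class support

print(f"Accuracy: {accuracy:.4f}, F1 Macro: {f1_macro:.4f}, F1 Weighted: {f1_weighted:.4f}")
```

Note that on an imbalanced dataset like ViHSD, macro F1 is lower than weighted F1 because rare classes count equally toward the macro average.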
## Model Description

This model has been fine-tuned on ViHSD (Vietnamese Hate Speech Dataset) to classify Vietnamese text into three categories: CLEAN, OFFENSIVE, and HATE.

### Architecture

BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese)

The model combines powerful pretrained representations with task-specific fine-tuning for effective hate speech detection in Vietnamese social media content.
## How to Use

### 1. Using the Transformers Pipeline

```python
from transformers import pipeline

# Initialize the hate speech classifier
classifier = pipeline(
    "text-classification",
    model="visolex/hate-speech-bartpho",
    tokenizer="visolex/hate-speech-bartpho",
    top_k=None,  # return scores for all labels (return_all_scores is deprecated)
)

# Classify text ("Vietnamese text to check")
results = classifier("Văn bản tiếng Việt cần kiểm tra")
print(results)
```
### 2. Using AutoModel

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "visolex/hate-speech-bartpho"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare text ("Vietnamese text to check")
text = "Văn bản tiếng Việt cần kiểm tra"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to probabilities
probabilities = torch.nn.functional.softmax(logits, dim=-1)

# Get the predicted label and its confidence
predicted_label = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0][predicted_label].item()

# Map label ids to names
label_mapping = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}

print(f"Predicted: {label_mapping[predicted_label]} (Confidence: {confidence:.2%})")
```
### 3. Batch Processing

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "visolex/hate-speech-bartpho"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# List of texts to classify
texts = [
    "Bài viết rất hay và bổ ích",   # "A very good and useful post"
    "Đồ ngu người ta nói đúng mà",  # "Idiot, they were right"
    "Cút đi đồ chó",                # "Get lost, you dog"
]

# Tokenize and predict
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

for text, pred in zip(texts, predictions):
    label = ["CLEAN", "OFFENSIVE", "HATE"][pred.item()]
    print(f"{text[:50]} -> {label}")
```
## Training Details

### Training Data

- Dataset: ViHSD (Vietnamese Hate Speech Detection Dataset)
- Total samples: ~10,000 Vietnamese comments from social media
- Training split: ~70%
- Validation split: ~15%
- Test split: ~15%

### Training Configuration

- Framework: PyTorch + Hugging Face Transformers
- Optimizer: AdamW
- Learning Rate: 2e-5
- Batch Size: 32
- Max Length: 256 tokens
- Epochs: determined by early stopping
### Preprocessing

- Text normalization for Vietnamese
- Special character handling
- Emoji and slang processing
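A minimal sketch of what such preprocessing could look like. The exact pipeline used for training is not published, so the specific steps below (NFC composition of diacritics, lowercasing, emoji stripping, whitespace collapsing) are assumptions:

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Hypothetical preprocessing along the lines listed above;
    not the authors' actual pipeline."""
    # Compose Vietnamese diacritics into single code points (NFC)
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Strip common emoji / pictograph ranges (rough approximation)
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", " ", text)
    # Collapse runs of whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(preprocess("Xin   chào 😀"))  # -> "xin chào"
```

Applying the same normalization at inference time as at training time generally matters more than the specific steps chosen.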
## Evaluation Results

Evaluation was performed on the ViHSD test set; see the Model Performance section above for the scores.

### Label Distribution

- CLEAN (0): normal content without offensive language
- OFFENSIVE (1): mildly offensive or inappropriate content
- HATE (2): hate speech, extremist language, severe threats
## Use Cases

- Social Media Moderation: automatic detection of hate speech on Vietnamese social media platforms
- Content Filtering: filtering offensive content in Vietnamese text
- Research: studying hate speech patterns in Vietnamese online communities

## Limitations and Considerations

⚠️ **Important limitations:**

- The model was trained primarily on social media data and may not generalize to formal text
- Performance may vary with slang, code-switching, or regional dialects
- The model reflects biases present in the training data
- It should be used as part of a larger moderation system, not as the sole decision-maker
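As one illustration of the "part of a larger system" point, a deployment might auto-act only on high-confidence predictions and route uncertain cases to human review. The threshold and action names below are invented for the example:

```python
# Illustrative routing of classifier outputs inside a larger moderation
# system; the 0.85 threshold and action names are made up for this sketch.
def route(label: str, confidence: float, threshold: float = 0.85) -> str:
    if label == "CLEAN":
        return "publish"
    if confidence >= threshold:
        return "auto_hide"     # high-confidence OFFENSIVE/HATE
    return "human_review"      # uncertain: defer to a moderator

print(route("HATE", 0.97))       # -> auto_hide
print(route("OFFENSIVE", 0.55))  # -> human_review
```

The right threshold depends on the cost of false positives versus false negatives for the platform, and should be tuned on held-out data.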
## Citation

If you use this model in your research, please cite:

```bibtex
@software{vihsd_bartpho,
  title  = {BARTpho for Vietnamese Hate Speech Detection},
  author = {ViSoLex Team},
  year   = {2024},
  url    = {https://huggingface.co/visolex/hate-speech-bartpho},
  note   = {Fine-tuned from vinai/bartpho-syllable-base}
}
```
## Contact & Support

- GitHub: ViSoLex Hate Speech Detection
- Issues: Report Issues
- Questions: open a discussion on the model's Hugging Face page

## License

This model is distributed under the MIT License.

## Acknowledgments

- Base model: vinai/bartpho-syllable-base, pretrained by vinai
- Dataset: ViHSD (Vietnamese Hate Speech Detection Dataset)
- Framework: Hugging Face Transformers