---
language:
- vi
tags:
- hate-speech-detection
- vietnamese-nlp
- text-classification
- offensive-language-detection
license: mit
datasets:
- vihsd
base_model: vinai/bartpho-syllable-base
---

# BARTpho

BARTpho fine-tuned for Vietnamese hate speech classification.

## Model Details

### Model Type

BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese)

### Base Model

This model is fine-tuned from [vinai/bartpho-syllable-base](https://huggingface.co/vinai/bartpho-syllable-base).

### Training Info

- **Task**: Hate Speech Classification
- **Language**: Vietnamese
- **Labels**:
  - `0`: CLEAN (normal content)
  - `1`: OFFENSIVE (mildly offensive content)
  - `2`: HATE (hate speech)

## 📊 Model Performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.8985 |
| F1 Macro | 0.6791 |
| F1 Weighted | 0.8886 |

## Model Description

This model has been fine-tuned on ViHSD (the Vietnamese Hate Speech Dataset) to classify Vietnamese text into three categories: CLEAN, OFFENSIVE, and HATE.

### Architecture

BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese). The model combines powerful pretrained representations with task-specific fine-tuning for effective hate speech detection in Vietnamese social media content.

## How to Use

### 1. Using the Transformers Pipeline

```python
from transformers import pipeline

# Initialize the hate speech classifier
classifier = pipeline(
    "text-classification",
    model="visolex/hate-speech-bartpho",
    tokenizer="visolex/hate-speech-bartpho",
    top_k=None,  # return scores for all labels (replaces the deprecated return_all_scores=True)
)

# Classify text (the example input means "Vietnamese text to check")
results = classifier("Văn bản tiếng Việt cần kiểm tra")
print(results)
```

### 2. Using AutoModel

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "visolex/hate-speech-bartpho"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare text (the example input means "Vietnamese text to check")
text = "Văn bản tiếng Việt cần kiểm tra"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to probabilities
probabilities = torch.nn.functional.softmax(logits, dim=-1)

# Get the predicted label and its confidence
predicted_label = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0][predicted_label].item()

# Label mapping
label_mapping = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}

print(f"Predicted: {label_mapping[predicted_label]} (Confidence: {confidence:.2%})")
```

### 3. Batch Processing

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "visolex/hate-speech-bartpho"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# List of texts to classify (a clean, a mildly offensive, and a hateful example)
texts = [
    "Bài viết rất hay và bổ ích",
    "Đồ ngu người ta nói đúng mà",
    "Cút đi đồ chó",
]

# Tokenize and predict
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

for text, pred in zip(texts, predictions):
    label = ["CLEAN", "OFFENSIVE", "HATE"][pred.item()]
    print(f"{text[:50]} -> {label}")
```

## Training Details

### Training Data

- **Dataset**: ViHSD (Vietnamese Hate Speech Detection Dataset)
- **Total samples**: ~10,000 Vietnamese comments from social media
- **Training split**: ~70%
- **Validation split**: ~15%
- **Test split**: ~15%

### Training Configuration

- **Framework**: PyTorch + Hugging Face Transformers
- **Optimizer**: AdamW
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Max Length**: 256 tokens
- **Epochs**: Determined via early stopping

### Preprocessing

- Text normalization for Vietnamese
- Special character handling
- Emoji and slang processing

## Evaluation Results

Evaluation was performed on the ViHSD test set; see the Model Performance section above for the scores.

### Label Distribution

- **CLEAN (0)**: Normal content without offensive language
- **OFFENSIVE (1)**: Mildly offensive or inappropriate content
- **HATE (2)**: Hate speech, extremist language, severe threats

## Use Cases

- **Social Media Moderation**: Automatic detection of hate speech on Vietnamese social media platforms
- **Content Filtering**: Filtering offensive content in Vietnamese text
- **Research**: Studying hate speech patterns in Vietnamese online communities

## Limitations and Considerations

⚠️ **Important Limitations**:

- The model was trained primarily on social media data and may not generalize to formal text
- Performance may vary with slang, code-switching, or regional dialects
- The model reflects biases present in its training data
- It should be used as part of a larger moderation system, not as the sole decision-maker

## Citation

If you use this model in your research, please cite:

```bibtex
@software{vihsd_bartpho,
  title  = {BARTpho for Vietnamese Hate Speech Detection},
  author = {ViSoLex Team},
  year   = {2024},
  url    = {https://huggingface.co/visolex/hate-speech-bartpho},
  note   = {Fine-tuned from vinai/bartpho-syllable-base}
}
```

## Contact & Support

- **GitHub**: [ViSoLex Hate Speech Detection](https://github.com/visolex/hate-speech-detection)
- **Issues**: [Report Issues](https://github.com/visolex/hate-speech-detection/issues)
- **Questions**: Open a discussion on the model's Hugging Face page

## License

This model is distributed under the MIT License.
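The preprocessing steps listed under Training Details can be sketched roughly as follows. The exact normalization rules and slang lexicon used during training are not published here, so the `SLANG` map, the URL regex, and the emoji range in this sketch are illustrative assumptions only, not the actual training pipeline.

```python
import re
import unicodedata

# Hypothetical slang lexicon; the real one used in training is not published.
SLANG = {"ko": "không", "dc": "được", "vs": "với"}

def normalize(text: str) -> str:
    """Rough Vietnamese social-media normalization: NFC unicode form,
    lowercasing, URL and emoji removal, slang expansion, whitespace cleanup."""
    text = unicodedata.normalize("NFC", text)          # canonical diacritic form
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)          # drop URLs
    text = re.sub(r"[\U0001F300-\U0001FAFF]", " ", text)  # drop common emoji
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(normalize("Ko biết dc đâu 😀 http://example.com/x"))
```

Applying the same normalization at inference time as at training time generally matters more than the specific rules chosen.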
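The limitations above recommend using this model as part of a larger moderation system rather than as the sole decision-maker. One common pattern is to auto-act only on high-confidence non-CLEAN predictions and route low-confidence ones to human review. The sketch below shows this with plain Python over raw logits; the 0.80 threshold is a hypothetical value, not one tuned for this model.

```python
import math

LABELS = ["CLEAN", "OFFENSIVE", "HATE"]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(logits, threshold=0.80):
    """Auto-act only on confident OFFENSIVE/HATE predictions;
    defer low-confidence flags to a human reviewer."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    label, conf = LABELS[idx], probs[idx]
    if label != "CLEAN" and conf < threshold:
        return "HUMAN_REVIEW", label, conf
    return "AUTO", label, conf

# Example with made-up logits: a low-margin OFFENSIVE call is deferred.
print(route([0.1, 1.5, 0.9]))
```

In practice the threshold would be tuned on a validation set against the cost of false positives versus reviewer workload.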
## Acknowledgments

- Base model pretrained by VinAI
- Dataset: ViHSD (Vietnamese Hate Speech Detection Dataset)
- Framework: [Hugging Face Transformers](https://huggingface.co/transformers)