---
language:
- vi
tags:
- hate-speech-detection
- vietnamese-nlp
- text-classification
- offensive-language-detection
license: mit
datasets:
- vihsd
base_model: vinai/bartpho-syllable-base
---
# BARTpho

BARTpho fine-tuned for Vietnamese hate speech classification.

## Model Details

### Model Type

BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese)

### Base Model

This model is fine-tuned from vinai/bartpho-syllable-base.

### Training Info

- Task: Hate Speech Classification
- Language: Vietnamese
- Labels:
  - 0: CLEAN (normal content)
  - 1: OFFENSIVE (mildly offensive content)
  - 2: HATE (hate speech)
## 📊 Model Performance

| Metric | Score |
|---|---|
| Accuracy | 0.8985 |
| F1 Macro | 0.6791 |
| F1 Weighted | 0.8886 |
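For reference, metrics of this kind are typically computed with scikit-learn from the predicted and gold label ids. The labels below are toy values for illustration only, not the model's actual test-set outputs:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy label ids (0=CLEAN, 1=OFFENSIVE, 2=HATE); illustrative only,
# not the actual ViHSD test-set predictions.
y_true = [0, 0, 0, 1, 2]
y_pred = [0, 0, 1, 1, 2]

accuracy = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # weighted by class support

print(f"Accuracy: {accuracy:.4f}, F1 Macro: {f1_macro:.4f}, F1 Weighted: {f1_weighted:.4f}")
```

Note that on an imbalanced dataset like ViHSD, macro F1 is lower than weighted F1 because rare classes count equally toward the macro average.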
## Model Description

This model has been fine-tuned on ViHSD (Vietnamese Hate Speech Dataset) to classify Vietnamese text into three categories: CLEAN, OFFENSIVE, and HATE.

### Architecture

BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese)

The model combines powerful pretrained representations with task-specific fine-tuning for effective hate speech detection in Vietnamese social media content.
## How to Use

### 1. Using the Transformers Pipeline

```python
from transformers import pipeline

# Initialize the hate speech classifier
classifier = pipeline(
    "text-classification",
    model="visolex/hate-speech-bartpho",
    tokenizer="visolex/hate-speech-bartpho",
    top_k=None,  # return scores for all labels (return_all_scores is deprecated)
)

# Classify text ("Vietnamese text to check")
results = classifier("Văn bản tiếng Việt cần kiểm tra")
print(results)
```
### 2. Using AutoModel

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "visolex/hate-speech-bartpho"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare text ("Vietnamese text to check")
text = "Văn bản tiếng Việt cần kiểm tra"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to probabilities
probabilities = torch.nn.functional.softmax(logits, dim=-1)

# Get the predicted label and its confidence
predicted_label = torch.argmax(probabilities, dim=-1).item()
confidence = probabilities[0][predicted_label].item()

# Map label ids to names
label_mapping = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}

print(f"Predicted: {label_mapping[predicted_label]} (Confidence: {confidence:.2%})")
```
### 3. Batch Processing

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "visolex/hate-speech-bartpho"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# List of texts to classify
texts = [
    "Bài viết rất hay và bổ ích",   # "A very good and useful post"
    "Đồ ngu người ta nói đúng mà",  # "Idiot, they were right"
    "Cút đi đồ chó",                # "Get lost, you dog"
]

# Tokenize and predict
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

for text, pred in zip(texts, predictions):
    label = ["CLEAN", "OFFENSIVE", "HATE"][pred.item()]
    print(f"{text[:50]} -> {label}")
```
## Training Details

### Training Data

- Dataset: ViHSD (Vietnamese Hate Speech Detection Dataset)
- Total samples: ~10,000 Vietnamese comments from social media
- Training split: ~70%
- Validation split: ~15%
- Test split: ~15%

### Training Configuration

- Framework: PyTorch + Hugging Face Transformers
- Optimizer: AdamW
- Learning Rate: 2e-5
- Batch Size: 32
- Max Length: 256 tokens
- Epochs: determined by early stopping
### Preprocessing

- Text normalization for Vietnamese
- Special character handling
- Emoji and slang processing
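A minimal sketch of what such preprocessing could look like. The exact pipeline used for training is not published, so the specific steps below (NFC composition of diacritics, lowercasing, emoji stripping, whitespace collapsing) are assumptions:

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Hypothetical preprocessing along the lines listed above;
    not the authors' actual pipeline."""
    # Compose Vietnamese diacritics into single code points (NFC)
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Strip common emoji / pictograph ranges (rough approximation)
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", " ", text)
    # Collapse runs of whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(preprocess("Xin   chào 😀"))  # -> "xin chào"
```

Applying the same normalization at inference time as at training time generally matters more than the specific steps chosen.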
## Evaluation Results

Evaluation was performed on the ViHSD test set; see the Model Performance section above for the scores.

### Label Distribution

- CLEAN (0): normal content without offensive language
- OFFENSIVE (1): mildly offensive or inappropriate content
- HATE (2): hate speech, extremist language, severe threats
## Use Cases

- Social Media Moderation: automatic detection of hate speech on Vietnamese social media platforms
- Content Filtering: filtering offensive content in Vietnamese text
- Research: studying hate speech patterns in Vietnamese online communities

## Limitations and Considerations

⚠️ **Important limitations:**

- The model was trained primarily on social media data and may not generalize to formal text
- Performance may vary with slang, code-switching, or regional dialects
- The model reflects biases present in the training data
- It should be used as part of a larger moderation system, not as the sole decision-maker
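As one illustration of the "part of a larger system" point, a deployment might auto-act only on high-confidence predictions and route uncertain cases to human review. The threshold and action names below are invented for the example:

```python
# Illustrative routing of classifier outputs inside a larger moderation
# system; the 0.85 threshold and action names are made up for this sketch.
def route(label: str, confidence: float, threshold: float = 0.85) -> str:
    if label == "CLEAN":
        return "publish"
    if confidence >= threshold:
        return "auto_hide"     # high-confidence OFFENSIVE/HATE
    return "human_review"      # uncertain: defer to a moderator

print(route("HATE", 0.97))       # -> auto_hide
print(route("OFFENSIVE", 0.55))  # -> human_review
```

The right threshold depends on the cost of false positives versus false negatives for the platform, and should be tuned on held-out data.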
## Citation

If you use this model in your research, please cite:

```bibtex
@software{vihsd_bartpho,
  title  = {BARTpho for Vietnamese Hate Speech Detection},
  author = {ViSoLex Team},
  year   = {2024},
  url    = {https://huggingface.co/visolex/hate-speech-bartpho},
  note   = {Fine-tuned from vinai/bartpho-syllable-base}
}
```
## Contact & Support

- GitHub: ViSoLex Hate Speech Detection
- Issues: Report Issues
- Questions: open a discussion on the model's Hugging Face page

## License

This model is distributed under the MIT License.

## Acknowledgments

- Base model: vinai/bartpho-syllable-base, pretrained by vinai
- Dataset: ViHSD (Vietnamese Hate Speech Detection Dataset)
- Framework: Hugging Face Transformers