visolex/bartpho-hsd

@@ -1,155 +0,0 @@
----
-license: mit
-base_model: vinai/bartpho-syllable-base
-tags:
-- vietnamese
-- hate-speech-detection
-- text-classification
-- offensive-language-detection
-datasets:
-- visolex/vihsd
-metrics:
-- accuracy
-- macro-f1
-- weighted-f1
-model-index:
-- name: bartpho-hsd
-  results:
-  - task:
-      type: text-classification
-      name: Hate Speech Detection
-    dataset:
-      name: ViHSD
-      type: hate-speech-detection
-    metrics:
-    - type: accuracy
-      value: 0.8985
-    - type: macro-f1
-      value: 0.6791
-    - type: weighted-f1
-      value: 0.8886
-    - type: macro-precision
-      value: 0.7664
-    - type: macro-recall
-      value: 0.6289
----
-# BARTpho: Hate Speech Detection for Vietnamese Text
-This model is a fine-tuned version of [vinai/bartpho-syllable-base](https://huggingface.co/vinai/bartpho-syllable-base),
-trained on **ViHSD (Vietnamese Hate Speech Detection Dataset)** to classify Vietnamese text into three categories: CLEAN, OFFENSIVE, and HATE.
-## Model Details
-* **Base Model**: vinai/bartpho-syllable-base
-* **Description**: BARTpho fine-tuned for Vietnamese hate speech classification
-* **Architecture**: BARTpho (Bidirectional and Auto-Regressive Transformer for Vietnamese)
-* **Dataset**: ViHSD (Vietnamese Hate Speech Detection Dataset)
-* **Fine-tuning Framework**: HuggingFace Transformers + PyTorch
-* **Task**: Hate Speech Classification (3 classes)
-### Hyperparameters
-* **Batch size**: `32`
-* **Learning rate**: `2e-5`
-* **Epochs**: `100`
-* **Max sequence length**: `256`
-* **Weight decay**: `0.01`
-* **Warmup steps**: `500`
-* **Early stopping patience**: `5`
-* **Optimizer**: AdamW
-* **Learning rate scheduler**: Cosine with warmup
-## Dataset
-The model was trained on **ViHSD (Vietnamese Hate Speech Detection Dataset)**, which contains ~10,000 Vietnamese comments from social media.
-### Label Descriptions
-* **CLEAN (0)**: Normal content without offensive language
-* **OFFENSIVE (1)**: Mildly offensive or inappropriate content
-* **HATE (2)**: Hate speech, extremist language, severe threats
-## Evaluation Results
-The model was evaluated on the test set, with the following results:
-* **Accuracy**: `0.8985`
-* **Macro-F1**: `0.6791`
-* **Weighted-F1**: `0.8886`
-* **Macro-Precision**: `0.7664`
-* **Macro-Recall**: `0.6289`
-## Usage
-### Basic Usage
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-# Load model and tokenizer
-model_name = "visolex/bartpho-hsd"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSequenceClassification.from_pretrained(
-    model_name
-)
-# Classify text (the sample string means "Vietnamese text to classify")
-text = "Văn bản tiếng Việt cần phân loại"
-# Truncate to the 256-token maximum sequence length used during fine-tuning
-inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
-with torch.no_grad():
-    outputs = model(**inputs)
-    # Convert logits to per-class probabilities
-    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
-    predicted_label = torch.argmax(probabilities, dim=-1).item()
-# Label mapping
-label_names = {
-    0: "CLEAN",
-    1: "OFFENSIVE",
-    2: "HATE"
-}
-print(f"Predicted label: {label_names[predicted_label]}")
-print(f"Class probabilities: {probabilities[0].tolist()}")
-```
-## Training Details
-### Training Data
-- **Dataset**: ViHSD (Vietnamese Hate Speech Detection Dataset)
-- **Total samples**: ~10,000 Vietnamese comments from social media
-- **Training split**: ~70%
-- **Validation split**: ~15%
-- **Test split**: ~15%
-### Training Configuration
-- **Framework**: PyTorch + HuggingFace Transformers
-- **Optimizer**: AdamW
-- **Learning Rate**: 2e-5
-- **Batch Size**: 32
-- **Max Length**: 256 tokens
-- **Epochs**: 100 (with early stopping patience: 5)
-- **Weight Decay**: 0.01
-- **Warmup Steps**: 500
-## Contact & Support
-- **GitHub**: [ViSoLex Hate Speech Detection](https://github.com/visolex/hate-speech-detection)
-- **Issues**: [Report Issues](https://github.com/visolex/hate-speech-detection/issues)
-- **Questions**: Open a discussion on the model's Hugging Face page
-## License
-This model is distributed under the MIT License.
-## Acknowledgments
-- Base model: [vinai/bartpho-syllable-base](https://huggingface.co/vinai/bartpho-syllable-base)
-- Dataset: ViHSD (Vietnamese Hate Speech Detection Dataset)
-- Framework: [Hugging Face Transformers](https://huggingface.co/transformers)
-- ViSoLex Toolkit
----
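
A note on the Evaluation Results in the removed model card: the reported macro-F1 (`0.6791`) sits well below the weighted-F1 (`0.8886`). Macro-F1 averages the per-class F1 scores equally, while weighted-F1 weights each class by its support, so a gap like this can appear when a small class is predicted poorly. The sketch below illustrates the two averages in plain Python. The toy labels and the helper `f1_report` are hypothetical, made up for illustration only; they are not ViHSD samples and not the authors' evaluation code.

```python
from collections import Counter

# Label ids as listed in the model card
LABELS = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}


def f1_report(y_true, y_pred, labels=(0, 1, 2)):
    """Return (per_class_f1, macro_f1, weighted_f1) for integer labels.

    Hypothetical helper for illustration; not part of the model's codebase.
    """
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class[c] = 2 * tp / denom if denom else 0.0
    # Macro: every class counts equally
    macro = sum(per_class.values()) / len(labels)
    # Weighted: every class counts in proportion to its support
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted


# Made-up, imbalanced toy data: 8 CLEAN, 1 OFFENSIVE, 1 HATE
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
# The single OFFENSIVE example is misclassified as CLEAN
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]

per_class, macro, weighted = f1_report(y_true, y_pred)
for c, score in per_class.items():
    print(f"{LABELS[c]}: F1 = {score:.3f}")
print(f"macro-F1 = {macro:.3f}, weighted-F1 = {weighted:.3f}")
```

On this toy input, missing the one OFFENSIVE example drives that class's F1 to zero, which pulls the macro average down much further than the weighted average. Whether the same effect explains the figures in the model card would need per-class scores on the actual test set, which the card does not report.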