Text Classification
Transformers
Safetensors
Thai
xlm-roberta
thai
toxicity-detection
hate-speech
nlp
text-embeddings-inference
---
language:
- th
library_name: transformers
pipeline_tag: text-classification
tags:
- thai
- toxicity-detection
- hate-speech
- nlp
- text-classification
datasets:
- SEACrowd/thai_toxicity_tweet
metrics:
- accuracy
- f1
model-index:
- name: thai-toxic-classifier
  results: []
---
# Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model that classifies a Thai sentence as **toxic** or **non-toxic**.

The model is intended for research and experimentation in **Thai NLP safety, moderation systems, and toxicity analysis**.

Repository: https://huggingface.co/mashironotdev/thai-toxic-classifier

---
# Model Details

## Model Description

This model performs **binary text classification** on Thai text:

| Label | Meaning |
|-----|-----|
| 0 | non-toxic |
| 1 | toxic |

Example predictions:

| Text | English gloss | Prediction |
|-----|-----|-----|
| สวัสดีครับ | Hello | non-toxic |
| ขอบคุณมากครับ | Thank you very much | non-toxic |
| มึงโง่หรือไง | Are you stupid or what? | toxic |
| ไอ้ควาย | You buffalo (insult) | toxic |
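
The label table above can be applied to pipeline output with a small helper. This is a minimal sketch, assuming the checkpoint emits either the generic Transformers names `LABEL_0`/`LABEL_1` (the library default when a config defines no custom `id2label`) or the readable names themselves; the card does not state which. `to_readable_label` is a hypothetical name.

```python
# Mapping taken from the label table: 0 = non-toxic, 1 = toxic.
ID2LABEL = {0: "non-toxic", 1: "toxic"}


def to_readable_label(raw_label: str) -> str:
    """Map a pipeline label string to "toxic" or "non-toxic".

    Assumption: the checkpoint emits either generic names such as
    "LABEL_1" or the readable names themselves.
    """
    normalized = raw_label.strip().lower().replace("_", "-")
    if normalized in ID2LABEL.values():
        return normalized
    if normalized.startswith("label-"):
        suffix = normalized.split("-", 1)[1]
        if suffix.isdigit() and int(suffix) in ID2LABEL:
            return ID2LABEL[int(suffix)]
    raise ValueError(f"Unrecognized label: {raw_label!r}")
```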
---

## Intended Use

This model is designed for:

- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation

Possible downstream uses:

- chat moderation
- comment filtering
- social media toxicity analysis

---

## Out-of-Scope Use

This model **should not be used for:**

- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight
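
One way to keep a human in the loop, as the list above asks, is a routing step: only confident non-toxic predictions pass automatically, and anything the model flags or is unsure about goes to a reviewer instead of triggering an automatic penalty. The sketch below is illustrative only. `route_prediction` is a hypothetical helper, and the `0.9` threshold is a placeholder, not a value from this model card.

```python
def route_prediction(label: str, score: float, allow_threshold: float = 0.9) -> str:
    """Return "allow" or "human_review" for one prediction.

    label: readable label, "toxic" or "non-toxic".
    score: the model's confidence in that label, between 0 and 1.
    allow_threshold: hypothetical cut-off; tune it on your own validation data.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if label == "non-toxic" and score >= allow_threshold:
        return "allow"
    # Toxic or uncertain predictions are never acted on automatically.
    return "human_review"
```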
---

# Training Data

The model was trained on Thai toxicity data, including:

- Thai Toxicity Tweet dataset (`SEACrowd/thai_toxicity_tweet`)
- synthetic toxic Thai sentences
- Thai profanity word lists

The combined data consists of Thai sentences labeled as **toxic** or **non-toxic**.

---
# Training Procedure

## Preprocessing

Typical preprocessing steps:

- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation
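
The card does not say which normalization was applied, so the following is only a minimal sketch of the listed steps under stated assumptions: Unicode NFC normalization, removal of zero-width characters, and whitespace collapsing stand in for "Thai text normalization", and `max_length=128` is a placeholder. `normalize_thai` and `encode_batch` are hypothetical helper names, and `return_tensors="pt"` assumes PyTorch is installed.

```python
import re
import unicodedata

# Zero-width space, non-joiner, joiner, and BOM, which often leak into scraped text.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
_WHITESPACE = re.compile(r"\s+")


def normalize_thai(text: str) -> str:
    """Assumed normalization: NFC, strip zero-width characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = _ZERO_WIDTH.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def encode_batch(texts, model_id="mashironotdev/thai-toxic-classifier", max_length=128):
    """Tokenize with the model's own tokenizer, padding and truncating the batch."""
    # Imported lazily so normalize_thai works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer(
        [normalize_thai(t) for t in texts],
        padding=True,       # pad to the longest sequence in the batch
        truncation=True,    # cut sequences longer than max_length
        max_length=max_length,
        return_tensors="pt",
    )
```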
---

## Training Configuration

Example configuration:

# Quick Usage

```python
# install dependencies
# pip install transformers torch

from transformers import pipeline

# load model from Hugging Face
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier"
)

# example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย"
]

# run inference (returns one {"label": ..., "score": ...} dict per input)
results = classifier(texts)

# print results
for text, result in zip(texts, results):
    print(text, "->", result)
```
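
By default the pipeline returns only the top label for each input. When an experiment needs the toxic-class score itself, for example to sweep a decision threshold, passing `top_k=None` asks the pipeline for the scores of every label. The sketch below is illustrative: `toxic_score` and `score_texts` are hypothetical helpers, and the label names treated as toxic (`LABEL_1` or `toxic`) are an assumption, since the card does not state which names the checkpoint emits.

```python
def toxic_score(label_scores, toxic_names=("label_1", "toxic")):
    """Return the toxic-class score from one input's list of {"label", "score"} dicts."""
    for entry in label_scores:
        if entry["label"].lower() in toxic_names:
            return entry["score"]
    raise ValueError("no toxic label found in pipeline output")


def score_texts(texts, model_id="mashironotdev/thai-toxic-classifier"):
    """Run the classifier and return one toxic-class score per input text."""
    # Imported lazily; requires `pip install transformers torch`.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # top_k=None returns scores for all labels instead of only the best one.
    all_scores = classifier(list(texts), top_k=None)
    return [toxic_score(scores) for scores in all_scores]
```

Calling `score_texts(["สวัสดีครับ", "ไอ้ควาย"])` downloads the model on first use and returns a list of floats, one per input.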