Thai Toxic Classifier 🇹🇭
A Thai-language toxicity detection model that classifies a Thai sentence as toxic or non-toxic.
The model is intended for research and experimentation in Thai NLP safety, moderation systems, and toxicity analysis.
Repository:
https://huggingface.co/mashironotdev/thai-toxic-classifier
Model Details
Model Description
This model performs binary text classification on Thai text:
| Label | Meaning |
|---|---|
| 0 | non-toxic |
| 1 | toxic |
Example:
| Text | Prediction |
|---|---|
| สวัสดีครับ | non-toxic |
| ขอบคุณมากครับ | non-toxic |
| มึงโง่หรือไง | toxic |
| ไอ้ควาย | toxic |
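Depending on the model's config, the label string in a prediction may be a generic name rather than the readable one in the table. The following is a minimal sketch, assuming the output uses either the `transformers` default names (`LABEL_0`, `LABEL_1`) or the readable names directly. The helper `to_meaning` is hypothetical and not part of this repository.

```python
# Mapping taken from the label table: 0 = non-toxic, 1 = toxic.
ID_TO_MEANING = {0: "non-toxic", 1: "toxic"}


def to_meaning(label: str) -> str:
    """Map a predicted label string to its readable meaning.

    Assumption: labels look like "LABEL_0"/"LABEL_1" (the transformers
    default when a config defines no custom names) or are already readable.
    """
    if label.startswith("LABEL_"):
        return ID_TO_MEANING[int(label.split("_", 1)[1])]
    return label


print(to_meaning("LABEL_1"))
print(to_meaning("non-toxic"))
```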
Intended Use
This model is designed for:
- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation
Possible downstream uses:
- chat moderation
- comment filtering
- social media toxicity analysis
Out-of-Scope Use
This model should not be used for:
- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight
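One way to keep a human in the loop is to treat a toxic prediction as a request for review rather than as a decision. The sketch below assumes predictions shaped like `{"label": ..., "score": ...}`, a label string of `"toxic"` (it may be `LABEL_1` in practice), and an illustrative threshold of 0.9. The function name, the routing categories, and the threshold are all hypothetical.

```python
def route(prediction, toxic_label="toxic", priority_threshold=0.9):
    """Route one prediction to a moderation queue.

    The model never triggers an action on its own: toxic predictions
    are only queued for a human, with confident ones reviewed first.
    priority_threshold is an assumed value and should be tuned on real data.
    """
    if prediction["label"] != toxic_label:
        return "allow"
    if prediction["score"] >= priority_threshold:
        return "priority_review"
    return "human_review"


# Hand-written example predictions, not real model output.
for pred in [
    {"label": "non-toxic", "score": 0.99},
    {"label": "toxic", "score": 0.95},
    {"label": "toxic", "score": 0.60},
]:
    print(pred, "->", route(pred))
```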
Training Data
The model was trained on Thai toxicity datasets including:
- Thai Toxicity Tweet dataset
- synthetic toxic Thai sentences
- Thai profanity word lists
The combined training data consists of Thai text labeled as toxic or non-toxic.
Training Procedure
Preprocessing
Typical preprocessing steps:
- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation
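The steps above could look roughly like the following. This is a sketch under stated assumptions, not the preprocessing actually used in training: the normalization rules (Unicode NFC, removal of zero-width characters, whitespace collapsing) and `max_length=128` are illustrative choices, and `normalize_thai` and `encode` are hypothetical helper names.

```python
import re
import unicodedata

# Zero-width characters that often appear in Thai text copied from the web.
_ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")
_WHITESPACE = re.compile(r"\s+")


def normalize_thai(text: str) -> str:
    """Light cleanup: NFC-normalize, drop zero-width chars, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = _ZERO_WIDTH.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def encode(texts, tokenizer, max_length=128):
    """Tokenize a batch with padding and truncation.

    `tokenizer` is expected to be a Hugging Face tokenizer, for example
    one loaded with AutoTokenizer.from_pretrained(...). max_length=128
    is an assumed value, not the one used in training.
    """
    cleaned = [normalize_thai(t) for t in texts]
    return tokenizer(cleaned, padding=True, truncation=True, max_length=max_length)


print(normalize_thai("  สวัสดี\u200b   ครับ "))
```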
Training Configuration
Example configuration:
Quick Usage
```python
# Install dependencies first:
#   pip install transformers torch
from transformers import pipeline

# Load the model from Hugging Face
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier",
)

# Example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย",
]

# Run inference on the whole batch
results = classifier(texts)

# Print each input next to its prediction
for text, result in zip(texts, results):
    print(text, "->", result)
```