Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model that classifies a Thai sentence as toxic or non-toxic.

The model is intended for research and experimentation in Thai NLP safety, moderation systems, and toxicity analysis.

Repository:
https://huggingface.co/mashironotdev/thai-toxic-classifier

Model Details

Model Description

This model performs binary text classification on Thai text:

Label	Meaning
0	non-toxic
1	toxic
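
Depending on how the model config is set up, the pipeline may return generic label names instead of the meanings in the table. The sketch below maps either form back to the table. It assumes generic names follow the transformers default pattern "LABEL_<id>"; the helper name readable_label is hypothetical and not part of the model.

```python
# Mapping taken from the label table above.
ID2LABEL = {0: "non-toxic", 1: "toxic"}


def readable_label(raw_label: str) -> str:
    """Map a label such as "LABEL_1" or "1" to its meaning in the table.

    Assumption: generic names look like "LABEL_<id>" (the transformers
    default when a config defines no custom names). Labels that are
    already human-readable, or ids outside the table, are returned as-is.
    """
    suffix = raw_label.rsplit("_", 1)[-1]
    if suffix.isdigit() and int(suffix) in ID2LABEL:
        return ID2LABEL[int(suffix)]
    return raw_label
```

Applied to a pipeline result, this would look like readable_label(result["label"]).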

Example:

Text	Prediction
สวัสดีครับ	non-toxic
ขอบคุณมากครับ	non-toxic
มึงโง่หรือไง	toxic
ไอ้ควาย	toxic

Intended Use

This model is designed for:

Thai toxicity detection research
content moderation experiments
NLP benchmarking
Thai language safety evaluation

Possible downstream uses:

chat moderation
comment filtering
social media toxicity analysis

Out-of-Scope Use

This model should not be used for:

legal moderation decisions
automated punishment systems
sensitive content governance without human oversight
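
One way to keep a human in the loop is to let the model clear only confident non-toxic text and send everything else to a reviewer. The following is a minimal sketch of that idea, not a recommended policy. The function name route, the 0.9 threshold, and the accepted label names are all assumptions; the input is expected to be one result dict from the text-classification pipeline, with "label" and "score" keys.

```python
from typing import Any, Dict

# Assumption: the toxic class may appear under either of these names.
TOXIC_LABELS = {"toxic", "LABEL_1"}


def route(prediction: Dict[str, Any], allow_threshold: float = 0.9) -> str:
    """Return "allow" or "human_review" for a single pipeline result.

    The model never triggers an action on its own: anything predicted
    toxic, and any low-confidence non-toxic prediction, goes to a person.
    """
    label = str(prediction["label"])
    score = float(prediction["score"])
    if label in TOXIC_LABELS:
        return "human_review"
    if score >= allow_threshold:
        return "allow"
    return "human_review"
```

The threshold would need to be tuned on held-out data for the target platform before any real use.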

Training Data

The model was trained on Thai toxicity datasets including:

Thai Toxicity Tweet dataset
synthetic toxic Thai sentences
Thai profanity word lists

Each example is a Thai sentence labeled as toxic or non-toxic.

Training Procedure

Preprocessing

Typical preprocessing steps:

Thai text normalization
tokenization using the model tokenizer
padding and truncation
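
The steps above can be sketched roughly as follows. This is an illustration only: the exact normalization used for training is not documented here, so the rules in normalize_thai (NFC normalization, removing zero-width characters, collapsing long runs of a repeated Thai character, and squeezing whitespace) are assumptions, as are the helper names and the max_length of 128. The tokenizer is passed in by the caller, for example one loaded with AutoTokenizer.from_pretrained.

```python
import re
import unicodedata

ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\ufeff]")
# Three or more copies of the same Thai character in a row.
THAI_REPEAT = re.compile(r"([\u0E00-\u0E7F])\1{2,}")


def normalize_thai(text: str) -> str:
    """Hypothetical normalization; not the model's documented procedure."""
    text = unicodedata.normalize("NFC", text)
    text = ZERO_WIDTH.sub("", text)            # drop zero-width characters
    text = THAI_REPEAT.sub(r"\1\1", text)      # collapse long runs to two
    return re.sub(r"\s+", " ", text).strip()   # squeeze whitespace


def encode_batch(texts, tokenizer, max_length=128):
    """Normalize, then tokenize with padding and truncation.

    max_length=128 is an assumed value, not taken from the model card.
    """
    cleaned = [normalize_thai(t) for t in texts]
    return tokenizer(
        cleaned,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

When using the pipeline shown in Quick Usage, tokenization is handled internally, so only the normalization step would need to be applied by hand.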

Training Configuration

Example configuration:

Quick Usage

# install dependencies
# pip install transformers torch

from transformers import pipeline

# load model from Hugging Face
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier"
)

# example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย"
]

# run inference
results = classifier(texts)

# print results
for text, result in zip(texts, results):
    print(text, "->", result)

Model size: 0.3B params
Tensor type: F32
Weights format: Safetensors
