Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model that classifies a Thai sentence as toxic or non-toxic.

The model is intended for research and experimentation in Thai NLP safety, moderation systems, and toxicity analysis.

Repository:
https://huggingface.co/mashironotdev/thai-toxic-classifier


Model Details

Model Description

This model performs binary text classification on Thai text:

Label  Meaning
0      non-toxic
1      toxic
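
When predictions are consumed programmatically, the label table above can be expressed as a small lookup. The sketch below is only illustrative: it assumes the classifier may return either generic names such as `LABEL_0`/`LABEL_1` (the Transformers default when a model config defines no custom label names) or the plain names from the table. Which form this model actually emits is not stated in this card, so `decode_label` is a hypothetical helper that accepts both.

```python
# Hypothetical helper: maps a raw classifier label to the meaning in the label table.
ID2MEANING = {0: "non-toxic", 1: "toxic"}


def decode_label(raw_label: str) -> str:
    """Return "toxic" or "non-toxic" for labels like "LABEL_1", "1", or "toxic"."""
    text = raw_label.strip().lower()
    if text in ID2MEANING.values():
        return text
    # Assumption: generic labels follow the "LABEL_<id>" pattern.
    if text.startswith("label_"):
        text = text[len("label_"):]
    if text.isdecimal() and int(text) in ID2MEANING:
        return ID2MEANING[int(text)]
    raise ValueError(f"Unrecognized label: {raw_label!r}")
```

Raising on unknown labels, rather than defaulting to non-toxic, keeps a label-format mismatch from silently passing toxic text.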

Example:

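Text (English gloss)  Prediction
สวัสดีครับ ("Hello")  non-toxic
ขอบคุณมากครับ ("Thank you very much")  non-toxic
มึงโง่หรือไง ("Are you stupid or what?")  toxic
ไอ้ควาย (literally "you buffalo", an insult)  toxic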

Intended Use

This model is designed for:

  • Thai toxicity detection research
  • content moderation experiments
  • NLP benchmarking
  • Thai language safety evaluation

Possible downstream uses:

  • chat moderation
  • comment filtering
  • social media toxicity analysis

Out-of-Scope Use

This model should not be used for:

  • legal moderation decisions
  • automated punishment systems
  • sensitive content governance without human oversight
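
One way to keep a human in the loop is to let the classifier only sort content into queues, never to take a final action. The following is a minimal sketch under stated assumptions: `result` is a single dict in the `{"label": ..., "score": ...}` shape returned by a Transformers text-classification pipeline, the toxic class is named `LABEL_1`, `1`, or `toxic`, and both thresholds are hypothetical values that would need tuning on real data. The function and queue names are illustrative, not part of the model.

```python
# Assumed names for the toxic class; adjust to what the model actually returns.
TOXIC_LABELS = {"toxic", "label_1", "1"}


def toxic_probability(result: dict) -> float:
    """Convert a binary pipeline result into the probability of the toxic class."""
    label = str(result["label"]).strip().lower()
    score = float(result["score"])
    # The pipeline reports the score of the predicted label only, so for a
    # non-toxic prediction the toxic probability is the complement.
    return score if label in TOXIC_LABELS else 1.0 - score


def route(result: dict, review_threshold: float = 0.5, hide_threshold: float = 0.9) -> str:
    """Pick a queue for a message; every non-"allow" outcome goes to a human."""
    p = toxic_probability(result)
    if p >= hide_threshold:
        return "hide_pending_review"  # temporarily hidden, still reviewed by a person
    if p >= review_threshold:
        return "human_review"
    return "allow"
```

For example, `route({"label": "LABEL_1", "score": 0.6})` lands in the `human_review` queue rather than triggering any automated penalty.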

Training Data

The model was trained on Thai toxicity datasets including:

  • Thai Toxicity Tweet dataset
  • synthetic toxic Thai sentences
  • Thai profanity word lists

The combined data consists of Thai sentences, each labeled as toxic or non-toxic.


Training Procedure

Preprocessing

Typical preprocessing steps:

  • Thai text normalization
  • tokenization using the model tokenizer
  • padding and truncation
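
The steps above can be sketched roughly as follows. This is not the author's actual preprocessing, which is not given in this card. It assumes normalization means Unicode NFC composition, removal of zero-width characters, and whitespace collapsing; `normalize_thai`, `encode_batch`, and the `max_length=128` default are hypothetical. NFC is used rather than NFKC because NFKC would split the Thai vowel sara am (ำ) into two code points.

```python
import re
import unicodedata

# Zero-width characters that often appear in copied Thai text.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
_WHITESPACE = re.compile(r"\s+")


def normalize_thai(text: str) -> str:
    """Apply NFC, drop zero-width characters, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = _ZERO_WIDTH.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


def encode_batch(tokenizer, texts, max_length=128):
    """Normalize texts, then tokenize them with padding and truncation.

    `tokenizer` is any callable with the Hugging Face tokenizer call
    signature, e.g. one loaded with AutoTokenizer.from_pretrained(...).
    """
    cleaned = [normalize_thai(t) for t in texts]
    return tokenizer(cleaned, padding=True, truncation=True, max_length=max_length)
```

When inference goes through `pipeline(...)`, tokenization is handled internally, so only a step like `normalize_thai` would need to be applied to the inputs beforehand.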

Training Configuration

Example configuration:

Quick Usage

# install dependencies
# pip install transformers torch

from transformers import pipeline

# load model from Hugging Face
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier"
)

# example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย"
]

# run inference
results = classifier(texts)

# print results
for text, result in zip(texts, results):
    print(text, "->", result)

Model size: 0.3B parameters (Safetensors, tensor type F32)