---
language:
  - th
library_name: transformers
pipeline_tag: text-classification
tags:
  - thai
  - toxicity-detection
  - hate-speech
  - nlp
  - text-classification
datasets:
  - SEACrowd/thai_toxicity_tweet
metrics:
  - accuracy
  - f1
model-index:
  - name: thai-toxic-classifier
    results: []
---

# Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model trained to classify whether a Thai sentence is toxic or non-toxic.

The model is intended for research and experimentation in Thai NLP safety, moderation systems, and toxicity analysis.

Repository: https://huggingface.co/mashironotdev/thai-toxic-classifier


## Model Details

### Model Description

This model performs binary text classification on Thai text:

| Label | Meaning   |
|-------|-----------|
| 0     | non-toxic |
| 1     | toxic     |
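
The label names above are the human-readable meanings. To confirm how the checkpoint itself exposes them, you can inspect the config's `id2label` mapping; a minimal sketch (whether it prints these names or raw `LABEL_0`/`LABEL_1` strings depends on how the model was exported):

```python
from transformers import AutoConfig

# Inspect the label mapping shipped with this checkpoint.
# Whether it prints "non-toxic"/"toxic" or the raw "LABEL_0"/"LABEL_1"
# depends on how the model config was exported.
config = AutoConfig.from_pretrained("mashironotdev/thai-toxic-classifier")
print(config.id2label)
```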

Example predictions:

| Text          | Gloss                       | Prediction |
|---------------|-----------------------------|------------|
| สวัสดีครับ      | "hello"                     | non-toxic  |
| ขอบคุณมากครับ  | "thank you very much"       | non-toxic  |
| มึงโง่หรือไง     | "are you stupid or what?"   | toxic      |
| ไอ้ควาย        | "you buffalo" (an insult)   | toxic      |

## Intended Use

This model is designed for:

- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation

Possible downstream uses (see the sketch below):

- chat moderation
- comment filtering
- social media toxicity analysis
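
As a concrete sketch of the comment-filtering use case, the helper below wraps the `transformers` pipeline. The confidence threshold and the candidate label strings are assumptions; verify them against the model's actual `id2label` mapping before relying on them.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier",
)

def is_toxic(text: str, threshold: float = 0.8) -> bool:
    """Flag a Thai comment as toxic when the model is confident enough."""
    result = classifier(text)[0]
    # Assumed label strings: the checkpoint may expose "toxic" or the raw
    # "LABEL_1" depending on its config; check config.id2label to be sure.
    return result["label"] in ("toxic", "LABEL_1") and result["score"] >= threshold

print(is_toxic("ไอ้ควาย"))    # expected: True
print(is_toxic("สวัสดีครับ"))  # expected: False
```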

## Out-of-Scope Use

This model should not be used for:

- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight

## Training Data

The model was trained on Thai toxicity data, including:

- the Thai Toxicity Tweet dataset
- synthetic toxic Thai sentences
- Thai profanity word lists

Each sentence in the combined data is labeled as toxic or non-toxic.
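
For reference, the Thai Toxicity Tweet dataset listed in the metadata can be loaded with the `datasets` library. This is a sketch: the split name and the `trust_remote_code` flag are assumptions, so check the dataset card first.

```python
from datasets import load_dataset

# Load the Thai Toxicity Tweet dataset referenced in the model metadata.
# The "train" split and trust_remote_code=True are assumptions;
# see the dataset card for the authoritative loading instructions.
ds = load_dataset("SEACrowd/thai_toxicity_tweet", split="train", trust_remote_code=True)
print(ds[0])
```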


## Training Procedure

### Preprocessing

Typical preprocessing steps (sketched below):

- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation
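
A minimal tokenization sketch matching the steps above; the `max_length` value here is an assumption, so prefer the tokenizer's actual limit.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mashironotdev/thai-toxic-classifier")

# Tokenize a small batch with padding and truncation, as described above.
batch = tokenizer(
    ["สวัสดีครับ", "มึงโง่หรือไง"],
    padding=True,
    truncation=True,
    max_length=128,  # assumed; read tokenizer.model_max_length for the real limit
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```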

### Training Configuration

Example configuration (illustrative only; the hyperparameters actually used to train this checkpoint have not been published):
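
```python
from transformers import TrainingArguments

# Illustrative hyperparameters; the values actually used to train
# this checkpoint have not been published.
training_args = TrainingArguments(
    output_dir="thai-toxic-classifier",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
)
```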

## Quick Usage

```python
# install dependencies first:
#   pip install transformers torch

from transformers import pipeline

# load the model from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier",
)

# example inputs (English glosses in comments)
texts = [
    "สวัสดีครับ",     # "hello"
    "ขอบคุณมากครับ",  # "thank you very much"
    "มึงโง่หรือไง",     # "are you stupid or what?"
    "ไอ้ควาย",        # "you buffalo" (an insult)
]

# run inference on the whole batch
results = classifier(texts)

# print each text with its predicted label and score
for text, result in zip(texts, results):
    print(text, "->", result)
```
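
Each entry in `results` is a dict with `label` and `score` keys, e.g. `{'label': 'toxic', 'score': 0.99}`; the exact label strings depend on the `id2label` mapping in the model config.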