---
language:
- th
library_name: transformers
pipeline_tag: text-classification
tags:
- thai
- toxicity-detection
- hate-speech
- nlp
- text-classification
datasets:
- SEACrowd/thai_toxicity_tweet
metrics:
- accuracy
- f1
model-index:
- name: thai-toxic-classifier
  results: []
---

# Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model that classifies whether a Thai sentence is **toxic** or **non-toxic**.

The model is intended for research and experimentation in **Thai NLP safety, moderation systems, and toxicity analysis**.

Repository:  
https://huggingface.co/mashironotdev/thai-toxic-classifier

---

# Model Details

## Model Description

This model performs **binary text classification** on Thai text:

| Label | Meaning |
|-----|-----|
| 0 | non-toxic |
| 1 | toxic |
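
The exact label strings returned by the pipeline depend on the model config's `id2label` mapping, which this card does not show. They may be the names above, or the generic `LABEL_0` / `LABEL_1` names that transformers assigns when no names are configured. A minimal sketch that maps either form back to the integer ids in the table (the helper `to_label_id` and the `ID2LABEL` dict are hypothetical, assumed to mirror the table):

```python
import re

# Assumed to mirror the label table in this card: 0 = non-toxic, 1 = toxic.
ID2LABEL = {0: "non-toxic", 1: "toxic"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_label_id(label: str) -> int:
    """Map a pipeline label string to its integer id.

    Accepts named labels ("toxic", "non-toxic") as well as the generic
    "LABEL_<id>" names transformers assigns when a config has no id2label names.
    """
    normalized = label.strip().lower()
    if normalized in LABEL2ID:
        return LABEL2ID[normalized]
    match = re.fullmatch(r"label_(\d+)", normalized)
    if match and int(match.group(1)) in ID2LABEL:
        return int(match.group(1))
    raise ValueError(f"Unrecognized label: {label!r}")


print(to_label_id("toxic"), to_label_id("LABEL_0"))
```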

Examples:

| Text | Prediction |
|-----|-----|
| สวัสดีครับ | non-toxic |
| ขอบคุณมากครับ | non-toxic |
| มึงโง่หรือไง | toxic |
| ไอ้ควาย | toxic |

---

## Intended Use

This model is designed for:

- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation

Possible downstream uses:

- chat moderation
- comment filtering
- social media toxicity analysis
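
As a sketch of the comment-filtering use case, the snippet below routes comments by their predicted toxicity probability. It assumes you have already converted the classifier output into a P(toxic) value per comment; the thresholds, action names, and helper functions are hypothetical and would need tuning on held-out data. Nothing is removed automatically: flagged comments go to human review.

```python
from typing import Iterable, List, Tuple

# Hypothetical thresholds; tune them on held-out data for your own use case.
REVIEW_THRESHOLD = 0.5
HIGH_CONFIDENCE_THRESHOLD = 0.9


def route_comment(p_toxic: float) -> str:
    """Map a toxicity probability to a moderation action.

    High-scoring comments are prioritized for human review rather than deleted.
    """
    if not 0.0 <= p_toxic <= 1.0:
        raise ValueError("p_toxic must be a probability in [0, 1]")
    if p_toxic >= HIGH_CONFIDENCE_THRESHOLD:
        return "priority_review"
    if p_toxic >= REVIEW_THRESHOLD:
        return "review"
    return "allow"


def route_batch(scored: Iterable[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Apply route_comment to (text, p_toxic) pairs, keeping the text alongside its action."""
    return [(text, route_comment(p)) for text, p in scored]


# Placeholder comments with made-up probabilities, not real model outputs.
demo = [("comment A", 0.05), ("comment B", 0.62), ("comment C", 0.97)]
for text, action in route_batch(demo):
    print(text, "->", action)
```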

---

## Out-of-Scope Use

This model **should not be used for:**

- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight

---

# Training Data

The model was trained on Thai toxicity data from several sources, including:

- Thai Toxicity Tweet dataset (`SEACrowd/thai_toxicity_tweet`)
- synthetic toxic Thai sentences
- Thai profanity word lists

The dataset contains Thai sentences labeled as **toxic** or **non-toxic**.

---

# Training Procedure

## Preprocessing

Typical preprocessing steps:

- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation
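
The card does not specify the exact normalization applied during training, so the following is only a sketch of what these steps commonly look like for Thai text. `normalize_thai` and `pad_and_truncate` are hypothetical helpers. The first applies NFC normalization (NFKC is avoided because it splits SARA AM "ำ" into two code points), strips zero-width characters, recombines a decomposed SARA AM, and collapses whitespace. The second shows what padding and truncation do to a list of token ids; in practice the model's tokenizer performs this step when called with `padding=True`, `truncation=True`, and a `max_length`.

```python
import re
import unicodedata
from typing import List, Tuple

# Zero-width characters that often appear in copied Thai social-media text.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"), None)


def normalize_thai(text: str) -> str:
    """Hypothetical Thai normalization (not necessarily the one used in training)."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(ZERO_WIDTH)  # delete zero-width characters
    # Recombine NIKHAHIT (U+0E4D) + SARA AA (U+0E32) into SARA AM (U+0E33).
    text = text.replace("\u0e4d\u0e32", "\u0e33")
    return re.sub(r"\s+", " ", text).strip()


def pad_and_truncate(ids: List[int], max_length: int, pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Cut long sequences, right-pad short ones, and build the matching attention mask."""
    ids = ids[:max_length]
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, [1] * len(ids) + [0] * n_pad


print(normalize_thai("สวัสดี\u200bครับ   ขอบคุณ"))
print(pad_and_truncate([5, 6, 7], max_length=5))
```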

---

## Training Configuration

Example configuration:

# Quick Usage

```python
# install dependencies
# pip install transformers torch

from transformers import pipeline

# load model from Hugging Face
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier"
)

# example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย"
]

# run inference
results = classifier(texts)

# print results
for text, result in zip(texts, results):
    print(text, "->", result)
```
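
For a model with two labels, the `text-classification` pipeline applies a softmax to the logits by default (unless the config marks the task as multi-label), and the `score` in each result is the probability of the returned label; passing `top_k=None` asks the pipeline for the scores of all labels. The sketch below reproduces that computation from raw logits in plain Python. The `ID2LABEL` mapping is assumed to follow the label table in this card, and the logits are made up for illustration.

```python
import math
from typing import Dict, List

# Assumed to follow the label table in this card: 0 = non-toxic, 1 = toxic.
ID2LABEL = {0: "non-toxic", 1: "toxic"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores_from_logits(logits: List[float]) -> Dict[str, float]:
    """Return a {label: probability} dict, like the pipeline with top_k=None."""
    return {ID2LABEL[i]: p for i, p in enumerate(softmax(logits))}


# Made-up logits, not real model outputs.
scores = scores_from_logits([-1.2, 2.3])
predicted = max(scores, key=scores.get)
print(predicted, round(scores[predicted], 3))
```

Because the two probabilities sum to 1, P(toxic) for any result is its `score` when the label is toxic and `1 - score` otherwise.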