---
language:
- th
library_name: transformers
pipeline_tag: text-classification
tags:
- thai
- toxicity-detection
- hate-speech
- nlp
- text-classification
datasets:
- SEACrowd/thai_toxicity_tweet
metrics:
- accuracy
- f1
model-index:
- name: thai-toxic-classifier
  results: []
---

# Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model that classifies a Thai sentence as **toxic** or **non-toxic**.

The model is intended for research and experimentation in **Thai NLP safety, moderation systems, and toxicity analysis**.

Repository:  
https://huggingface.co/mashironotdev/thai-toxic-classifier

---

# Model Details

## Model Description

This model performs **binary text classification** on Thai text:

| Label | Meaning |
|-----|-----|
| 0 | non-toxic |
| 1 | toxic |

Example:

| Text | Prediction |
|-----|-----|
| สวัสดีครับ | non-toxic |
| ขอบคุณมากครับ | non-toxic |
| มึงโง่หรือไง | toxic |
| ไอ้ควาย | toxic |
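
The label table above can be applied to raw classifier output with a small helper. This is a minimal sketch, not part of the model: `to_readable` is a hypothetical name, and it assumes each result is a dictionary with a `label` key whose value is either a readable name or ends in the class id (for example the default `LABEL_0` / `LABEL_1` naming in `transformers`). The actual label strings depend on the model's config, which this card does not state.

```python
# Class id -> meaning, copied from the label table in this card.
ID_TO_NAME = {0: "non-toxic", 1: "toxic"}


def to_readable(result: dict) -> str:
    """Convert one classification result into 'non-toxic' or 'toxic'."""
    label = str(result["label"])
    # Already a readable name: return it unchanged.
    if label in ID_TO_NAME.values():
        return label
    # Otherwise assume the label ends in the class id, e.g. "LABEL_1" or "1".
    class_id = int(label.rsplit("_", 1)[-1])
    return ID_TO_NAME[class_id]


# The score here is an illustrative value, not a measured model output.
print(to_readable({"label": "LABEL_1", "score": 0.97}))
```

If the model config already defines readable label names, the helper simply passes them through.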

---

## Intended Use

This model is designed for:

- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation

Possible downstream uses:

- chat moderation
- comment filtering
- social media toxicity analysis

---

## Out-of-Scope Use

This model **should not be used for:**

- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight
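
One way to keep a human in the loop is to route predictions by score instead of acting on them automatically. The sketch below is an illustration under stated assumptions: the function name `route`, the action names, and the thresholds `low=0.3` and `high=0.9` are all hypothetical and would need tuning on real data. No branch applies an irreversible penalty; every borderline or high-scoring text still ends up in front of a reviewer.

```python
def route(score_toxic: float, low: float = 0.3, high: float = 0.9) -> str:
    """Pick a moderation action from the model's toxic-class probability.

    Thresholds are illustrative placeholders, not values from the model card.
    """
    if not 0.0 <= score_toxic <= 1.0:
        raise ValueError("score_toxic must be a probability in [0, 1]")
    if score_toxic < low:
        return "allow"
    if score_toxic < high:
        # Uncertain region: queue for a human moderator, leave text visible.
        return "human_review"
    # Very likely toxic: hide reversibly, but still send to a human.
    return "hide_pending_review"


for s in (0.1, 0.5, 0.95):
    print(s, "->", route(s))
```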

---

# Training Data

The model was trained on Thai toxicity datasets including:

- Thai Toxicity Tweet dataset (`SEACrowd/thai_toxicity_tweet`)
- synthetic toxic Thai sentences
- Thai profanity word lists

The combined data consists of Thai sentences labeled as **toxic** or **non-toxic**.

---

# Training Procedure

## Preprocessing

Typical preprocessing steps:

- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation
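
The steps above can be sketched as follows. The card does not say which normalization was used, so `normalize_thai` is an assumption: it applies Unicode NFC, strips zero-width characters, and collapses whitespace. `encode` is likewise a hypothetical helper, and `max_length=128` is a placeholder value; it expects a Hugging Face tokenizer and uses the standard `padding`, `truncation`, and `max_length` arguments.

```python
import re
import unicodedata

# Zero-width characters that often appear in Thai text copied from the web.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def normalize_thai(text: str) -> str:
    """Assumed normalization: NFC, drop zero-width chars, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = _ZERO_WIDTH.sub("", text)
    return " ".join(text.split())


def encode(tokenizer, texts, max_length=128):
    """Normalize, then tokenize with padding and truncation.

    `tokenizer` is expected to be a Hugging Face tokenizer object;
    `max_length` is an illustrative default, not the training value.
    """
    cleaned = [normalize_thai(t) for t in texts]
    return tokenizer(
        cleaned,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )


print(normalize_thai("  สวัสดี\u200b   ครับ "))
```

A tokenizer for `encode` could be obtained with `AutoTokenizer.from_pretrained("mashironotdev/thai-toxic-classifier")`, assuming the repository ships its tokenizer files.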

---

## Training Configuration

Example configuration:

---

# Quick Usage

```python
# install dependencies
# pip install transformers torch

from transformers import pipeline

# load model from Hugging Face
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier"
)

# example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย"
]

# run inference
results = classifier(texts)

# print results
for text, result in zip(texts, results):
    print(text, "->", result)
```