---
language:
- th
library_name: transformers
pipeline_tag: text-classification
tags:
- thai
- toxicity-detection
- hate-speech
- nlp
- text-classification
datasets:
- SEACrowd/thai_toxicity_tweet
metrics:
- accuracy
- f1
model-index:
- name: thai-toxic-classifier
  results: []
---
# Thai Toxic Classifier 🇹🇭
A Thai-language toxicity detection model that classifies a Thai sentence as **toxic** or **non-toxic**.
The model is intended for research and experimentation in **Thai NLP safety, moderation systems, and toxicity analysis**.
Repository:
https://huggingface.co/mashironotdev/thai-toxic-classifier
---
# Model Details
## Model Description
This model performs **binary text classification** on Thai text:
| Label | Meaning |
|-----|-----|
| 0 | non-toxic |
| 1 | toxic |
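
The label table above can be applied to raw pipeline output with a small helper. This is a minimal sketch: it assumes the checkpoint may emit the generic `LABEL_<id>` names that `transformers` uses when a config does not define `id2label`. The helper name `readable_label` is hypothetical and not part of this repository.

```python
# Label ids as documented in the table above.
ID2LABEL = {0: "non-toxic", 1: "toxic"}


def readable_label(raw_label: str) -> str:
    """Map a label such as 'LABEL_1' to the documented name.

    Assumption: the model may return generic 'LABEL_<id>' strings.
    Any other string (for example an already readable name) is
    returned unchanged.
    """
    prefix = "LABEL_"
    suffix = raw_label[len(prefix):]
    if raw_label.startswith(prefix) and suffix.isdigit():
        return ID2LABEL.get(int(suffix), raw_label)
    return raw_label


print(readable_label("LABEL_0"))
print(readable_label("LABEL_1"))
```

If the model config already defines readable names, the helper simply passes them through.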
Example:
| Text | Prediction |
|-----|-----|
| สวัสดีครับ | non-toxic |
| ขอบคุณมากครับ | non-toxic |
| มึงโง่หรือไง | toxic |
| ไอ้ควาย | toxic |
---
## Intended Use
This model is designed for:
- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation
Possible downstream uses:
- chat moderation
- comment filtering
- social media toxicity analysis
---
## Out-of-Scope Use
This model **should not be used for:**
- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight
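
One way to respect these limits is to let the model only flag content for a person to check, and never trigger an action on its own. The sketch below illustrates that idea under stated assumptions: the input is a dict shaped like a `text-classification` pipeline result, the toxic class is named either `toxic` or `LABEL_1`, and the `0.9` threshold is an arbitrary placeholder rather than a tuned value. The function name `route` is hypothetical.

```python
# Assumption: the toxic class appears under one of these names.
TOXIC_LABELS = {"toxic", "LABEL_1"}


def route(result: dict, threshold: float = 0.9) -> str:
    """Return 'human_review' or 'allow' for a single prediction.

    `result` is assumed to look like {"label": "toxic", "score": 0.97}.
    The function never returns a punitive action: a confident toxic
    prediction is only queued for a human moderator.
    """
    is_toxic = result["label"] in TOXIC_LABELS
    if is_toxic and result["score"] >= threshold:
        return "human_review"
    return "allow"


print(route({"label": "toxic", "score": 0.97}))
print(route({"label": "non-toxic", "score": 0.99}))
```

The threshold trades reviewer workload against missed toxic messages, so it should be chosen on held-out data rather than copied from this sketch.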
---
# Training Data
The model was trained on Thai toxicity datasets including:
- Thai Toxicity Tweet dataset
- synthetic toxic Thai sentences
- Thai profanity word lists
The combined training set contains Thai sentences labeled as **toxic** or **non-toxic**.
---
# Training Procedure
## Preprocessing
Typical preprocessing steps:
- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation
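
The card does not specify which normalization rules were used, so the following is only an illustrative sketch of what a Thai text normalization step might look like, using the Python standard library. The choices here (NFC normalization, removing zero-width characters, collapsing whitespace) and the function name `normalize_text` are assumptions, not the author's actual preprocessing. Tokenization, padding, and truncation would then be handled by the model tokenizer.

```python
import unicodedata

# Zero-width characters that often appear in scraped Thai text.
# Assumption: they carry no meaning for classification and can be dropped.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def normalize_text(text: str) -> str:
    """Apply a simple, illustrative normalization to one input string."""
    # Use a single canonical Unicode form.
    text = unicodedata.normalize("NFC", text)
    # Drop zero-width characters.
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


print(normalize_text("  สวัสดี\u200bครับ   มาก "))
```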
---
## Training Configuration
Example configuration:
---
# Quick Usage
```python
# Install dependencies first:
#   pip install transformers torch
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier",
)

# Example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย",
]

# Run inference on the whole batch
results = classifier(texts)

# Print each text with its predicted label and score
for text, result in zip(texts, results):
    print(text, "->", result)
```