	---
	language:
	- th
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- thai
	- toxicity-detection
	- hate-speech
	- nlp
	- text-classification
	datasets:
	- SEACrowd/thai_toxicity_tweet
	metrics:
	- accuracy
	- f1
	model-index:
	- name: thai-toxic-classifier
	  results: []
	---

	# Thai Toxic Classifier 🇹🇭

	A Thai-language toxicity detection model that classifies a Thai sentence as toxic or non-toxic.

	The model is intended for research and experimentation in Thai NLP safety, moderation systems, and toxicity analysis.

	Repository:
	https://huggingface.co/mashironotdev/thai-toxic-classifier

	---

	# Model Details

	## Model Description

	This model performs binary text classification on Thai text:

	| Label | Meaning |
	|-------|---------|
	| 0 | non-toxic |
	| 1 | toxic |
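
The mapping above can be mirrored in code when post-processing predictions. The sketch below is only an illustration: this card does not state which label strings the checkpoint emits, so `normalize_label` (a hypothetical helper name) accepts bare integer ids, the `LABEL_0`/`LABEL_1` style that `transformers` falls back to when no custom names are configured, and the readable names from the table.

```python
# Hypothetical helper; the label strings the checkpoint actually emits are not
# documented in this card, so several common spellings are accepted.
ID2LABEL = {0: "non-toxic", 1: "toxic"}  # taken from the label table


def normalize_label(raw):
    """Map a raw label (0, 'LABEL_1', '1', 'toxic', ...) to a readable name."""
    if isinstance(raw, int):
        return ID2LABEL[raw]
    text = str(raw).strip().lower()
    if text in ID2LABEL.values():
        return text
    # Handle 'label_0' / 'label_1' and bare digit strings.
    digits = text.replace("label_", "")
    if digits.isdigit() and int(digits) in ID2LABEL:
        return ID2LABEL[int(digits)]
    raise ValueError(f"Unrecognized label: {raw!r}")


print(normalize_label("LABEL_1"))
print(normalize_label(0))
```

Raising on unknown labels, rather than guessing, keeps a mismatch between this table and the checkpoint's configuration visible.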

	Example:

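	| Text | English gloss | Prediction |
	|------|---------------|------------|
	| สวัสดีครับ | "Hello" | non-toxic |
	| ขอบคุณมากครับ | "Thank you very much" | non-toxic |
	| มึงโง่หรือไง | "Are you stupid or what?" | toxic |
	| ไอ้ควาย | "You buffalo" (an insult, roughly "idiot") | toxic |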

	---

	## Intended Use

	This model is designed for:

	- Thai toxicity detection research
	- content moderation experiments
	- NLP benchmarking
	- Thai language safety evaluation

	Possible downstream uses:

	- chat moderation
	- comment filtering
	- social media toxicity analysis

	---

	## Out-of-Scope Use

	This model should not be used for:

	- legal moderation decisions
	- automated punishment systems
	- sensitive content governance without human oversight
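
One way to respect the human-oversight requirement is to let the model queue messages for review instead of acting on them. The sketch below assumes a toxic-class score between 0 and 1 is already available. The threshold value, the action names, and the `route` function are illustrative assumptions, not recommendations from the model author.

```python
# Hypothetical routing policy: the threshold and action names are illustrative
# assumptions, not values recommended by the model author.
REVIEW_THRESHOLD = 0.5  # at or above this toxic score, a human looks at it
ALLOW = "allow"
HUMAN_REVIEW = "human_review"


def route(toxic_score, review_threshold=REVIEW_THRESHOLD):
    """Return an action for one message given its toxic-class score in [0, 1]."""
    if not 0.0 <= toxic_score <= 1.0:
        raise ValueError("toxic_score must be between 0 and 1")
    # The model never triggers a penalty by itself; it only queues for review.
    return HUMAN_REVIEW if toxic_score >= review_threshold else ALLOW


for score in (0.05, 0.5, 0.97):
    print(score, "->", route(score))
```

There is deliberately no "block" or "ban" action here: any enforcement step is left to the reviewer.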

	---

	# Training Data

	The model was trained on Thai toxicity datasets including:

	- Thai Toxicity Tweet dataset (`SEACrowd/thai_toxicity_tweet`)
	- synthetic toxic Thai sentences
	- Thai profanity word lists

	The combined training data consists of Thai sentences, each labeled as toxic or non-toxic.

	---

	# Training Procedure

	## Preprocessing

	Typical preprocessing steps:

	- Thai text normalization
	- tokenization using the model tokenizer
	- padding and truncation
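
The card does not spell out what "Thai text normalization" covers, so the following is a minimal sketch of one plausible version using only the Python standard library. Every rule in it is an assumption, not the author's documented pipeline: Unicode NFC normalization, removal of zero-width characters, capping long character repeats, and whitespace collapsing. `normalize_thai` is a hypothetical name.

```python
import re
import unicodedata

# Zero-width space/joiners and BOM that often appear in scraped text.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
_WHITESPACE = re.compile(r"\s+")
# A character repeated 4 or more times in a row (e.g. "55555555").
_LONG_REPEAT = re.compile(r"(.)\1{3,}")


def normalize_thai(text):
    """Apply assumed, illustrative cleanup rules to one Thai string."""
    text = unicodedata.normalize("NFC", text)   # canonical Unicode form
    text = _ZERO_WIDTH.sub("", text)            # drop invisible characters
    text = _LONG_REPEAT.sub(r"\1\1\1", text)    # cap repeats at three
    return _WHITESPACE.sub(" ", text).strip()   # collapse whitespace


print(normalize_thai("  สวัสดี\u200b   ครับ  "))
print(normalize_thai("55555555"))
```

Tokenization, padding, and truncation are then usually delegated to the model's tokenizer, for example by calling it with `padding=True, truncation=True`.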

	---

	## Training Configuration

	Example configuration:

	## Quick Usage

	```python
	# Install dependencies first:
	#   pip install transformers torch

	from transformers import pipeline

	# Load the model from the Hugging Face Hub
	classifier = pipeline(
	    "text-classification",
	    model="mashironotdev/thai-toxic-classifier",
	)

	# Example inputs
	texts = [
	    "สวัสดีครับ",      # "Hello"
	    "ขอบคุณมากครับ",   # "Thank you very much"
	    "มึงโง่หรือไง",     # "Are you stupid or what?"
	    "ไอ้ควาย",         # "You buffalo" (an insult)
	]

	# Run inference on the whole list at once
	results = classifier(texts)

	# Each result is a dict with "label" and "score" keys
	for text, result in zip(texts, results):
	    print(text, "->", result)
	```
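
The card's metadata lists accuracy and F1 as metrics but reports no results. Once predictions like those above have been converted to 0/1 labels, they can be scored against gold labels. The sketch below computes both metrics in plain Python. The function name `accuracy_and_f1` is hypothetical, and the toy labels are invented purely to exercise the function. They are not results from this model.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for two equal-length lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only (1 = toxic, 0 = non-toxic).
gold = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
print(accuracy_and_f1(gold, pred))
```

F1 is computed for the toxic class because toxicity data is often imbalanced, where accuracy alone can look high even if most toxic examples are missed.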