---
language:
- th
library_name: transformers
pipeline_tag: text-classification
tags:
- thai
- toxicity-detection
- hate-speech
- nlp
- text-classification
datasets:
- SEACrowd/thai_toxicity_tweet
metrics:
- accuracy
- f1
model-index:
- name: thai-toxic-classifier
  results: []
---

# Thai Toxic Classifier 🇹🇭

A Thai-language toxicity detection model trained to classify whether a Thai sentence is **toxic** or **non-toxic**.

The model is intended for research and experimentation in **Thai NLP safety, moderation systems, and toxicity analysis**.

Repository: https://huggingface.co/mashironotdev/thai-toxic-classifier

---

# Model Details

## Model Description

This model performs **binary text classification** on Thai text:

| Label | Meaning |
|-----|-----|
| 0 | non-toxic |
| 1 | toxic |

Example:

| Text | Prediction |
|-----|-----|
| สวัสดีครับ ("Hello") | non-toxic |
| ขอบคุณมากครับ ("Thank you very much") | non-toxic |
| มึงโง่หรือไง ("Are you stupid or what?") | toxic |
| ไอ้ควาย ("You buffalo", an insult) | toxic |

---

## Intended Use

This model is designed for:

- Thai toxicity detection research
- content moderation experiments
- NLP benchmarking
- Thai language safety evaluation

Possible downstream uses:

- chat moderation
- comment filtering
- social media toxicity analysis

---

## Out-of-Scope Use

This model **should not be used for:**

- legal moderation decisions
- automated punishment systems
- sensitive content governance without human oversight

---

# Training Data

The model was trained on Thai toxicity datasets including:

- Thai Toxicity Tweet dataset
- synthetic toxic Thai sentences
- Thai profanity word lists

The dataset contains Thai sentences labeled as **toxic** or **non-toxic**.

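
The label scheme described above can be written as a pair of mappings of the kind a `transformers` model config stores as `id2label` / `label2id`. This is a minimal sketch, not the author's training code: the names `ID2LABEL`, `LABEL2ID`, `SAMPLE_ROWS`, and `encode_examples` are illustrative, and the sample rows reuse the example sentences from this card rather than actual training records.

```python
# Label scheme taken from the table in "Model Description".
ID2LABEL = {0: "non-toxic", 1: "toxic"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

# Illustrative rows only: these are the card's example sentences,
# not records drawn from the training set.
SAMPLE_ROWS = [
    ("สวัสดีครับ", "non-toxic"),
    ("ขอบคุณมากครับ", "non-toxic"),
    ("มึงโง่หรือไง", "toxic"),
    ("ไอ้ควาย", "toxic"),
]


def encode_examples(rows):
    """Convert (text, label name) pairs into dicts with integer labels."""
    return [{"text": text, "label": LABEL2ID[name]} for text, name in rows]


encoded = encode_examples(SAMPLE_ROWS)
for row in encoded:
    print(row["label"], row["text"])
```

Keeping both directions of the mapping in one place makes it harder for training labels and reported predictions to drift apart.
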

---

# Training Procedure

## Preprocessing

Typical preprocessing steps:

- Thai text normalization
- tokenization using the model tokenizer
- padding and truncation

---

## Training Configuration

Example configuration:

## Quick Usage

```python
# Install dependencies first:
#   pip install transformers torch
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="mashironotdev/thai-toxic-classifier"
)

# Example inputs
texts = [
    "สวัสดีครับ",
    "ขอบคุณมากครับ",
    "มึงโง่หรือไง",
    "ไอ้ควาย"
]

# Run inference (returns one dict per input text)
results = classifier(texts)

# Print each input next to its prediction
for text, result in zip(texts, results):
    print(text, "->", result)
```
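
The pipeline returns one dict per input, with a `label` and a `score`. In line with the human-oversight guidance under Out-of-Scope Use, a post-processing step might map raw labels to the card's class names and flag low-confidence predictions for manual review. The sketch below makes several assumptions. The raw label names `LABEL_0` / `LABEL_1` are the `transformers` default when a config defines no `id2label`, so check the model's `config.json` for the actual names. The `0.8` threshold and the function name `interpret` are illustrative and not tuned for this model.

```python
# Assumed raw label names; the model's config may define different ones.
RAW_TO_NAME = {"LABEL_0": "non-toxic", "LABEL_1": "toxic"}

# Illustrative threshold, not tuned for this model.
REVIEW_THRESHOLD = 0.8


def interpret(result, threshold=REVIEW_THRESHOLD):
    """Map one pipeline result dict to a class name and a review flag."""
    raw = result["label"]
    name = RAW_TO_NAME.get(raw, raw)  # fall back to the raw label if unknown
    return {
        "label": name,
        "score": result["score"],
        "needs_review": result["score"] < threshold,
    }


# Hand-written stand-ins for pipeline output, so no model download is needed.
mock_results = [
    {"label": "LABEL_0", "score": 0.97},
    {"label": "LABEL_1", "score": 0.62},
]

for item in mock_results:
    print(interpret(item))
```

With real output, `mock_results` would be replaced by `classifier(texts)` from the Quick Usage snippet.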