---
license: mit
language: en
tags:
- text-classification
- toxicity
- moderation
- chat
- bert
- pytorch
- onnx
datasets:
- dormlab/chat-corpus
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# Toxic Chat Moderation

Binary classifier for real-time chat moderation. Flags toxic, hateful, harassing, sexually explicit, and otherwise inappropriate messages in gaming and social chat. Based on bert-base-uncased, fine-tuned on 300K labeled chat messages.

## Quick use

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.9768 |
| F1 | 0.9768 |
| Precision | 0.9643 |
| Recall | 0.9897 |

ONNX INT8 latency: ~1-3 ms on Apple Silicon (CoreML/MPS).

## Training

- **Architecture**: bert-base-uncased (110M params), 2 labels (clean/toxic)
- **Hardware**: Apple Silicon Mac Mini (MPS), single-node
- **Data**: 153K messages (122,688 train / 15,336 val / 15,336 test)
- **Framework**: PyTorch, HuggingFace Trainer
- **Export**: ONNX dynamic INT8 quantization (105 MB)

## Variants

This repo provides two model formats:

- — full PyTorch weights for use with `transformers`
- — ONNX INT8 quantized for fast inference on CPU/CoreML

## Label mapping

| Label | Meaning |
|-------|---------|
| 0 | Clean — allow |
| 1 | Toxic — block/flag |
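As a sketch of the quick-use flow for the PyTorch variant: the repo id below is a hypothetical placeholder, and the default `LABEL_0`/`LABEL_1` pipeline label names are assumed (adjust both to match this repo's actual id and `config.json`).

```python
# Hypothetical repo id; replace with this model's actual Hub id.
MODEL_ID = "dormlab/toxic-chat-moderation"

# Action per predicted label, per the "Label mapping" table:
# 0 = clean (allow), 1 = toxic (block/flag). Default label names assumed.
ACTIONS = {"LABEL_0": "allow", "LABEL_1": "block"}

def load_moderator(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first call)."""
    # Lazy import so the mapping helpers work without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)

def moderate(messages, classifier):
    """Classify messages; return (message, action, confidence) triples."""
    results = classifier(messages, truncation=True)
    return [(m, ACTIONS[r["label"]], r["score"]) for m, r in zip(messages, results)]

# Usage:
# clf = load_moderator()
# moderate(["gg well played", "you are trash"], clf)
```

`moderate` takes the classifier as an argument so the pipeline can be built once and reused across a chat stream.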
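For the ONNX INT8 variant, inference can be sketched with ONNX Runtime. The file name, tokenizer source, and input layout below are assumptions about this repo's layout (a standard BERT ONNX export taking `input_ids`, `attention_mask`, and `token_type_ids`); only the logit-to-action step is fixed by the label table above.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decide(logits, toxic_threshold=0.5):
    """Map raw (batch, 2) logits to actions: label 1 = toxic (block/flag)."""
    probs = softmax(np.asarray(logits, dtype=np.float32))
    return ["block" if p[1] >= toxic_threshold else "allow" for p in probs]

def classify(texts, session, tokenizer):
    """Tokenize, run the ONNX session, and return an action per message."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128,
                    return_tensors="np")
    # BERT ONNX exports typically expect int64 input tensors.
    feed = {k: v.astype(np.int64) for k, v in enc.items()}
    logits = session.run(None, feed)[0]
    return decide(logits)

# Usage (file and tokenizer names are assumptions):
# import onnxruntime as ort
# from transformers import AutoTokenizer
# session = ort.InferenceSession("model.onnx")
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# classify(["nice shot!", "you are trash"], session, tokenizer)
```

Raising `toxic_threshold` above 0.5 trades recall for precision, which may suit chats where false blocks are costlier than missed messages.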