---
license: mit
language: en
tags:
- text-classification
- toxicity
- moderation
- chat
- bert
- pytorch
- onnx
datasets:
- dormlab/chat-corpus
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# Toxic Chat Moderation

Binary classifier for real-time chat moderation. Flags toxic, hateful, harassing, sexually explicit, and otherwise inappropriate messages in gaming and social chat. Based on bert-base-uncased, fine-tuned on 300K labeled chat messages.

## Quick use

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.9768 |
| F1 | 0.9768 |
| Precision | 0.9643 |
| Recall | 0.9897 |

ONNX INT8 latency: ~1-3 ms on Apple Silicon (CoreML/MPS).

## Training

- **Architecture**: bert-base-uncased (110M params), 2 labels (clean/toxic)
- **Hardware**: Apple Silicon Mac Mini (MPS), single-node
- **Data**: 153K messages (122,688 train / 15,336 val / 15,336 test)
- **Framework**: PyTorch, HuggingFace Trainer
- **Export**: ONNX dynamic INT8 quantization (105 MB)

## Variants

This repo provides two model formats:

- — full PyTorch weights for use with `transformers`
- — ONNX INT8 quantized for fast inference on CPU/CoreML

## Label mapping

| Label | Meaning |
|-------|---------|
| 0 | Clean — allow |
| 1 | Toxic — block/flag |
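As a sketch of the quick-use flow for the PyTorch variant: the repo id below is a hypothetical placeholder, and the default `LABEL_0`/`LABEL_1` pipeline label names are assumed (adjust both to match this repo's actual id and `config.json`).

```python
# Hypothetical repo id; replace with this model's actual Hub id.
MODEL_ID = "dormlab/toxic-chat-moderation"

# Action per predicted label, per the "Label mapping" table:
# 0 = clean (allow), 1 = toxic (block/flag). Default label names assumed.
ACTIONS = {"LABEL_0": "allow", "LABEL_1": "block"}

def load_moderator(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first call)."""
    # Lazy import so the mapping helpers work without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)

def moderate(messages, classifier):
    """Classify messages; return (message, action, confidence) triples."""
    results = classifier(messages, truncation=True)
    return [(m, ACTIONS[r["label"]], r["score"]) for m, r in zip(messages, results)]

# Usage:
# clf = load_moderator()
# moderate(["gg well played", "you are trash"], clf)
```

`moderate` takes the classifier as an argument so the pipeline can be built once and reused across a chat stream.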
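For the ONNX INT8 variant, inference can be sketched with ONNX Runtime. The file name, tokenizer source, and input layout below are assumptions about this repo's layout (a standard BERT ONNX export taking `input_ids`, `attention_mask`, and `token_type_ids`); only the logit-to-action step is fixed by the label table above.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decide(logits, toxic_threshold=0.5):
    """Map raw (batch, 2) logits to actions: label 1 = toxic (block/flag)."""
    probs = softmax(np.asarray(logits, dtype=np.float32))
    return ["block" if p[1] >= toxic_threshold else "allow" for p in probs]

def classify(texts, session, tokenizer):
    """Tokenize, run the ONNX session, and return an action per message."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128,
                    return_tensors="np")
    # BERT ONNX exports typically expect int64 input tensors.
    feed = {k: v.astype(np.int64) for k, v in enc.items()}
    logits = session.run(None, feed)[0]
    return decide(logits)

# Usage (file and tokenizer names are assumptions):
# import onnxruntime as ort
# from transformers import AutoTokenizer
# session = ort.InferenceSession("model.onnx")
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# classify(["nice shot!", "you are trash"], session, tokenizer)
```

Raising `toxic_threshold` above 0.5 trades recall for precision, which may suit chats where false blocks are costlier than missed messages.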