# Toxic Chat Moderation
Binary classifier for real-time chat moderation. Flags toxic, hateful, harassing, sexually explicit, and otherwise inappropriate messages in gaming and social chat.
Fine-tuned from `bert-base-uncased` on 300K labeled chat messages.
## Quick use
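A minimal sketch of wiring the classifier into a chat pipeline. The repo id in the commented-out lines is a placeholder for this model's actual Hub id, and `moderate` is an illustrative helper, not part of the model:

```python
def moderate(message: str, classify) -> str:
    """Map a classifier result to a chat action: 'allow' or 'block'.

    `classify` is any callable returning [{"label": ..., "score": ...}],
    e.g. a transformers text-classification pipeline.
    """
    top = classify(message)[0]
    # Label 1 is toxic (see the label mapping below); pipelines may
    # surface it as "LABEL_1" depending on the model config.
    toxic = top["label"] in ("LABEL_1", "toxic", "1")
    return "block" if toxic else "allow"

# Typical wiring (commented out so the sketch stays self-contained;
# replace the repo id with this model's actual Hub id):
# from transformers import pipeline
# classify = pipeline("text-classification", model="your-org/toxic-chat-bert")
# print(moderate("gg well played", classify))
```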
## Performance
| Metric | Score |
|---|---|
| Accuracy | 0.9768 |
| F1 | 0.9768 |
| Precision | 0.9643 |
| Recall | 0.9897 |
ONNX INT8 latency: ~1-3ms on Apple Silicon (CoreML/MPS).
## Training
- Architecture: bert-base-uncased (110M params), 2 labels (clean/toxic)
- Hardware: Apple Silicon Mac Mini (MPS), single-node
- Data: 153K messages (122,688 train / 15,336 val / 15,336 test)
- Framework: PyTorch, HuggingFace Trainer
- Export: ONNX dynamic INT8 quantization (105 MB)
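To illustrate what the INT8 export step does (this is not the actual export script): dynamic quantization stores each weight tensor as int8 values plus a floating-point scale, which is how the fp32 BERT checkpoint shrinks to ~105 MB. A toy per-tensor symmetric version:

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: w ~= q * scale, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp32 weights from int8 values and the scale."""
    return [v * scale for v in q]

# Example tensor: the largest-magnitude weight maps to +/-127.
q, s = quantize_int8([0.5, -1.27, 0.003])
approx = dequantize(q, s)
```

In practice the real export can be done with onnxruntime's `quantize_dynamic` (from `onnxruntime.quantization`) applied to the exported ONNX graph.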
## Variants
This repo provides two model formats:
- Full PyTorch weights, for loading with the `transformers` library
- ONNX INT8 quantized weights, for fast inference on CPU/CoreML
## Label mapping
| Label | Meaning |
|---|---|
| 0 | Clean → allow |
| 1 | Toxic → block/flag |
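Since label 1 covers both "block" and "flag", one possible policy on top of the mapping is to split on confidence. The threshold here is illustrative, not part of the model:

```python
def action(label: int, score: float, block_threshold: float = 0.9) -> str:
    """Turn (label, score) into a moderation action.

    High-confidence toxic -> block outright; lower-confidence toxic ->
    flag for human review. The 0.9 cutoff is an assumed example value.
    """
    if label == 0:
        return "allow"
    return "block" if score >= block_threshold else "flag"
```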