---
license: mit
language: en
tags:
- text-classification
- toxicity
- moderation
- chat
- bert
- pytorch
- onnx
datasets:
- dormlab/chat-corpus
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# Toxic Chat Moderation

Binary classifier for real-time chat moderation. Flags toxic, hateful, harassing,
sexually explicit, and otherwise inappropriate messages in gaming and social chat.

Fine-tuned from bert-base-uncased on 300K labeled chat messages.

## Quick use

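A minimal usage sketch. The repo id below is a placeholder (substitute the actual model id), and the `is_toxic` helper is illustrative, not part of this repo:

```python
# Loading sketch (repo id is a placeholder -- substitute the real model id):
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="dormlab/toxic-chat-moderation")
#   result = clf("you are trash, uninstall the game")[0]

# Illustrative post-processing: decide whether to flag a message from one
# pipeline output dict of the form {"label": ..., "score": ...}.
def is_toxic(result: dict, threshold: float = 0.5) -> bool:
    """Flag when the predicted label is toxic (label id 1) with enough confidence."""
    return result["label"] in ("toxic", "LABEL_1") and result["score"] >= threshold

print(is_toxic({"label": "LABEL_1", "score": 0.98}))  # True
print(is_toxic({"label": "LABEL_0", "score": 0.99}))  # False
```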
## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 0.9768 |
| F1 | 0.9768 |
| Precision | 0.9643 |
| Recall | 0.9897 |

ONNX INT8 latency: ~1-3 ms on Apple Silicon (CoreML/MPS).

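A per-inference latency number like the one above can be reproduced with a simple timing harness (a sketch; the lambda below is a dummy stand-in for whatever callable wraps the ONNX session):

```python
import time

def measure_latency_ms(fn, warmup: int = 5, iters: int = 100) -> float:
    """Average wall-clock milliseconds per call, after a short warmup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1000.0 / iters

# Dummy workload standing in for a real session.run(...) call:
latency = measure_latency_ms(lambda: sum(range(1000)))
print(f"{latency:.3f} ms/call")
```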
## Training

- **Architecture**: bert-base-uncased (110M params), 2 labels (clean/toxic)
- **Hardware**: Apple Silicon Mac Mini (MPS), single-node
- **Data**: 153K messages (122,688 train / 15,336 val / 15,336 test)
- **Framework**: PyTorch, HuggingFace Trainer
- **Export**: ONNX dynamic INT8 quantization (105 MB)

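The split above works out to exactly 80/10/10:

```python
# Verify the train/val/test proportions stated in the Data bullet.
train, val, test = 122_688, 15_336, 15_336
total = train + val + test
print(total)                       # 153360
print(train / total)               # 0.8
print(val / total, test / total)   # 0.1 0.1
```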

## Variants

This repo provides two model formats:

- Full PyTorch weights, for use with `transformers`
- ONNX INT8 quantized weights, for fast inference on CPU/CoreML
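The ONNX export produces raw logits over the two labels; a minimal post-processing sketch (softmax then argmax is the standard recipe, not repo-specific code):

```python
import numpy as np

def postprocess(logits: np.ndarray) -> tuple[int, float]:
    """Softmax over the two label logits; return (label_id, confidence)."""
    shifted = np.exp(logits - logits.max())  # numerically stable softmax
    probs = shifted / shifted.sum()
    label_id = int(probs.argmax())
    return label_id, float(probs[label_id])

# Logits leaning toward label 1 (toxic):
label_id, score = postprocess(np.array([-2.0, 3.1]))
print(label_id, round(score, 3))  # 1 0.994
```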

## Label mapping

| Label | Meaning |
|-------|---------|
| 0 | Clean — allow |
| 1 | Toxic — block/flag |