---
license: mit
language: en
tags:
  - text-classification
  - toxicity
  - moderation
  - chat
  - bert
  - pytorch
  - onnx
datasets:
  - dormlab/chat-corpus
metrics:
  - accuracy
  - f1
  - precision
  - recall
pipeline_tag: text-classification
---

# Toxic Chat Moderation

Binary classifier for real-time chat moderation. Flags toxic, hateful, harassing, sexually explicit, and otherwise inappropriate messages in gaming and social chat.

Fine-tuned from `bert-base-uncased` on 300K labeled chat messages.

## Quick use
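A minimal usage sketch with the HuggingFace `pipeline` API. The repo id below is a placeholder, and the returned label strings depend on the model's `id2label` config (see Label mapping below).

```python
from transformers import pipeline

# Placeholder repo id: replace with wherever this model is actually hosted.
moderator = pipeline("text-classification", model="your-org/toxic-chat-moderation")

print(moderator("gg ez, uninstall the game"))
# e.g. [{'label': 'toxic', 'score': 0.99}]  (exact label names come from the model config)
```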

## Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.9768 |
| F1        | 0.9768 |
| Precision | 0.9643 |
| Recall    | 0.9897 |

ONNX INT8 latency: ~1-3ms on Apple Silicon (CoreML/MPS).
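A rough way to reproduce a latency figure like this with `onnxruntime`. The file name, input names, and execution providers below are assumptions and depend on how the graph was exported; check `session.get_inputs()` against your copy of the model.

```python
import time

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/toxic-chat-moderation")  # placeholder id
# "model.int8.onnx" is an assumed file name; add "CoreMLExecutionProvider" on macOS builds that ship it.
session = ort.InferenceSession("model.int8.onnx", providers=["CPUExecutionProvider"])

enc = tokenizer("you are trash, quit the game", return_tensors="np")
onnx_inputs = {i.name for i in session.get_inputs()}
feeds = {k: np.asarray(v, dtype=np.int64) for k, v in enc.items() if k in onnx_inputs}

for _ in range(10):  # warm-up runs
    session.run(None, feeds)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    logits = session.run(None, feeds)[0]
elapsed_ms = (time.perf_counter() - start) / runs * 1000
print(f"avg latency: {elapsed_ms:.2f} ms, logits: {logits}")
```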

## Training

- Architecture: `bert-base-uncased` (110M params), 2 labels (clean/toxic)
- Hardware: Apple Silicon Mac Mini (MPS), single-node
- Data: 153K messages (122,688 train / 15,336 val / 15,336 test)
- Framework: PyTorch, HuggingFace `Trainer` (a setup sketch follows this list)
- Export: ONNX dynamic INT8 quantization (105 MB)
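
A minimal sketch of a `Trainer` setup matching the description above, assuming a dataset with `text` and `label` columns. The hyperparameters are illustrative, not the values actually used for this model.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "clean", 1: "toxic"},
    label2id={"clean": 0, "toxic": 1},
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# train_ds / val_ds: datasets.Dataset splits with "text" and "label" columns (assumed schema).
# train_ds = train_ds.map(tokenize, batched=True)
# val_ds = val_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="toxic-chat-bert",
    per_device_train_batch_size=32,  # illustrative
    learning_rate=2e-5,              # illustrative
    num_train_epochs=3,              # illustrative
)

# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds,
#                   data_collator=DataCollatorWithPadding(tokenizer))
# trainer.train()
```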

## Variants

This repo provides two model formats:

- Full PyTorch weights, for loading with the HuggingFace Transformers library

- ONNX INT8 quantized weights, for fast inference on CPU/CoreML (see the export sketch below)
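
A sketch of how an ONNX dynamic INT8 variant like this is typically produced with `torch.onnx.export` plus `onnxruntime.quantization`. The repo id, file names, and opset are assumptions, not the exact export commands used for this repo.

```python
import torch
from onnxruntime.quantization import QuantType, quantize_dynamic
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "your-org/toxic-chat-moderation"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# return_dict=False makes the traced model return a plain tuple, which exports cleanly.
model = AutoModelForSequenceClassification.from_pretrained(model_id, return_dict=False).eval()

# Export the FP32 graph with dynamic batch/sequence axes.
dummy = tokenizer("example message", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=17,
)

# Dynamic quantization: weights stored as int8, activations quantized on the fly at runtime.
quantize_dynamic("model.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)
```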

## Label mapping

| Label | Meaning | Action     |
|-------|---------|------------|
| 0     | Clean   | allow      |
| 1     | Toxic   | block/flag |
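
A small illustration of applying this mapping to raw logits (e.g. from the ONNX session above). The softmax-plus-threshold logic and the 0.5 cutoff are illustrative choices, not something shipped with the model.

```python
import numpy as np

ID2LABEL = {0: "clean", 1: "toxic"}  # mirrors the table above

def moderate(logits: np.ndarray, toxic_threshold: float = 0.5) -> str:
    """Map a single message's [clean, toxic] logits to a moderation action."""
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()
    return "block" if probs[1] >= toxic_threshold else "allow"

# moderate(np.array([-2.1, 3.4]))  -> "block"
```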