Mitchins's picture
Upload folder using huggingface_hub
aa06200 verified
---
language: en
license: mit
tags:
- text-classification
- toxicity-detection
- cnn
- byte-level
- ultra-lightweight
- cloudflare-optimized
datasets:
- balanced_toxicity_222k
metrics:
- f1
- precision
- recall
model_type: cnn
inference: true
---
# ByteCNN-10K: Ultra-Lightweight Toxicity Detection
Ultra-compact single-layer CNN for toxicity detection, optimized for maximum speed on edge deployment.
## Model Details
- **Architecture**: Single-layer ByteCNN (Embedding β†’ Conv1D + BatchNorm β†’ Dense β†’ Dense)
- **Parameters**: 10,009 (~40KB)
- **Input**: Raw byte sequences (max 512 bytes)
- **Output**: Toxicity probability (0-1)
- **Optimization**: 72% parameter reduction from original 36K model
## Performance
- **Validation Accuracy**: 78.97%
- **Training Dataset**: Full balanced dataset (222,628 samples)
- **Efficiency**: 7.89% accuracy per 1K parameters (best efficiency in sweep)
- **Inference Speed**: <1ms on Cloudflare Workers
- **CPU Limits**: Guaranteed to stay under edge compute constraints
## Architecture Configuration
- **Embedding**: 256 vocab β†’ 12 dimensions
- **Conv Layer**: 12 β†’ 40 filters, kernel=3
- **Dense Layer**: 40 β†’ 128 hidden units
- **Output**: 128 β†’ 1 (sigmoid activation)
## Training Details
- Trained on balanced dataset (50/50 toxic/safe ratio)
- 222,628 total samples from multiple sources
- AdamW optimizer with weight decay 0.01
- Learning rate: 0.001 with ReduceLROnPlateau
- 10 epochs, batch size 128
## Parameter Sweep Results
Comparison across different model sizes:
| Model | Parameters | Accuracy | Efficiency (Acc/1K params) | Trade-off |
|-------|------------|----------|---------------------------|-----------|
| Original | 36,257 | 81.20% | 2.24% | Baseline |
| **10K** | **10,009** | **78.97%** | **7.89%** | **72% fewer params** |
| 15K | 14,985 | 80.98% | 5.40% | 59% fewer params |
| 20K | 20,009 | 78.76% | 3.94% | 45% fewer params |
The 10K model offers the best parameter efficiency with minimal accuracy loss.
## Contextual Understanding
Despite its compact size, the model demonstrates sophisticated toxicity detection:
- **"fuck you"** β†’ 87.28% toxic (direct personal attack)
- **"get fucked!"** β†’ 17.73% safe (potentially playful/dismissive)
- **"Hello everyone!"** β†’ 6.65% safe (clearly benign)
## Usage
Input text is converted to UTF-8 bytes, truncated/padded to 512 bytes, then processed through the CNN layers.
## Deployment
Optimized for edge deployment with:
- BatchNorm folding for inference
- Static weight embedding (211KB)
- Sub-1ms inference time
- Zero CPU timeout issues on Cloudflare Workers
## Live Demo
- **API**: https://bytecnn-demo.mitch-336.workers.dev/api/classify
- **Status**: https://bytecnn-demo.mitch-336.workers.dev/api/status
## Limitations
- English text optimized
- 512 byte context window
- Binary classification only (toxic/safe)
- 2.23% accuracy trade-off vs original for 72% size reduction
## Model Card
This model represents the optimal balance of speed and quality for production edge deployment.