Mitchins
/

bytecnn-toxicity-classifier

Text Classification

toxicity-detection

ultra-lightweight

cloudflare-optimized

Model card Files Files and versions

bytecnn-toxicity-classifier / README.md

Mitchins's picture

Upload folder using huggingface_hub

aa06200 verified 4 months ago

|

history blame contribute delete

3.04 kB

	---
	language: en
	license: mit
	tags:
	- text-classification
	- toxicity-detection
	- cnn
	- byte-level
	- ultra-lightweight
	- cloudflare-optimized
	datasets:
	- balanced_toxicity_222k
	metrics:
	- f1
	- precision
	- recall
	model_type: cnn
	inference: true
	---

	# ByteCNN-10K: Ultra-Lightweight Toxicity Detection

	Ultra-compact single-layer CNN for toxicity detection, optimized for maximum speed on edge deployment.

	## Model Details

	- Architecture: Single-layer ByteCNN (Embedding → Conv1D + BatchNorm → Dense → Dense)
	- Parameters: 10,009 (~40KB)
	- Input: Raw byte sequences (max 512 bytes)
	- Output: Toxicity probability (0-1)
	- Optimization: 72% parameter reduction from original 36K model

	## Performance

	- Validation Accuracy: 78.97%
	- Training Dataset: Full balanced dataset (222,628 samples)
	- Efficiency: 7.89% accuracy per 1K parameters (best efficiency in sweep)
	- Inference Speed: <1ms on Cloudflare Workers
	- CPU Limits: Guaranteed to stay under edge compute constraints

	## Architecture Configuration

	- Embedding: 256 vocab → 12 dimensions
	- Conv Layer: 12 → 40 filters, kernel=3
	- Dense Layer: 40 → 128 hidden units
	- Output: 128 → 1 (sigmoid activation)

	## Training Details

	- Trained on balanced dataset (50/50 toxic/safe ratio)
	- 222,628 total samples from multiple sources
	- AdamW optimizer with weight decay 0.01
	- Learning rate: 0.001 with ReduceLROnPlateau
	- 10 epochs, batch size 128

	## Parameter Sweep Results

	Comparison across different model sizes:

	\| Model \| Parameters \| Accuracy \| Efficiency (Acc/1K params) \| Trade-off \|
	\|-------\|------------\|----------\|---------------------------\|-----------\|
	\| Original \| 36,257 \| 81.20% \| 2.24% \| Baseline \|
	\| 10K \| 10,009 \| 78.97% \| 7.89% \| 72% fewer params \|
	\| 15K \| 14,985 \| 80.98% \| 5.40% \| 59% fewer params \|
	\| 20K \| 20,009 \| 78.76% \| 3.94% \| 45% fewer params \|

	The 10K model offers the best parameter efficiency with minimal accuracy loss.

	## Contextual Understanding

	Despite its compact size, the model demonstrates sophisticated toxicity detection:

	- "fuck you" → 87.28% toxic (direct personal attack)
	- "get fucked!" → 17.73% safe (potentially playful/dismissive)
	- "Hello everyone!" → 6.65% safe (clearly benign)

	## Usage

	Input text is converted to UTF-8 bytes, truncated/padded to 512 bytes, then processed through the CNN layers.

	## Deployment

	Optimized for edge deployment with:
	- BatchNorm folding for inference
	- Static weight embedding (211KB)
	- Sub-1ms inference time
	- Zero CPU timeout issues on Cloudflare Workers

	## Live Demo

	- API: https://bytecnn-demo.mitch-336.workers.dev/api/classify
	- Status: https://bytecnn-demo.mitch-336.workers.dev/api/status

	## Limitations

	- English text optimized
	- 512 byte context window
	- Binary classification only (toxic/safe)
	- 2.23% accuracy trade-off vs original for 72% size reduction

	## Model Card

	This model represents the optimal balance of speed and quality for production edge deployment.