# DistilBERT Intent Classifier (ONNX INT8 Quantized)

## Model Description
This is an INT8 quantized ONNX version of a fine-tuned DistilBERT model for 6-class intent classification. Optimized for browser-based inference with ONNX Runtime Web.
## Intent Categories
- `arithmetic` - Simple calculations, unit conversions, percentage math
- `symbolic_reasoning` - Mathematical proofs, physics derivations, complex logic
- `factual_lookup` - Search-engine-style questions (history, geography, etc.)
- `creative_synthesis` - Fiction writing, poetry, roleplay, brainstorming
- `code_generation` - Writing, refactoring, or auditing code
- `security_risk` - PII leaks, API keys, injection attempts
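When consuming the model's output, the winning logit index has to be mapped back to one of these category names. A minimal sketch, assuming the index order matches the list above; the authoritative mapping is the `id2label` field in this repository's `config.json`:

```python
# Assumed index-to-label mapping; verify against id2label in config.json.
ID2LABEL = {
    0: "arithmetic",
    1: "symbolic_reasoning",
    2: "factual_lookup",
    3: "creative_synthesis",
    4: "code_generation",
    5: "security_risk",
}

def label_for(logits: list[float]) -> str:
    """Return the intent name for the highest-scoring logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]
```

For example, `label_for([5.1, 0.2, 1.3, -0.4, 0.0, -1.2])` returns `"arithmetic"` under this assumed ordering.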
## Model Details
- Base Model: distilbert-base-uncased
- Quantization: INT8 dynamic quantization (AVX512-VNNI)
- Model Size: 65.15 MB (~75% smaller than the unquantized model)
- Format: ONNX
- Target Platform: Browser (ONNX Runtime Web) / CPU inference
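An INT8 artifact like this one can be produced with ONNX Runtime's dynamic-quantization API. A minimal sketch, not the repository's actual export script; the file paths are placeholders:

```python
def quantize_int8(src_path: str, dst_path: str) -> None:
    """Dynamically quantize an FP32 ONNX model's weights to INT8."""
    # Imported inside the function so the sketch loads even where
    # onnxruntime is not installed.
    from onnxruntime.quantization import quantize_dynamic, QuantType

    quantize_dynamic(
        model_input=src_path,
        model_output=dst_path,
        weight_type=QuantType.QInt8,  # INT8 weights; activations quantized at runtime
    )

# Example (placeholder paths):
# quantize_int8("model_fp32.onnx", "model.onnx")
```

Dynamic quantization needs no calibration dataset, which is why it is a common choice for transformer encoders targeting CPU inference.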
## Performance
- Load Time: ~75% faster than unquantized PyTorch model
- Inference Speed: 2-3x faster on CPU
- Accuracy: >85% on the arithmetic and factual_lookup (search-style) intents
## Usage

### Browser (ONNX Runtime Web)
```javascript
import * as ort from 'onnxruntime-web';

// Load the quantized model
const session = await ort.InferenceSession.create('model.onnx');

// Tokenize the input (you'll need to implement tokenization in JS;
// int64 tensors take BigInt64Array values in ONNX Runtime Web)
const { inputIds, attentionMask } = tokenize("What is 25 + 17?");

// Run inference
const feeds = {
  input_ids: new ort.Tensor('int64', inputIds, [1, inputIds.length]),
  attention_mask: new ort.Tensor('int64', attentionMask, [1, attentionMask.length])
};
const results = await session.run(feeds);
const logits = results.logits.data;
```
### Python (Optimum)
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RunsOnBacon/distilbert-intent-classifier-onnx-int8")
model = ORTModelForSequenceClassification.from_pretrained(
    "RunsOnBacon/distilbert-intent-classifier-onnx-int8",
    provider="CPUExecutionProvider",
)

inputs = tokenizer("What is 25 + 17?", return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()
```
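Beyond the argmax class index, a per-class confidence can be obtained by applying a softmax to the logits. A minimal pure-Python sketch with six illustrative logit values (one per intent class):

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits, one per intent class
probs = softmax([5.1, 0.2, 1.3, -0.4, 0.0, -1.2])
confidence = max(probs)  # probability assigned to the predicted class
```

A confidence threshold on `max(probs)` is a common way to route ambiguous inputs to a fallback intent instead of trusting a low-margin prediction.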
## Training Data
- Dataset: 1,200 synthetic examples (200 per category)
- Special Feature: 30% of arithmetic and factual_lookup examples are "cloaked" with sophisticated prompt structures to teach robustness
- Split: 80/20 train/validation
## Limitations
- Trained on synthetic data - may not generalize to all real-world variations
- Optimized for English language only
- Security risk detection is pattern-based, not comprehensive
## Model Card Contact
- Repository: https://huggingface.co/RunsOnBacon/distilbert-intent-classifier-onnx-int8
- Original PyTorch Model: https://huggingface.co/RunsOnBacon/distilbert-intent-prompt-classifier
## Evaluation Results
- Accuracy (self-reported): 0.850