DistilBERT Intent Classifier (ONNX INT8 Quantized)

Model Description

This is an INT8-quantized ONNX version of a fine-tuned DistilBERT model for six-class intent classification, optimized for browser-based inference with ONNX Runtime Web.

Intent Categories

  1. arithmetic - Simple calculations, unit conversions, percentage math
  2. symbolic_reasoning - Mathematical proofs, physics derivations, complex logic
  3. factual_lookup - Search-engine style questions (history, geography, etc.)
  4. creative_synthesis - Fiction writing, poetry, roleplay, brainstorming
  5. code_generation - Writing, refactoring, or auditing code
  6. security_risk - PII leaks, API keys, injection attempts
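The classification head emits one logit per category, so mapping a raw output vector to an intent is a softmax plus argmax. A minimal post-processing sketch in pure Python; the label order is assumed to match the list above (verify against the model's `id2label` config), and the logit values are made up for illustration:

```python
import math

# Assumed label order -- check against the model's id2label config
INTENT_LABELS = [
    "arithmetic", "symbolic_reasoning", "factual_lookup",
    "creative_synthesis", "code_generation", "security_risk",
]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (label, confidence) for a 6-element logit vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENT_LABELS[best], probs[best]

# Example with illustrative logits
label, confidence = classify([4.1, 0.3, -1.2, 0.0, 0.5, -2.0])
```

The same mapping applies whether the logits come from the browser (`results.logits.data`) or from the Python example below (`outputs.logits`).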

Model Details

  • Base Model: distilbert-base-uncased
  • Quantization: INT8 Dynamic Quantization (AVX512-VNNI)
  • Model Size: 65.15 MB (~75% smaller than the unquantized model)
  • Format: ONNX
  • Target Platform: Browser (ONNX Runtime Web) / CPU inference

Performance

  • Load Time: ~75% faster than the unquantized PyTorch model
  • Inference Speed: 2-3x faster than the unquantized model on CPU
  • Accuracy: >85% on the arithmetic and factual_lookup intents

Usage

Browser (ONNX Runtime Web)

import * as ort from 'onnxruntime-web';

// Load the quantized model
const session = await ort.InferenceSession.create('model.onnx');

// Tokenize input (you'll need to implement DistilBERT WordPiece tokenization in JS)
const inputIds = tokenize("What is 25 + 17?"); // Must return a BigInt64Array

// Attention mask: 1n for every real token (no padding in this single-sequence example)
const attentionMask = new BigInt64Array(inputIds.length).fill(1n);

// Run inference
const feeds = {
  input_ids: new ort.Tensor('int64', inputIds, [1, inputIds.length]),
  attention_mask: new ort.Tensor('int64', attentionMask, [1, attentionMask.length])
};

const results = await session.run(feeds);
const logits = results.logits.data; // Float32Array, one logit per intent class

Python (Optimum)

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RunsOnBacon/distilbert-intent-classifier-onnx-int8")
model = ORTModelForSequenceClassification.from_pretrained(
    "RunsOnBacon/distilbert-intent-classifier-onnx-int8",
    provider="CPUExecutionProvider"
)

inputs = tokenizer("What is 25 + 17?", return_tensors="pt")
outputs = model(**inputs)
predicted_class = outputs.logits.argmax(-1).item()

Training Data

  • Dataset: 1,200 synthetic examples (200 per category)
  • Special Feature: 30% of arithmetic and factual_lookup examples are "cloaked" in elaborate prompt structures to improve robustness to indirect phrasing
  • Split: 80/20 train/validation
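The 80/20 split above can be reproduced with a seeded shuffle; a minimal sketch (the seed and helper name are illustrative, not taken from the actual training code):

```python
import random

def train_val_split(examples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and split off a validation set."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # deterministic shuffle
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]

# 1,200 synthetic examples -> 960 train / 240 validation
train, val = train_val_split(range(1200))
```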

Limitations

  • Trained on synthetic data - may not generalize to all real-world variations
  • Optimized for English language only
  • Security risk detection is pattern-based, not comprehensive
