# Email Classifier (mmBERT-small ONNX, v8)

A dual-head mmBERT-small classifier for multilingual email category + action prediction, optimized for on-device inference using ONNX Runtime.

## Model Description

Classifies emails into 6 categories and predicts whether action is required:

| Category | Description |
|---|---|
| PERSONAL | 1:1 human communication, social messages, direct correspondence |
| NEWSLETTER | Subscribed editorial/digest content (curated articles, weekly roundups) |
| PROMOTIONAL | Marketing pushes, sales, discount offers, product launches |
| TRANSACTION | Orders, receipts, payments, shipping confirmations |
| ALERT | Security notices, account warnings, important notifications |
| SOCIAL | Social network notifications, community updates, reactions |

The PROMOTIONAL ↔ NEWSLETTER split is new in v8. v6 (MiniLM) lumped both into NEWSLETTER, which made downstream filtering noisy.

## Output Format

A single forward pass produces two tensors:

- `category_probs` (`Float32[6]`): softmax probabilities per category (argmax = predicted category)
- `action_prob` (`Float32[1]`): sigmoid probability that action is required (threshold 0.5)

No text generation, no decoder, no beam search.

Example:

```
Input:  "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs → TRANSACTION (0.96), action_prob → 0.08 (NO_ACTION)
```
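The mapping from the two raw outputs to a final decision can be sketched in a few lines of Python (plain-list version for clarity; the `postprocess` helper is illustrative, not part of the released code):

```python
# Map the two output tensors to a label and an action flag.
# Category order is the model's alphabetical label list.
CATEGORIES = ["ALERT", "NEWSLETTER", "PERSONAL", "PROMOTIONAL", "SOCIAL", "TRANSACTION"]

def postprocess(category_probs, action_prob, threshold=0.5):
    """category_probs: 6 softmax floats; action_prob: sigmoid float in [0, 1]."""
    idx = max(range(len(category_probs)), key=lambda i: category_probs[i])  # argmax
    return CATEGORIES[idx], action_prob > threshold

label, action = postprocess([0.01, 0.01, 0.005, 0.005, 0.01, 0.96], 0.08)
# label == "TRANSACTION", action == False
```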

## Intended Use

- **Primary:** On-device email triage in multilingual mobile apps (iOS/Android)
- **Runtime:** ONNX Runtime React Native
- **Use case:** Prioritizing the inbox, filtering noise, and surfacing actionable emails for English- and French-speaking users

## Model Details

| Attribute | Value |
|---|---|
| Base Model | jhu-clsp/mmBERT-small (ModernBERT family, multilingual) |
| Parameters | ~140M |
| Architecture | mmBERT encoder (RoPE + GeGLU + alternating local/global attention) + dual classification heads |
| Pooling | Mean pooling over `last_hidden_state` (masked) |
| ONNX Size | 135.3 MB (INT8 dynamic per-channel quantized) |
| Max Sequence | 384 tokens |
| Tokenizer | Gemma 2 BPE (256K vocab) |
| Opset | 14 |

## Performance

Evaluated on a held-out 321-sample multilingual test set (md5 `37bbd4c08ae9338890ad5cc2656b5e6f`):

| Metric | Score |
|---|---|
| Category Accuracy (overall) | 93.15% |
| Category Accuracy (English) | 93.49% |
| Category Accuracy (French) | 92.45% |
| Action Accuracy (overall) | 92.83% |
| Argmax-match vs PyTorch FP32 | 96.57% |
| Quantization | INT8 dynamic, per-channel (4× compression vs FP32) |

### Per-class Recall

| Class | n | Recall |
|---|---|---|
| ALERT | 60 | 85.00% |
| NEWSLETTER | 50 | 94.00% |
| PERSONAL | 50 | 98.00% |
| PROMOTIONAL | 60 | 95.00% |
| SOCIAL | 41 | 87.80% |
| TRANSACTION | 60 | 98.33% |

Multilingual stability: French category accuracy holds within 1.04 pp of English, indicating the cross-lingual encoder generalizes evenly across both trained languages.

## Training Data

- **Source:** Personal Gmail inboxes (anonymized)
- **Languages:** English, French (joint-stratified balance by category × language)
- **Labeling:** Human-annotated with category + action flag
- **Class weights:** Gentle (max 1.317, min 0.891); joint-stratified weighting prevents class collapse under quantization
- **Input format:** `Subject: ...\n\nBody: ...` (no instruction prefix)
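Assembling that input string is a one-line helper (a sketch; `format_email` is not part of the released code):

```python
def format_email(subject: str, body: str) -> str:
    # Reproduces the training-time input format exactly:
    # "Subject: ...\n\nBody: ..." with no instruction prefix.
    return f"Subject: {subject}\n\nBody: {body}"

text = format_email("Your order has shipped", "Your order #12345 is on its way...")
```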

## How to Use

### ONNX Runtime (React Native)

```typescript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

// Two inputs only: NO token_type_ids (mmBERT does not use segment embeddings)
const outputs = await session.run({
  input_ids: inputIdsTensor,           // int64[1, S], S ≤ 384
  attention_mask: attentionMaskTensor, // int64[1, S]
});

const categoryProbs = outputs.category_probs.data; // Float32[6]
const actionProb = outputs.action_prob.data[0];    // Float32

const CATEGORIES = ['ALERT', 'NEWSLETTER', 'PERSONAL', 'PROMOTIONAL', 'SOCIAL', 'TRANSACTION'];
const category = CATEGORIES[categoryProbs.indexOf(Math.max(...categoryProbs))];
const actionRequired = actionProb > 0.5;
```

### Python (PyTorch)

```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("Ippoboi/mmbert-s-email-classifier")
# Load DualHeadClassifier (mean pooling + dual heads) from checkpoint
# (see ml/scripts/train_classifier.py for the head architecture)

text = "Subject: Meeting tomorrow\n\nBody: Can we reschedule to 3pm?"
inputs = tokenizer(text, return_tensors="pt", max_length=384, truncation=True)

with torch.no_grad():
    cat_probs, act_prob = model(inputs["input_ids"], inputs["attention_mask"])
    categories = ["ALERT", "NEWSLETTER", "PERSONAL", "PROMOTIONAL", "SOCIAL", "TRANSACTION"]
    category = categories[cat_probs.argmax().item()]
    action = act_prob.item() > 0.5
```

## Special Tokens (Gemma 2 BPE)

mmBERT uses Gemma 2's tokenizer, so IDs differ from XLM-R/MiniLM:

| Token | ID |
|---|---|
| `<pad>` | 0 |
| `<eos>` | 1 |
| `<bos>` | 2 |
| `<unk>` | 3 |

Sequence wrap: `[<bos>, ...content..., <eos>]`. There is no `[CLS]` / `[SEP]`; that's XLM-R territory.
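When tokenizing on-device without the Hugging Face post-processor, the wrap and padding convention can be applied by hand. A sketch using the IDs above (the `wrap_and_pad` helper is hypothetical, not part of the released code):

```python
# mmBERT / Gemma 2 special-token IDs from the table above.
PAD_ID, EOS_ID, BOS_ID = 0, 1, 2

def wrap_and_pad(content_ids, max_len=384):
    """Wrap content token IDs as [<bos>, ..., <eos>], truncate to max_len,
    then right-pad with <pad> and build the matching attention mask."""
    ids = [BOS_ID] + list(content_ids[: max_len - 2]) + [EOS_ID]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad

ids, mask = wrap_and_pad([1000, 1001, 1002], max_len=8)
# ids  == [2, 1000, 1001, 1002, 1, 0, 0, 0]
# mask == [1, 1, 1, 1, 1, 0, 0, 0]
```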

## Files

| File | Size | Description |
|---|---|---|
| model.onnx | 135.3 MB | INT8 quantized ONNX model |
| tokenizer.json | 32.8 MB | Gemma 2 BPE tokenizer (256K vocab) |
| tokenizer_config.json | 45 KB | Tokenizer configuration |
| special_tokens_map.json | 1 KB | Special token IDs |
| export_metadata.json | 1 KB | Provenance + canonical metrics |

## Architecture

```
Input → mmBERT Encoder (22 layers, 384 hidden, RoPE + GeGLU)
                       ↓
              Mean-pool over last_hidden_state (masked by attention_mask)
                       ↓
                 ┌─────┴─────┐
                 ↓           ↓
           Category Head  Action Head
           Linear(384→6)  Linear(384→1)
                 ↓           ↓
             softmax      sigmoid
```
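The head side of this diagram can be sketched as a PyTorch module (shapes as stated above; this is an illustrative reimplementation, not the released `DualHeadClassifier`):

```python
import torch
import torch.nn as nn

class DualHeadSketch(nn.Module):
    """Masked mean pooling over encoder states, then two linear heads,
    matching the diagram: 384 hidden -> 6-way softmax + 1 sigmoid."""

    def __init__(self, hidden=384, num_classes=6):
        super().__init__()
        self.category_head = nn.Linear(hidden, num_classes)
        self.action_head = nn.Linear(hidden, 1)

    def forward(self, last_hidden_state, attention_mask):
        # last_hidden_state: [B, S, H]; attention_mask: [B, S]
        mask = attention_mask.unsqueeze(-1).float()  # [B, S, 1]
        pooled = (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        cat_probs = torch.softmax(self.category_head(pooled), dim=-1)  # [B, 6]
        act_prob = torch.sigmoid(self.action_head(pooled)).squeeze(-1)  # [B]
        return cat_probs, act_prob
```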

## Compared to Previous Model (MiniLM v6)

| | MiniLM v6 | mmBERT-small v8 (this) |
|---|---|---|
| Base | XLM-R MiniLM-L12 | mmBERT-small (ModernBERT family) |
| Schema | 5 classes | 6 classes (PROMOTIONAL added) |
| Languages tracked | English-dominant | English + French (balanced) |
| Vocab | 250K SentencePiece Unigram | 256K Gemma 2 BPE |
| Max sequence | 256 | 384 |
| Inputs | input_ids, attention_mask, token_type_ids | input_ids, attention_mask |
| Bundle size | 113 MB | 135 MB |
| Cat acc | ~92.0% (5-class) | 93.15% (6-class, harder schema) |
| FR cat acc | not tracked | 92.45% |

## Limitations

- Trained on English and French only; may not generalize to other languages despite the multilingual base
- Trained on personal/consumer email patterns; may not generalize to enterprise/corporate email
- The PROMOTIONAL ↔ NEWSLETTER decision boundary is genuinely fuzzy; expect some legitimate disagreements with human raters at this boundary
- Action accuracy lost ~2.2 pp under INT8 quantization vs FP32 (95.02% → 92.83%); the action head is a single `Linear(384→1)` and is more quantization-sensitive than the 6-class softmax
- The 256K-vocab tokenizer is oversized for an EN+FR-only deployment but is required to use the pretrained mmBERT weights without retraining

## Notes on Quantization

This model uses INT8 dynamic per-channel quantization via `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8, per_channel=True)` on a clean FP32 ONNX export. Two export-time fixes were required to preserve accuracy through quantization on this architecture:

1. The export wrapper drops the unused `token_type_ids` path (mmBERT has no segment embeddings; an unused embedding lookup contaminates the shared 256K-vocab embedding's scale calibration).
2. `model.encoder.config.reference_compile = False` is set before `torch.onnx.export(..., dynamo=False)` so the legacy tracer can trace through the `tok_embeddings` lookup directly instead of the `compiled_embeddings` `torch.compile` shim.

With both fixes, INT8 dynamic per-channel quantization preserves the FP32 category accuracy essentially exactly. Static INT8 calibration (percentile/entropy) was attempted but is empirically infeasible on ModernBERT-style graphs in ORT 1.24 due to peak-RAM blowup during calibration of the wide-FFN and full-attention activation tensors.

## License

Apache 2.0
