Email Classifier (MiniLM ONNX)

A dual-head MiniLM classifier for email category + action prediction, optimized for on-device inference using ONNX Runtime.

Model Description

Classifies emails into 5 categories and predicts whether action is required:

| Category | Description |
|---|---|
| PERSONAL | 1:1 human communication, social messages |
| NEWSLETTER | Marketing, promotions, subscribed content |
| TRANSACTION | Orders, receipts, payments, confirmations |
| ALERT | Security notices, important notifications |
| SOCIAL | Social network notifications, community updates |

Output Format

Single forward pass producing two tensors:

  • category_probs: Float32[5], softmax probabilities per category (argmax = predicted category)
  • action_prob: Float32[1], sigmoid probability that action is required (threshold 0.5)

No text generation, no decoder, no beam search.

Example:

Input:  "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs → TRANSACTION (0.94), action_prob → 0.12 (NO_ACTION)
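Decoding the two outputs is a plain argmax plus a threshold. A minimal Python sketch; the index-to-label order follows the Python example later in this card, and the function name is illustrative:

```python
# Label order matching the trained category head (alphabetical)
CATEGORIES = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]

def decode_outputs(category_probs, action_prob, threshold=0.5):
    """Map the model's two output tensors to human-readable labels."""
    # argmax over the 5 softmax probabilities picks the category
    idx = max(range(len(category_probs)), key=lambda i: category_probs[i])
    # sigmoid output compared against the 0.5 threshold
    action = "ACTION" if action_prob >= threshold else "NO_ACTION"
    return CATEGORIES[idx], action

print(decode_outputs([0.01, 0.02, 0.02, 0.01, 0.94], 0.12))
# -> ('TRANSACTION', 'NO_ACTION')
```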

Intended Use

  • Primary: On-device email triage in mobile apps (iOS/Android)
  • Runtime: ONNX Runtime React Native
  • Use case: Prioritizing inbox, filtering noise, surfacing actionable emails

Model Details

| Attribute | Value |
|---|---|
| Base Model | microsoft/Multilingual-MiniLM-L12-H384 |
| Parameters | ~117M |
| Architecture | XLM-R encoder + dual classification heads |
| ONNX Size | 113 MB (INT8 quantized) |
| Max Sequence | 256 tokens |
| Tokenizer | SentencePiece BPE (250K vocab) |

Performance

| Metric | Score |
|---|---|
| Category Accuracy | 92.0% |
| Action Accuracy | 82.8% |
| Quantization | INT8 dynamic (4x compression) |
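INT8 dynamic quantization stores each weight tensor as 8-bit integers plus a scale factor, which is where the roughly 4x compression over float32 comes from. A numpy sketch of a symmetric per-tensor scheme, for illustration only (the released model was quantized with ONNX Runtime's tooling):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w is approximated by q * scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(384, 384)).astype(np.float32)  # encoder-sized weight
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32
print(w.nbytes // q.nbytes)  # -> 4
# worst-case reconstruction error stays below one quantization step
print(np.abs(dequantize(q, scale) - w).max() < scale)  # -> True
```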

Training Data

  • Source: Personal Gmail inboxes (anonymized)
  • Languages: English, French
  • Labeling: Human-annotated with category + action flag
  • Input format: Subject: ...\n\nBody: ... (no instruction prefix)
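The training input format above can be reproduced with a small helper (the function name is illustrative):

```python
def format_email(subject: str, body: str) -> str:
    """Build the model input exactly as used during training:
    'Subject: ...\n\nBody: ...' with no instruction prefix."""
    return f"Subject: {subject}\n\nBody: {body}"

text = format_email("Your order has shipped", "Your order #12345 is on its way...")
print(text.startswith("Subject:"))  # -> True
```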

How to Use

ONNX Runtime (React Native)

```javascript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

const outputs = await session.run({
  input_ids: inputIdsTensor,
  attention_mask: attentionMaskTensor,
  token_type_ids: tokenTypeIdsTensor,  // all zeros
});

const categoryProbs = outputs.category_probs.data;  // Float32[5]
const actionProb = outputs.action_prob.data[0];     // Float32
```

Python (PyTorch)

```python
from transformers import AutoTokenizer
import torch

# Load the tokenizer; the trained DualHeadClassifier (base encoder + the two
# heads) must be restored from its checkpoint separately.
tokenizer = AutoTokenizer.from_pretrained("Ippoboi/minilmail-classifier")
# model = DualHeadClassifier loaded from checkpoint (see train_classifier.py)

text = "Subject: Meeting tomorrow\n\nBody: Can we reschedule to 3pm?"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)

with torch.no_grad():
    cat_logits, act_logits = model(inputs["input_ids"], inputs["attention_mask"])

category = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"][cat_logits.argmax()]
action_required = torch.sigmoid(act_logits).item() > 0.5
```

Files

| File | Size | Description |
|---|---|---|
| model.onnx | 113 MB | INT8 quantized ONNX model |
| tokenizer.json | 17 MB | SentencePiece BPE tokenizer (XLM-R vocab) |

Architecture

```
Input → XLM-R Encoder (12 layers, 384 hidden) → [CLS] token
                                                    ↓
                                              ┌─────┴─────┐
                                              ↓           ↓
                                        Category Head  Action Head
                                        Linear(384→5)  Linear(384→1)
                                              ↓           ↓
                                          softmax      sigmoid
```
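Numerically, the two heads are just two small linear layers applied to the same 384-dimensional [CLS] vector. A numpy sketch of the head computation with random weights (shapes match the diagram; real weights come from the trained checkpoint):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
cls = rng.normal(size=384)                                    # [CLS] embedding
W_cat, b_cat = rng.normal(size=(5, 384)), rng.normal(size=5)  # Linear(384 -> 5)
W_act, b_act = rng.normal(size=(1, 384)), rng.normal(size=1)  # Linear(384 -> 1)

category_probs = softmax(W_cat @ cls + b_cat)  # Float32[5], sums to 1
action_prob = sigmoid(W_act @ cls + b_act)     # Float32[1], in (0, 1)

print(category_probs.shape, round(float(category_probs.sum()), 6))  # -> (5,) 1.0
```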

Compared to Previous Model (FLAN-T5)

| | FLAN-T5 (v2) | MiniLM (this) |
|---|---|---|
| Download size | 376 MB (3 files) | 130 MB (2 files) |
| Inference | Encoder + decoder beam search | Single forward pass |
| Output | Generated text | Probabilities |
| Summaries | Yes | No (uses Gmail snippets) |
| Latency | ~300ms+ (multiple decoder calls) | ~30-50ms (single call) |

Limitations

  • Trained primarily on English/French emails
  • May not generalize well to enterprise/corporate email patterns
  • Classification accuracy depends on email content quality (plain text preferred over HTML-heavy)
  • 250K vocab tokenizer is oversized for this use case (XLM-R covers 100+ languages)

License

Apache 2.0
