Email Classifier (mALBERT ONNX)

A dual-head mALBERT classifier for email category + action prediction, optimized for on-device inference using ONNX Runtime. Bilingual (English + French), 24M parameters, 50.7 MB after INT8 quantization.

Model Description

Classifies emails into 5 categories and predicts whether the recipient should take action:

| Category | Description |
|---|---|
| PERSONAL | Direct 1:1 human communication, calendar invites from real people, direct messages. Excludes platform notifications. |
| NEWSLETTER | Marketing, promotions, subscribed content. Includes weekly digests, year-in-review recaps, marketing-flavored surveys with rewards. |
| TRANSACTION | Money or order events: receipts, charges, refunds, shipping confirmations with order/booking IDs, payslips, money-transfer notifications. |
| ALERT | Account, security, or infrastructure messages: password resets, login alerts, CI failures, booking-bound expiry, satisfaction surveys without rewards, named-product update notifications. |
| SOCIAL | Platform activity between people: post mentions, comment notifications, PR review requests from real users. Excludes automated platform mail (those are ALERT). |

The action flag is true only when the email requires a concrete response tied to something the user owns or initiated — pay to keep an existing booking, verify a code you requested, accept/decline a calendar invite, reply to a 1:1 message, security event needing verification, or a support ticket follow-up.

Output Format

Single forward pass producing two tensors:

  • category_probs: Float32[5] — softmax probabilities per category (argmax = predicted category)
  • action_prob: Float32[1] — sigmoid probability of action required (threshold 0.5)

No text generation, no decoder, no beam search.

Example:

Input: "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs → TRANSACTION (0.94), action_prob → 0.08 (NO_ACTION)
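Decoding the two tensors needs only an argmax and a threshold. A minimal sketch, assuming the alphabetical category order used in this card's Python examples:

```python
import numpy as np

CATEGORIES = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]

def decode(category_probs, action_prob, threshold=0.5):
    """Map the model's two outputs to a (category, needs_action) pair."""
    category = CATEGORIES[int(np.argmax(category_probs))]
    return category, float(action_prob) >= threshold

# Probabilities from the shipping-confirmation example above (illustrative values).
decode([0.01, 0.02, 0.02, 0.01, 0.94], 0.08)  # → ('TRANSACTION', False)
```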

Intended Use

  • Primary: On-device email triage in mobile apps (iOS/Android)
  • Runtime: ONNX Runtime React Native
  • Use case: Prioritizing inbox, filtering noise, surfacing actionable emails

Model Details

| Attribute | Value |
|---|---|
| Base Model | cservan/malbert-base-cased-128k |
| Parameters | ~24M |
| Architecture | ALBERT encoder (parameter-shared, 1 physical block × 12 virtual layers) + dual classification heads |
| Pooling | pooler_output (SOP-pretrained linear + tanh) |
| ONNX Size | 50.7 MB (INT8 quantized, 1.8× compression from FP32) |
| Max Sequence | 384 tokens |
| Tokenizer | SentencePiece Unigram (128K vocab, French-aware) |
| Hidden Size | 768 |
| Special Tokens | `[CLS]`=2, `[SEP]`=3, `<pad>`=0, `<unk>`=1 |
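For runtimes without a tokenizer wrapper, the special-token IDs above are enough to assemble model inputs by hand. A sketch, assuming piece IDs already come from the bundled SentencePiece model:

```python
CLS_ID, SEP_ID, PAD_ID, MAX_LEN = 2, 3, 0, 384

def build_inputs(piece_ids):
    """Wrap SentencePiece piece IDs with [CLS]/[SEP], then pad to MAX_LEN."""
    ids = [CLS_ID] + piece_ids[: MAX_LEN - 2] + [SEP_ID]
    attention_mask = [1] * len(ids) + [0] * (MAX_LEN - len(ids))
    ids = ids + [PAD_ID] * (MAX_LEN - len(ids))
    token_type_ids = [0] * MAX_LEN  # single-segment input, all zeros
    return ids, attention_mask, token_type_ids
```

The three lists map directly onto the `input_ids`, `attention_mask`, and `token_type_ids` tensors expected by the ONNX session below.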

Performance

Test set metrics (250 emails, balanced across categories, EN+FR):

| Metric | Score |
|---|---|
| Category Accuracy | 86.0% (single seed) / 88.4% (2-seed soft-vote ensemble) |
| Action Accuracy | 84.8% |
| Quantization | INT8 dynamic, 20/20 PyTorch↔ONNX argmax parity |
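The 88.4% ensemble figure comes from soft voting: the per-seed softmax distributions are averaged before the argmax. A minimal sketch of that aggregation (probability values are illustrative):

```python
import numpy as np

def soft_vote(prob_runs):
    """Average per-seed softmax distributions, then take the argmax."""
    mean = np.mean(np.asarray(prob_runs, dtype=np.float32), axis=0)
    return int(mean.argmax()), mean

# Two seeds disagree on the top class; averaging settles the disagreement.
seed_a = [0.10, 0.45, 0.35, 0.05, 0.05]
seed_b = [0.05, 0.30, 0.55, 0.05, 0.05]
idx, mean = soft_vote([seed_a, seed_b])  # idx == 2
```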

Per-language breakdown (single seed)

| | English | French |
|---|---|---|
| Category accuracy | 85.4% | 87.0% |
| Action accuracy | 89.2% | 77.2% |

Notable: French slightly outperforms English on category classification, so the bilingual training signal transfers in both directions. Action accuracy keeps a ~12-point English advantage, reflecting the heavier representation of English action patterns in the training data.

Per-class F1 (single seed)

| Class | Precision | Recall | F1 |
|---|---|---|---|
| ALERT | 0.885 | 0.900 | 0.893 |
| NEWSLETTER | 0.771 | 0.900 | 0.831 |
| PERSONAL | 0.917 | 0.892 | 0.904 |
| SOCIAL | 0.862 | 0.758 | 0.807 |
| TRANSACTION | 0.907 | 0.817 | 0.860 |

Training Data

  • Source: Personal Gmail inboxes (anonymized)
  • Languages: English, French
  • Size: 2,005 train / 251 val / 250 test (balanced)
  • Labeling: Human-annotated with category + action flag, prompt-assisted with v7 labeling rules (precise tie-breakers for booking-bound deadlines, marketing recaps with reward language, CI/security automation, curated personalized outreach, satisfaction surveys with/without incentives)
  • Input format: Subject: ...\n\nBody: ... (no instruction prefix)
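The input format on the last bullet is trivial to reproduce; a one-function sketch matching the Subject/Body layout shown above:

```python
def format_email(subject, body):
    """Serialize an email into the training-time input string (no instruction prefix)."""
    return f"Subject: {subject}\n\nBody: {body}"

text = format_email("Your order has shipped", "Your order #12345 is on its way...")
```

Feeding any other layout (e.g. body only, or a prompt prefix) shifts the input distribution away from training and can degrade accuracy.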

How to Use

ONNX Runtime (React Native)

```typescript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

const outputs = await session.run({
  input_ids: inputIdsTensor,           // int64[1, seq_len]
  attention_mask: attentionMaskTensor, // int64[1, seq_len]
  token_type_ids: tokenTypeIdsTensor,  // int64[1, seq_len], all zeros
});

const categoryProbs = outputs.category_probs.data; // Float32[5]
const actionProb = outputs.action_prob.data[0];    // Float32
```

Python (PyTorch reference)

```python
from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("Ippoboi/malbert-email-classifier")
# Load DualHeadClassifier from checkpoint (see ml/scripts/train_classifier.py)

text = "Subject: Réunion demain\n\nBody: Peut-on reporter à 15h ?"
inputs = tokenizer(text, return_tensors="pt", max_length=384, truncation=True)

with torch.no_grad():
    cat_logits, act_logits = model(inputs["input_ids"], inputs["attention_mask"])
    category = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"][cat_logits.argmax()]
    action = torch.sigmoid(act_logits).item() > 0.5
```

ONNX Runtime (Python)

```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Ippoboi/malbert-email-classifier")

inputs = tokenizer(
    "Subject: Your order has shipped\n\nBody: ...",
    return_tensors="np",
    max_length=384,
    truncation=True,
    padding="max_length",
)
cat_probs, act_prob = session.run(
    ["category_probs", "action_prob"],
    {
        "input_ids": inputs["input_ids"].astype(np.int64),
        "attention_mask": inputs["attention_mask"].astype(np.int64),
        "token_type_ids": np.zeros_like(inputs["input_ids"], dtype=np.int64),
    },
)
categories = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]
print(categories[cat_probs[0].argmax()], "action:", act_prob[0] > 0.5)
```
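The 20/20 argmax-parity figure reported under Performance reduces to a per-email comparison between the two backends. A sketch of that check (probability values here are illustrative):

```python
import numpy as np

def argmax_parity(torch_probs, onnx_probs):
    """True when both backends pick the same category for one email."""
    return int(np.argmax(torch_probs)) == int(np.argmax(onnx_probs))

# INT8 quantization perturbs probabilities slightly; parity holds as long as
# the top class survives the perturbation.
argmax_parity([0.10, 0.70, 0.20], [0.15, 0.62, 0.23])  # → True
```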

Files

| File | Size | Description |
|---|---|---|
| model.onnx | 50.7 MB | INT8 quantized ONNX model |
| tokenizer.json | 8.2 MB | Fast tokenizer (SentencePiece Unigram, 128K vocab) |
| spiece.model | 2.3 MB | Raw SentencePiece vocab (optional, for Python reload) |
| tokenizer_config.json | 1.4 KB | Tokenizer config |
| special_tokens_map.json | 970 B | Special token names → IDs |

Architecture

Input → ALBERT Encoder (12 virtual layers × 1 shared block, hidden=768)
                            ↓
                     pooler_output (Linear+tanh on [CLS])
                            ↓
                      ┌─────┴─────┐
                      ↓           ↓
                Category Head  Action Head
                Linear(768→5)  Linear(768→1)
                      ↓           ↓
                   softmax     sigmoid
                      ↓           ↓
              category_probs  action_prob

ALBERT shares one physical transformer block across all 12 virtual layers. This gives ~24M total parameters (vs ~110M for an equivalent BERT-base) at the cost of representational capacity per virtual depth.
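The DualHeadClassifier referenced earlier (ml/scripts/train_classifier.py) is not published here; below is a minimal PyTorch sketch consistent with the diagram, with the encoder (e.g. a Hugging Face AlbertModel) passed in. Heads output raw logits; softmax/sigmoid are applied downstream, as in the diagram:

```python
import torch
import torch.nn as nn

class DualHeadSketch(nn.Module):
    """Two linear heads on the encoder's pooler_output (see diagram above)."""

    def __init__(self, encoder, hidden_size=768, num_categories=5):
        super().__init__()
        self.encoder = encoder
        self.category_head = nn.Linear(hidden_size, num_categories)  # 768 → 5
        self.action_head = nn.Linear(hidden_size, 1)                 # 768 → 1

    def forward(self, input_ids, attention_mask):
        pooled = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output
        return self.category_head(pooled), self.action_head(pooled)
```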

Compared to Previous Model (MiniLM v1)

| | MiniLM v1 | mALBERT v3 (this) |
|---|---|---|
| Base architecture | XLM-R encoder, independent layers | ALBERT, parameter-shared |
| Parameters | ~117M | ~24M |
| ONNX size | 113 MB | 50.7 MB |
| Max sequence | 256 | 384 |
| Vocab size | 250K | 128K |
| Category accuracy | 92.0% | 86.0% / 88.4% (ensemble) |
| Action accuracy | 82.8% | 84.8% |
| FR cat parity | EN-favored | EN/FR symmetric |

mALBERT v3 trades raw category accuracy for less than half the on-device footprint, wider context (384 vs 256 tokens), and balanced multilingual performance. Action accuracy is higher; category accuracy is lower in absolute terms but the language gap closes.

Limitations

  • Trained on personal email patterns; may not generalize to enterprise/corporate email styles
  • Classification accuracy depends on text quality (plain text preferred over heavy HTML)
  • French action accuracy lags English by ~12 points; the v7 labeling prompt is EN-leaning in its action examples
  • SOCIAL is the weakest category (F1 0.81): it is the smallest training class (268 examples) and shares surface features with NEWSLETTER on mass platform emails
  • 384-token cap may truncate long emails; ~17% of training emails exceeded this limit
  • ALBERT parameter sharing limits representational depth; for harder boundaries, a non-shared encoder (mDeBERTa-v3-base, MiniLM-L12) would have more capacity at higher inference cost
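Truncation at 384 tokens is silent at inference time. A hypothetical helper to flag over-length emails beforehand (`tokenizer` is any callable with the Hugging Face tokenizer interface, such as the AutoTokenizer loaded in the usage examples):

```python
def is_truncated(tokenizer, text, max_len=384):
    """Flag emails whose token count exceeds the model's sequence cap."""
    return len(tokenizer(text, truncation=False)["input_ids"]) > max_len
```

Flagged emails could have boilerplate (signatures, quoted threads) stripped client-side so the informative part of the message fits the window.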

License

Apache 2.0
