Email Classifier (mALBERT ONNX)

A dual-head mALBERT classifier for email category + action prediction, optimized for on-device inference using ONNX Runtime. Bilingual (English + French), 24M parameters, 50.7 MB after INT8 quantization.

Model Description

Classifies emails into 5 categories and predicts whether the recipient should take action:

Category	Description
PERSONAL	Direct 1:1 human communication, calendar invites from real people, direct messages. Excludes platform notifications.
NEWSLETTER	Marketing, promotions, subscribed content. Includes weekly digests, year-in-review recaps, marketing-flavored surveys with rewards.
TRANSACTION	Money or order events: receipts, charges, refunds, shipping confirmations with order/booking IDs, payslips, money-transfer notifications.
ALERT	Account, security, or infrastructure messages: password resets, login alerts, CI failures, booking-bound expiry, satisfaction surveys without rewards, named-product update notifications.
SOCIAL	Platform activity between people: post mentions, comment notifications, PR review requests from real users. Excludes automated platform mail (those are ALERT).

The action flag is true only when the email requires a concrete response tied to something the recipient owns or initiated: paying to keep an existing booking, verifying a code they requested, accepting or declining a calendar invite, replying to a 1:1 message, verifying a security event, or following up on a support ticket.

Output Format

Single forward pass producing two tensors:

category_probs: Float32[5] — softmax probabilities per category (argmax = predicted category)
action_prob: Float32[1] — sigmoid probability of action required (threshold 0.5)

No text generation, no decoder, no beam search.

Example:

Input: "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs → TRANSACTION (0.94), action_prob → 0.08 (NO_ACTION)
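
As a minimal sketch of how the two outputs might be turned into a label and a flag, the helper below applies the argmax and the 0.5 threshold described above. The function name `decode_prediction` and the `ACTION` label are illustrative (the card only shows `NO_ACTION`), and the category order is assumed to match the label list used in this card's usage snippets.

```python
# Category order assumed to follow the label list in this card's usage snippets.
CATEGORIES = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]


def decode_prediction(category_probs, action_prob, threshold=0.5):
    """Map raw model outputs to (category, category confidence, action label)."""
    if len(category_probs) != len(CATEGORIES):
        raise ValueError("expected one probability per category")
    # argmax over the softmax probabilities gives the predicted category
    best = max(range(len(CATEGORIES)), key=lambda i: category_probs[i])
    # "ACTION" is a hypothetical label name; the card only shows NO_ACTION
    action = "ACTION" if action_prob > threshold else "NO_ACTION"
    return CATEGORIES[best], category_probs[best], action


# Probabilities loosely modelled on the shipped-order example above
print(decode_prediction([0.02, 0.03, 0.005, 0.005, 0.94], 0.08))
```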

Intended Use

Primary: On-device email triage in mobile apps (iOS/Android)
Runtime: ONNX Runtime React Native
Use case: Prioritizing inbox, filtering noise, surfacing actionable emails

Model Details

Attribute	Value
Base Model	`cservan/malbert-base-cased-128k`
Parameters	~24M
Architecture	ALBERT encoder (parameter-shared, 1 physical block × 12 virtual layers) + dual classification heads
Pooling	`pooler_output` (SOP-pretrained linear + tanh)
ONNX Size	50.7 MB (INT8 quantized, 1.8× compression from FP32)
Max Sequence	384 tokens
Tokenizer	SentencePiece Unigram (128K vocab, French-aware)
Hidden Size	768
Special Tokens	`[CLS]=2`, `[SEP]=3`, `<pad>=0`, `<unk>=1`
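
For runtimes where the tokenizer only yields raw SentencePiece IDs, the special-token IDs and the 384-token limit above could be applied by hand roughly as follows. This is a sketch under assumptions: `build_inputs` is a hypothetical helper, and dropping tokens from the end of long emails is an assumed truncation strategy, not one documented in this card.

```python
CLS_ID, SEP_ID, PAD_ID = 2, 3, 0  # special-token IDs from the table above
MAX_LEN = 384                      # maximum sequence length from the table above


def build_inputs(token_ids, max_len=MAX_LEN):
    """Wrap, truncate, and pad SentencePiece IDs for one email.

    `token_ids` are the IDs of the text only (no special tokens).
    Dropping tokens from the end is an assumption, not a documented choice.
    """
    body = list(token_ids)[: max_len - 2]        # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    pad = max_len - len(input_ids)
    input_ids += [PAD_ID] * pad                  # right-pad to a fixed length
    attention_mask += [0] * pad                  # padding positions are masked out
    token_type_ids = [0] * max_len               # single segment: all zeros
    return input_ids, attention_mask, token_type_ids
```

Each returned list has exactly `max_len` entries, so it can be reshaped to `[1, 384]` before being passed to the model.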

Performance

Test set metrics (250 emails, balanced across categories, EN+FR):

Metric	Score
Category Accuracy	86.0% (single seed) / 88.4% (2-seed soft-vote ensemble)
Action Accuracy	84.8%
Quantization	INT8 dynamic, 20/20 PyTorch↔ONNX argmax parity
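
The card does not spell out how the 2-seed ensemble is combined. A common reading of "soft vote" is to average the per-class probabilities of the individual models and take the argmax of the mean; the sketch below assumes that interpretation. The function name and the example probabilities are made up for illustration.

```python
def soft_vote(prob_vectors):
    """Average per-class probabilities across models; return (index, mean_probs)."""
    if not prob_vectors:
        raise ValueError("need at least one probability vector")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all probability vectors must have the same length")
    mean = [sum(p[i] for p in prob_vectors) / len(prob_vectors)
            for i in range(n_classes)]
    winner = max(range(n_classes), key=lambda i: mean[i])
    return winner, mean


# Two hypothetical seeds that disagree on their individual argmax
seed_a = [0.05, 0.50, 0.05, 0.35, 0.05]   # argmax: index 1
seed_b = [0.05, 0.20, 0.05, 0.65, 0.05]   # argmax: index 3
index, mean_probs = soft_vote([seed_a, seed_b])
print(index, mean_probs)
```

Unlike a hard majority vote, averaging lets a confident model outweigh an uncertain one, which is what makes a two-model ensemble meaningful at all.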

Per-language breakdown (single seed)

	English	French
Category accuracy	85.4%	87.0%
Action accuracy	89.2%	77.2%

Notable: French slightly outperforms English on category accuracy (87.0% vs 85.4%), so category performance is roughly symmetric across the two languages. Action accuracy retains an English advantage of 12 points (89.2% vs 77.2%), reflecting heavier representation of English action patterns in the training data.
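
The overall and per-language figures are linked by a weighted average, which gives a quick consistency check. The sketch below solves that average for the implied French share of the test set; the share itself is not stated in this card, and because the inputs are rounded percentages the result is only approximate. `implied_share` is a hypothetical helper name.

```python
def implied_share(overall, group_a, group_b):
    """Fraction of group B implied by overall = (1 - w) * group_a + w * group_b."""
    if group_a == group_b:
        raise ValueError("share is undetermined when both groups score the same")
    return (overall - group_a) / (group_b - group_a)


# Figures from the tables above (single seed), in percent: overall, English, French
fr_share_from_category = implied_share(86.0, 85.4, 87.0)
fr_share_from_action = implied_share(84.8, 89.2, 77.2)
print(round(fr_share_from_category, 3), round(fr_share_from_action, 3))
```

Both metrics point to a French share somewhat above one third, so the two breakdown rows are mutually consistent with the headline numbers.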

Per-class F1 (single seed)

Class	Precision	Recall	F1
ALERT	0.885	0.900	0.893
NEWSLETTER	0.771	0.900	0.831
PERSONAL	0.917	0.892	0.904
SOCIAL	0.862	0.758	0.807
TRANSACTION	0.907	0.817	0.860
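
The F1 column is the harmonic mean of precision and recall. As a small worked check, the snippet below recomputes it from the rounded precision and recall values in the table; since those inputs are rounded to three decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
REPORTED = {
    "ALERT": (0.885, 0.900, 0.893),
    "NEWSLETTER": (0.771, 0.900, 0.831),
    "PERSONAL": (0.917, 0.892, 0.904),
    "SOCIAL": (0.862, 0.758, 0.807),
    "TRANSACTION": (0.907, 0.817, 0.860),
}

for name, (p, r, reported) in REPORTED.items():
    recomputed = f1_score(p, r)
    # Inputs are rounded, so agreement is only expected to within about 0.001
    print(f"{name:<12} recomputed={recomputed:.3f} reported={reported:.3f}")
```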

Training Data

Source: Personal Gmail inboxes (anonymized)
Languages: English, French
Size: 2,005 train / 251 val / 250 test (balanced)
Labeling: Human-annotated with category + action flag, prompt-assisted with v7 labeling rules (precise tie-breakers for booking-bound deadlines, marketing recaps with reward language, CI/security automation, curated personalized outreach, satisfaction surveys with/without incentives)
Input format: Subject: ...\n\nBody: ... (no instruction prefix)
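
The input format above can be reproduced with a one-line helper. This is a sketch: `format_email` is a hypothetical name, and stripping surrounding whitespace is an assumption rather than a documented preprocessing step.

```python
def format_email(subject, body):
    """Build the classifier input string: 'Subject: ...', a blank line, 'Body: ...'."""
    # Stripping surrounding whitespace is an assumption, not a documented step
    return f"Subject: {subject.strip()}\n\nBody: {body.strip()}"


text = format_email("Your order has shipped", "Your order #12345 is on its way...")
print(text)
```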

How to Use

ONNX Runtime (React Native)

import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

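// inputIdsTensor, attentionMaskTensor, and tokenTypeIdsTensor are assumed to be
// built beforehand from the tokenizer output.
const outputs = await session.run({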
  input_ids: inputIdsTensor,           // int64[1, seq_len]
  attention_mask: attentionMaskTensor, // int64[1, seq_len]
  token_type_ids: tokenTypeIdsTensor,  // int64[1, seq_len], all zeros
});

const categoryProbs = outputs.category_probs.data;  // Float32[5]
const actionProb = outputs.action_prob.data[0];      // Float32

Python (PyTorch reference)

from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("Ippoboi/malbert-email-classifier")
# `model` must be a DualHeadClassifier restored from a checkpoint before this runs
# (see ml/scripts/train_classifier.py); loading it is not shown here.

text = "Subject: Réunion demain\n\nBody: Peut-on reporter à 15h ?"
inputs = tokenizer(text, return_tensors="pt", max_length=384, truncation=True)

with torch.no_grad():
    cat_logits, act_logits = model(inputs["input_ids"], inputs["attention_mask"])
    category = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"][cat_logits.argmax(dim=-1).item()]
    action = torch.sigmoid(act_logits).item() > 0.5

ONNX Runtime (Python)

import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained("Ippoboi/malbert-email-classifier")

inputs = tokenizer(
    "Subject: Your order has shipped\n\nBody: ...",
    return_tensors="np",
    max_length=384,
    truncation=True,
    padding="max_length",
)
cat_probs, act_prob = session.run(
    ["category_probs", "action_prob"],
    {
        "input_ids": inputs["input_ids"].astype(np.int64),
        "attention_mask": inputs["attention_mask"].astype(np.int64),
        "token_type_ids": np.zeros_like(inputs["input_ids"], dtype=np.int64),
    },
)
categories = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]
print(categories[int(cat_probs[0].argmax())], "action:", bool(act_prob.ravel()[0] > 0.5))

Files

File	Size	Description
`model.onnx`	50.7 MB	INT8 quantized ONNX model
`tokenizer.json`	8.2 MB	Fast tokenizer (SentencePiece Unigram, 128K vocab)
`spiece.model`	2.3 MB	Raw SentencePiece vocab (optional, for Python reload)
`tokenizer_config.json`	1.4 KB	Tokenizer config
`special_tokens_map.json`	970 B	Special token roles → token strings

Architecture

Input → ALBERT Encoder (12 virtual layers × 1 shared block, hidden=768)
                            ↓
                     pooler_output (Linear+tanh on [CLS])
                            ↓
                      ┌─────┴─────┐
                      ↓           ↓
                Category Head  Action Head
                Linear(768→5)  Linear(768→1)
                      ↓           ↓
                   softmax     sigmoid
                      ↓           ↓
              category_probs  action_prob

ALBERT shares one physical transformer block across all 12 virtual layers. This gives ~24M total parameters (vs ~110M for an equivalent BERT-base) at the cost of representational capacity per virtual depth.
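
A back-of-the-envelope count shows where the ~24M figure can come from. The sketch below assumes ALBERT-base-style defaults that this card does not state (factorized embedding size 128, feed-forward size 3072, 512 positions, a vocabulary of exactly 128,000); only the hidden size of 768 and the two head shapes are taken from the card.

```python
def albert_param_estimate(vocab=128_000, emb=128, hidden=768, ffn=3072,
                          max_pos=512, n_categories=5):
    """Rough parameter count for a shared-block ALBERT encoder with two heads.

    vocab, emb, ffn, and max_pos are assumed ALBERT-base-style defaults;
    only hidden=768 and the head shapes come from this card.
    """
    # word, position, and token-type embeddings plus their LayerNorm
    embeddings = vocab * emb + max_pos * emb + 2 * emb + 2 * emb
    # projection from the factorized embedding size up to the hidden size
    projection = emb * hidden + hidden
    # Q, K, V, and output projections plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # two feed-forward linears plus LayerNorm
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + 2 * hidden
    # the block is counted once even though it is applied 12 times
    shared_block = attention + feed_forward
    pooler = hidden * hidden + hidden
    heads = (hidden * n_categories + n_categories) + (hidden + 1)
    return embeddings + projection + shared_block + pooler + heads


total = albert_param_estimate()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions most of the parameters sit in the vocabulary embeddings rather than in the transformer block, which is why sharing the block across layers keeps the total so far below a BERT-base-sized model.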

Compared to Previous Model (MiniLM v1)

	MiniLM v1	mALBERT v3 (this)
Base architecture	XLM-R encoder, independent layers	ALBERT, parameter-shared
Parameters	~117M	~24M
ONNX size	113 MB	50.7 MB
Max sequence	256	384
Vocab size	250K	128K
Category accuracy	92.0%	86.0% / 88.4% (ensemble)
Action accuracy	82.8%	84.8%
FR cat parity	EN-favored	EN/FR symmetric

mALBERT v3 trades raw category accuracy (86.0% single seed vs 92.0%) for less than half the on-device footprint (50.7 MB vs 113 MB), wider context (384 vs 256 tokens), and balanced multilingual performance. Action accuracy is 2 points higher (84.8% vs 82.8%); category accuracy is lower in absolute terms, but the gap between English and French closes.

Limitations

Trained on personal email patterns; may not generalize to enterprise/corporate email styles
Classification accuracy depends on text quality (plain text preferred over heavy HTML)
French action accuracy lags English by ~12 points; the v7 labeling prompt is EN-leaning in its action examples
SOCIAL is the weakest category (F1 0.81) — smallest training class (268 examples) and shares features with NEWSLETTER for platform-mass-emails
384-token cap may truncate long emails; ~17% of training emails exceeded this limit
ALBERT parameter sharing limits representational depth; for harder boundaries, a non-shared encoder (mDeBERTa-v3-base, MiniLM-L12) would have more capacity at higher inference cost

License

Apache 2.0
