---
base_model:
  - microsoft/Multilingual-MiniLM-L12-H384
license: apache-2.0
language:
  - en
  - fr
pipeline_tag: text-classification
inference: false
tags:
  - classification
  - emails
  - onnx
  - mobile
  - int8
widget:
  - text: |-
      Subject: Your order has shipped

      Body: Your order #12345 is on its way and will arrive by Monday.
    example_title: Transaction
  - text: |-
      Subject: Meeting tomorrow

      Body: Hey, can we reschedule our 2pm meeting to 3pm? Let me know.
    example_title: Personal
  - text: |-
      Subject: Weekly Newsletter

      Body: Check out our latest deals! 50% off everything this weekend.
    example_title: Newsletter
  - text: |-
      Subject: Security Alert

      Body: A new device logged into your account from San Francisco, CA.
    example_title: Alert
---

# Email Classifier (MiniLM ONNX)

A dual-head MiniLM classifier that jointly predicts an email's category and whether it requires action, optimized for on-device inference with ONNX Runtime.

## Model Description

Classifies emails into 5 categories and predicts whether action is required:

| Category | Description |
|---|---|
| PERSONAL | 1:1 human communication, social messages |
| NEWSLETTER | Marketing, promotions, subscribed content |
| TRANSACTION | Orders, receipts, payments, confirmations |
| ALERT | Security notices, important notifications |
| SOCIAL | Social network notifications, community updates |

## Output Format

Single forward pass producing two tensors:

- `category_probs`: `Float32[5]`, softmax probabilities per category (argmax = predicted category)
- `action_prob`: `Float32[1]`, sigmoid probability of action required (threshold 0.5)

No text generation, no decoder, no beam search.

Example:

```text
Input:  "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs → TRANSACTION (0.94), action_prob → 0.12 (NO_ACTION)
```
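The two outputs can be reduced to a final decision with a few lines of plain Python. A minimal sketch (the `decide` helper is hypothetical, not shipped with the model; label order follows the alphabetical list used in the PyTorch example in this card):

```python
# Map raw model outputs (already softmax/sigmoid probabilities) to a decision.
LABELS = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]

def decide(category_probs, action_prob, threshold=0.5):
    """Return (predicted_label, action_required) from the two output tensors."""
    idx = max(range(len(category_probs)), key=lambda i: category_probs[i])
    return LABELS[idx], action_prob > threshold

# Example from above:
# decide([0.01, 0.02, 0.02, 0.01, 0.94], 0.12)  ->  ("TRANSACTION", False)
```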

## Intended Use

- **Primary**: On-device email triage in mobile apps (iOS/Android)
- **Runtime**: ONNX Runtime React Native
- **Use case**: Inbox prioritization, noise filtering, surfacing actionable emails

## Model Details

| Attribute | Value |
|---|---|
| Base Model | microsoft/Multilingual-MiniLM-L12-H384 |
| Parameters | ~117M |
| Architecture | XLM-R encoder + dual classification heads |
| ONNX Size | 113 MB (INT8 quantized) |
| Max Sequence | 256 tokens |
| Tokenizer | SentencePiece BPE (250K vocab) |

## Performance

| Metric | Score |
|---|---|
| Category Accuracy | 92.0% |
| Action Accuracy | 82.8% |
| Quantization | INT8 dynamic (4x compression) |

## Training Data

- **Source**: Personal Gmail inboxes (anonymized)
- **Languages**: English, French
- **Labeling**: Human-annotated with category + action flag
- **Input format**: `Subject: ...\n\nBody: ...` (no instruction prefix)
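Because the model was trained on this exact template, reproducing it verbatim at inference time matters. A trivial helper (hypothetical, not part of the repo) makes the format explicit:

```python
def format_email(subject: str, body: str) -> str:
    """Build the exact input string the model was trained on: no instruction
    prefix, a literal blank line between the Subject and Body sections."""
    return f"Subject: {subject}\n\nBody: {body}"

# format_email("Your order has shipped", "Your order #12345 is on its way.")
# -> "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way."
```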

## How to Use

### ONNX Runtime (React Native)

```javascript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

const outputs = await session.run({
  input_ids: inputIdsTensor,
  attention_mask: attentionMaskTensor,
  token_type_ids: tokenTypeIdsTensor, // all zeros
});

const categoryProbs = outputs.category_probs.data; // Float32[5]
const actionProb = outputs.action_prob.data[0];    // Float32
```

### Python (PyTorch)

```python
from transformers import AutoTokenizer
import torch

# Load the tokenizer; the classifier itself is a custom module, so load
# DualHeadClassifier from the training checkpoint (see train_classifier.py).
tokenizer = AutoTokenizer.from_pretrained("Ippoboi/minilmail-classifier")
model = ...  # DualHeadClassifier instance with trained heads

text = "Subject: Meeting tomorrow\n\nBody: Can we reschedule to 3pm?"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)

with torch.no_grad():
    cat_logits, act_logits = model(inputs["input_ids"], inputs["attention_mask"])
    labels = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]
    category = labels[cat_logits.argmax().item()]
    action_required = torch.sigmoid(act_logits).item() > 0.5
```

## Files

| File | Size | Description |
|---|---|---|
| `model.onnx` | 113 MB | INT8 quantized ONNX model |
| `tokenizer.json` | 17 MB | SentencePiece BPE tokenizer (XLM-R vocab) |

## Architecture

```text
Input → XLM-R Encoder (12 layers, 384 hidden) → [CLS] token
                                                    ↓
                                              ┌─────┴─────┐
                                              ↓           ↓
                                        Category Head  Action Head
                                        Linear(384→5)  Linear(384→1)
                                              ↓           ↓
                                          softmax      sigmoid
```
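The diagram translates directly into a small PyTorch module. The sketch below is a hypothetical reconstruction (the actual `DualHeadClassifier` lives in `train_classifier.py`); here the encoder is assumed to be any callable returning hidden states of shape `(batch, seq, 384)`:

```python
import torch
import torch.nn as nn

class DualHeadClassifier(nn.Module):
    """Hypothetical sketch of the dual-head setup: one shared encoder,
    two linear heads reading the [CLS] position."""

    def __init__(self, encoder, hidden_size=384, num_categories=5):
        super().__init__()
        self.encoder = encoder                                 # e.g. the MiniLM XLM-R encoder
        self.category_head = nn.Linear(hidden_size, num_categories)
        self.action_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)       # (batch, seq, hidden)
        cls = hidden[:, 0]                                     # [CLS] token embedding
        return self.category_head(cls), self.action_head(cls)  # logits; softmax/sigmoid applied downstream

# Shape check with a stand-in encoder:
dummy_encoder = lambda ids, mask: torch.zeros(ids.size(0), ids.size(1), 384)
model = DualHeadClassifier(dummy_encoder)
cat_logits, act_logits = model(torch.zeros(2, 8, dtype=torch.long), torch.ones(2, 8))
# cat_logits: (2, 5), act_logits: (2, 1)
```

In the exported ONNX graph the softmax and sigmoid are baked in, so the runtime receives probabilities rather than these raw logits.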

## Compared to Previous Model (FLAN-T5)

| | FLAN-T5 (v2) | MiniLM (this) |
|---|---|---|
| Download size | 376 MB (3 files) | 130 MB (2 files) |
| Inference | Encoder + decoder beam search | Single forward pass |
| Output | Generated text | Probabilities |
| Summaries | Yes | No (uses Gmail snippets) |
| Latency | ~300ms+ (multiple decoder calls) | ~30-50ms (single call) |

## Limitations

- Trained primarily on English/French emails
- May not generalize well to enterprise/corporate email patterns
- Classification accuracy depends on email content quality (plain text preferred over HTML-heavy)
- 250K vocab tokenizer is oversized for this use case (XLM-R covers 100+ languages)

## License

Apache 2.0