---
base_model:
  - microsoft/Multilingual-MiniLM-L12-H384
license: apache-2.0
language:
  - en
  - fr
pipeline_tag: text-classification
inference: false
tags:
  - classification
  - emails
  - onnx
  - mobile
  - int8
widget:
  - text: |-
      Subject: Your order has shipped

      Body: Your order #12345 is on its way and will arrive by Monday.
    example_title: Transaction
  - text: |-
      Subject: Meeting tomorrow

      Body: Hey, can we reschedule our 2pm meeting to 3pm? Let me know.
    example_title: Personal
  - text: |-
      Subject: Weekly Newsletter

      Body: Check out our latest deals! 50% off everything this weekend.
    example_title: Newsletter
  - text: |-
      Subject: Security Alert

      Body: A new device logged into your account from San Francisco, CA.
    example_title: Alert
---

# Email Classifier (MiniLM ONNX)

A dual-head MiniLM classifier that jointly predicts an email's category and whether it requires action, optimized for on-device inference with ONNX Runtime.

## Model Description

Classifies emails into 5 categories and predicts whether action is required:

| Category | Description |
|---|---|
| PERSONAL | 1:1 human communication, social messages |
| NEWSLETTER | Marketing, promotions, subscribed content |
| TRANSACTION | Orders, receipts, payments, confirmations |
| ALERT | Security notices, important notifications |
| SOCIAL | Social network notifications, community updates |

## Output Format

Single forward pass producing two tensors:

- `category_probs`: `Float32[5]`, softmax probabilities per category (argmax = predicted category)
- `action_prob`: `Float32[1]`, sigmoid probability of action required (threshold 0.5)

No text generation, no decoder, no beam search.

Example:

```text
Input:  "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs → TRANSACTION (0.94), action_prob → 0.12 (NO_ACTION)
```
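The two outputs can be reduced to a final decision with a few lines of plain Python. A minimal sketch (the `decide` helper is hypothetical, not shipped with the model; label order follows the alphabetical list used in the PyTorch example in this card):

```python
# Map raw model outputs (already softmax/sigmoid probabilities) to a decision.
LABELS = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]

def decide(category_probs, action_prob, threshold=0.5):
    """Return (predicted_label, action_required) from the two output tensors."""
    idx = max(range(len(category_probs)), key=lambda i: category_probs[i])
    return LABELS[idx], action_prob > threshold

# Example from above:
# decide([0.01, 0.02, 0.02, 0.01, 0.94], 0.12)  ->  ("TRANSACTION", False)
```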

## Intended Use

- **Primary**: On-device email triage in mobile apps (iOS/Android)
- **Runtime**: ONNX Runtime React Native
- **Use case**: Inbox prioritization, noise filtering, surfacing actionable emails

## Model Details

| Attribute | Value |
|---|---|
| Base Model | microsoft/Multilingual-MiniLM-L12-H384 |
| Parameters | ~117M |
| Architecture | XLM-R encoder + dual classification heads |
| ONNX Size | 113 MB (INT8 quantized) |
| Max Sequence | 256 tokens |
| Tokenizer | SentencePiece BPE (250K vocab) |

## Performance

| Metric | Score |
|---|---|
| Category Accuracy | 92.0% |
| Action Accuracy | 82.8% |
| Quantization | INT8 dynamic (4x compression) |

## Training Data

- **Source**: Personal Gmail inboxes (anonymized)
- **Languages**: English, French
- **Labeling**: Human-annotated with category + action flag
- **Input format**: `Subject: ...\n\nBody: ...` (no instruction prefix)
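Because the model was trained on this exact template, reproducing it verbatim at inference time matters. A trivial helper (hypothetical, not part of the repo) makes the format explicit:

```python
def format_email(subject: str, body: str) -> str:
    """Build the exact input string the model was trained on: no instruction
    prefix, a literal blank line between the Subject and Body sections."""
    return f"Subject: {subject}\n\nBody: {body}"

# format_email("Your order has shipped", "Your order #12345 is on its way.")
# -> "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way."
```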

## How to Use

### ONNX Runtime (React Native)

```javascript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

const outputs = await session.run({
  input_ids: inputIdsTensor,
  attention_mask: attentionMaskTensor,
  token_type_ids: tokenTypeIdsTensor, // all zeros
});

const categoryProbs = outputs.category_probs.data; // Float32[5]
const actionProb = outputs.action_prob.data[0];    // Float32
```

### Python (PyTorch)

```python
from transformers import AutoTokenizer
import torch

# Load the tokenizer; the classifier itself is a custom module, so load
# DualHeadClassifier from the training checkpoint (see train_classifier.py).
tokenizer = AutoTokenizer.from_pretrained("Ippoboi/minilmail-classifier")
model = ...  # DualHeadClassifier instance with trained heads

text = "Subject: Meeting tomorrow\n\nBody: Can we reschedule to 3pm?"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)

with torch.no_grad():
    cat_logits, act_logits = model(inputs["input_ids"], inputs["attention_mask"])
    labels = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]
    category = labels[cat_logits.argmax().item()]
    action_required = torch.sigmoid(act_logits).item() > 0.5
```

## Files

| File | Size | Description |
|---|---|---|
| `model.onnx` | 113 MB | INT8 quantized ONNX model |
| `tokenizer.json` | 17 MB | SentencePiece BPE tokenizer (XLM-R vocab) |

## Architecture

```text
Input → XLM-R Encoder (12 layers, 384 hidden) → [CLS] token
                                                    ↓
                                              ┌─────┴─────┐
                                              ↓           ↓
                                        Category Head  Action Head
                                        Linear(384→5)  Linear(384→1)
                                              ↓           ↓
                                          softmax      sigmoid
```
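The diagram translates directly into a small PyTorch module. The sketch below is a hypothetical reconstruction (the actual `DualHeadClassifier` lives in `train_classifier.py`); here the encoder is assumed to be any callable returning hidden states of shape `(batch, seq, 384)`:

```python
import torch
import torch.nn as nn

class DualHeadClassifier(nn.Module):
    """Hypothetical sketch of the dual-head setup: one shared encoder,
    two linear heads reading the [CLS] position."""

    def __init__(self, encoder, hidden_size=384, num_categories=5):
        super().__init__()
        self.encoder = encoder                                 # e.g. the MiniLM XLM-R encoder
        self.category_head = nn.Linear(hidden_size, num_categories)
        self.action_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)       # (batch, seq, hidden)
        cls = hidden[:, 0]                                     # [CLS] token embedding
        return self.category_head(cls), self.action_head(cls)  # logits; softmax/sigmoid applied downstream

# Shape check with a stand-in encoder:
dummy_encoder = lambda ids, mask: torch.zeros(ids.size(0), ids.size(1), 384)
model = DualHeadClassifier(dummy_encoder)
cat_logits, act_logits = model(torch.zeros(2, 8, dtype=torch.long), torch.ones(2, 8))
# cat_logits: (2, 5), act_logits: (2, 1)
```

In the exported ONNX graph the softmax and sigmoid are baked in, so the runtime receives probabilities rather than these raw logits.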

## Compared to Previous Model (FLAN-T5)

| | FLAN-T5 (v2) | MiniLM (this) |
|---|---|---|
| Download size | 376 MB (3 files) | 130 MB (2 files) |
| Inference | Encoder + decoder beam search | Single forward pass |
| Output | Generated text | Probabilities |
| Summaries | Yes | No (uses Gmail snippets) |
| Latency | ~300ms+ (multiple decoder calls) | ~30-50ms (single call) |

## Limitations

- Trained primarily on English/French emails
- May not generalize well to enterprise/corporate email patterns
- Classification accuracy depends on email content quality (plain text preferred over HTML-heavy)
- 250K vocab tokenizer is oversized for this use case (XLM-R covers 100+ languages)

## License

Apache 2.0