---
base_model:
- microsoft/Multilingual-MiniLM-L12-H384
license: apache-2.0
language:
- en
- fr
pipeline_tag: text-classification
inference: false
tags:
- classification
- emails
- onnx
- mobile
- int8
widget:
- text: "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way and will arrive by Monday."
example_title: Transaction
- text: "Subject: Meeting tomorrow\n\nBody: Hey, can we reschedule our 2pm meeting to 3pm? Let me know."
example_title: Personal
- text: "Subject: Weekly Newsletter\n\nBody: Check out our latest deals! 50% off everything this weekend."
example_title: Newsletter
- text: "Subject: Security Alert\n\nBody: A new device logged into your account from San Francisco, CA."
example_title: Alert
---
# Email Classifier (MiniLM ONNX)
A dual-head MiniLM classifier for email category + action prediction, optimized for on-device inference using ONNX Runtime.
## Model Description
Classifies emails into 5 categories and predicts whether action is required:
| Category | Description |
|----------|-------------|
| **PERSONAL** | 1:1 human communication, social messages |
| **NEWSLETTER** | Marketing, promotions, subscribed content |
| **TRANSACTION** | Orders, receipts, payments, confirmations |
| **ALERT** | Security notices, important notifications |
| **SOCIAL** | Social network notifications, community updates |
### Output Format
A single forward pass produces two tensors:
- `category_probs`: Float32[5] β€” softmax probabilities per category (argmax = predicted category)
- `action_prob`: Float32[1] β€” sigmoid probability of action required (threshold 0.5)
No text generation, no decoder, no beam search.
**Example:**
```
Input: "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: category_probs β†’ TRANSACTION (0.94), action_prob β†’ 0.12 (NO_ACTION)
```
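Decoding the two tensors takes only an argmax and the 0.5 threshold. A minimal sketch in plain Python, assuming `category_probs` and `action_prob` have already been read out of the model outputs (the label order matches the Python example later in this card):

```python
# Map the model's two output tensors to a category label and an action flag.
LABELS = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]

def decode(category_probs, action_prob, threshold=0.5):
    """category_probs: 5 softmax values; action_prob: single sigmoid value."""
    idx = max(range(len(category_probs)), key=lambda i: category_probs[i])
    return LABELS[idx], action_prob >= threshold

# The TRANSACTION example above: high category confidence, low action probability.
label, action = decode([0.01, 0.02, 0.02, 0.01, 0.94], 0.12)
# label == "TRANSACTION", action == False
```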
## Intended Use
- **Primary:** On-device email triage in mobile apps (iOS/Android)
- **Runtime:** ONNX Runtime React Native
- **Use case:** Prioritizing inbox, filtering noise, surfacing actionable emails
## Model Details
| Attribute | Value |
|-----------|-------|
| Base Model | `microsoft/Multilingual-MiniLM-L12-H384` |
| Parameters | ~117M |
| Architecture | XLM-R encoder + dual classification heads |
| ONNX Size | 113 MB (INT8 quantized) |
| Max Sequence | 256 tokens |
| Tokenizer | SentencePiece (XLM-R, 250K vocab) |
## Performance
| Metric | Score |
|--------|-------|
| Category Accuracy | 92.0% |
| Action Accuracy | 82.8% |
| Quantization | INT8 dynamic (4x compression) |
## Training Data
- **Source:** Personal Gmail inboxes (anonymized)
- **Languages:** English, French
- **Labeling:** Human-annotated with category + action flag
- **Input format:** `Subject: ...\n\nBody: ...` (no instruction prefix)
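The training input format above can be reproduced with a small helper (a sketch; `format_email` is a hypothetical name for illustration, not part of the released code):

```python
def format_email(subject: str, body: str) -> str:
    """Build the `Subject: ...\n\nBody: ...` string the classifier was trained on."""
    return f"Subject: {subject}\n\nBody: {body}"

text = format_email("Your order has shipped",
                    "Your order #12345 is on its way and will arrive by Monday.")
```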
## How to Use
### ONNX Runtime (React Native)
```typescript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const session = await InferenceSession.create('model.onnx');

// input_ids / attention_mask / token_type_ids come from the XLM-R tokenizer
// (int64 tensors of shape [1, seqLen], max 256 tokens)
const outputs = await session.run({
  input_ids: inputIdsTensor,
  attention_mask: attentionMaskTensor,
  token_type_ids: tokenTypeIdsTensor, // all zeros
});

const categoryProbs = outputs.category_probs.data; // Float32[5]
const actionProb = outputs.action_prob.data[0];    // Float32
```
### Python (PyTorch)
```python
from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("Ippoboi/minilmail-classifier")
# Load DualHeadClassifier (base encoder + trained heads) from the checkpoint
# (see train_classifier.py); `model` below refers to that instance.
model.eval()

text = "Subject: Meeting tomorrow\n\nBody: Can we reschedule to 3pm?"
inputs = tokenizer(text, return_tensors="pt", max_length=256, truncation=True)
with torch.no_grad():
    cat_logits, act_logits = model(inputs["input_ids"], inputs["attention_mask"])

labels = ["ALERT", "NEWSLETTER", "PERSONAL", "SOCIAL", "TRANSACTION"]
category = labels[cat_logits.argmax().item()]
action = torch.sigmoid(act_logits).item() > 0.5
```
## Files
| File | Size | Description |
|------|------|-------------|
| `model.onnx` | 113 MB | INT8 quantized ONNX model |
| `tokenizer.json` | 17 MB | SentencePiece tokenizer (XLM-R 250K vocab) |
## Architecture
```
Input β†’ XLM-R Encoder (12 layers, 384 hidden) β†’ [CLS] token
↓
β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”
↓ ↓
Category Head Action Head
Linear(384β†’5) Linear(384β†’1)
↓ ↓
softmax sigmoid
```
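The diagram corresponds to a small PyTorch module along these lines (a sketch, not the released `DualHeadClassifier` from train_classifier.py; the hidden size and head shapes follow the diagram, everything else is an assumption):

```python
import torch
import torch.nn as nn

class DualHeadSketch(nn.Module):
    """[CLS] pooling plus two linear heads, matching the diagram above."""
    def __init__(self, encoder, hidden=384, n_categories=5):
        super().__init__()
        self.encoder = encoder  # e.g. the MiniLM (XLM-R-style) encoder
        self.category_head = nn.Linear(hidden, n_categories)
        self.action_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        # Raw logits; apply softmax / sigmoid downstream as in the diagram
        return self.category_head(cls), self.action_head(cls)
```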
## Compared to Previous Model (FLAN-T5)
| | FLAN-T5 (v2) | MiniLM (this) |
|---|---|---|
| Download size | 376 MB (3 files) | 130 MB (2 files) |
| Inference | Encoder + decoder beam search | Single forward pass |
| Output | Generated text | Probabilities |
| Summaries | Yes | No (uses Gmail snippets) |
| Latency | ~300ms+ (multiple decoder calls) | ~30-50ms (single call) |
## Limitations
- Trained primarily on English/French emails
- May not generalize well to enterprise/corporate email patterns
- Classification accuracy depends on email content quality (plain text preferred over HTML-heavy)
- 250K vocab tokenizer is oversized for this use case (XLM-R covers 100+ languages)
## License
Apache 2.0