	---
	license: mit
	language:
	- en
	- de
	- multilingual
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- email
	- classification
	- multi-label
	- onnx
	- int8
	- priority
	base_model: microsoft/Multilingual-MiniLM-L12-H384
	model-index:
	- name: franz-email-classifier
	  results: []
	---

	# Franz Email Classifier

	Multi-label email classification model used by [Franz](https://meetfranz.com) to automatically prioritize emails.

	Fine-tuned from [`microsoft/Multilingual-MiniLM-L12-H384`](https://huggingface.co/microsoft/Multilingual-MiniLM-L12-H384) and exported as ONNX INT8 for fast CPU inference in Electron.

	## Labels

	The model predicts 8 binary labels per email:

	| Label | Meaning |
	|---|---|
	| `IS_URGENT` | Needs attention today |
	| `NEEDS_REPLY` | Direct question or action request to the user |
	| `HAS_DEADLINE` | Explicit or relative deadline mentioned |
	| `IS_ACTIONABLE` | Any action required (broader than `NEEDS_REPLY`) |
	| `IS_INFORMATIONAL` | FYI / status update, no action needed |
	| `IS_AUTOMATED` | Machine-generated (CI/CD, monitoring, alerts) |
	| `IS_NEWSLETTER` | Content marketing / newsletter |
	| `IS_TRANSACTIONAL` | Receipt, invoice, order confirmation |

	## Priority Mapping

	Labels are combined into priority tiers in the Franz app:

	| Condition | Priority |
	|---|---|
	| `IS_URGENT` + `NEEDS_REPLY` | `urgent` |
	| `IS_URGENT` | `important` |
	| `NEEDS_REPLY` + `IS_ACTIONABLE` | `important` |
	| `IS_NEWSLETTER` or `IS_TRANSACTIONAL` | `noise` |
	| `IS_AUTOMATED` (not urgent) | `noise` |
	| `IS_INFORMATIONAL` (not urgent/reply) | `low` |
	| Everything else | `normal` |
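
The mapping above can be sketched as a small function. This is only an illustration, not the Franz app's actual code: it assumes the rows are checked top to bottom with the first match winning, that `+` means both labels are present, and the names `Label`, `Priority`, and `toPriority` are hypothetical.

```ts
type Label =
  | 'IS_URGENT' | 'NEEDS_REPLY' | 'HAS_DEADLINE' | 'IS_ACTIONABLE'
  | 'IS_INFORMATIONAL' | 'IS_AUTOMATED' | 'IS_NEWSLETTER' | 'IS_TRANSACTIONAL'

type Priority = 'urgent' | 'important' | 'normal' | 'low' | 'noise'

// Assumption: rows are evaluated in table order and the first match wins.
function toPriority(active: ReadonlySet<Label>): Priority {
  const has = (label: Label): boolean => active.has(label)

  if (has('IS_URGENT') && has('NEEDS_REPLY')) return 'urgent'
  if (has('IS_URGENT')) return 'important'
  if (has('NEEDS_REPLY') && has('IS_ACTIONABLE')) return 'important'
  if (has('IS_NEWSLETTER') || has('IS_TRANSACTIONAL')) return 'noise'
  // Urgent emails have already returned above, so this is the "not urgent" case.
  if (has('IS_AUTOMATED')) return 'noise'
  // IS_URGENT is excluded by the earlier rows; NEEDS_REPLY is checked here.
  if (has('IS_INFORMATIONAL') && !has('NEEDS_REPLY')) return 'low'
  return 'normal'
}
```

Under the first-match assumption, an email tagged both `IS_URGENT` and `IS_NEWSLETTER` comes out as `important`, because the urgency rows are checked before the noise rows.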

	## Usage

	### With @huggingface/transformers (Node.js / Electron)

	```ts
	import { pipeline, env } from '@huggingface/transformers'

	env.allowLocalModels = true
	env.localModelPath = '/path/to/models' // parent dir

	const classifier = await pipeline(
	  'text-classification',
	  'email-classifier', // subdirectory name
	  { dtype: 'int8', device: 'cpu' }
	)

	// top_k: null requests a score for every label instead of only the best one
	const result = await classifier('Re: Urgent: Invoice #4521 due Friday', { top_k: null })
	// [
	//   { label: 'IS_URGENT', score: 0.94 },
	//   { label: 'NEEDS_REPLY', score: 0.12 },
	//   { label: 'HAS_DEADLINE', score: 0.91 },
	//   ...
	// ]
	```
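
Because each label is scored independently, the caller still has to decide which labels count as set. A minimal sketch of that step follows; the helper name `activeLabels` and the 0.5 cutoff are illustrative assumptions, not values taken from the model or the Franz app.

```ts
interface LabelScore {
  label: string
  score: number
}

// Hypothetical helper: keep every label whose score reaches the cutoff.
// The 0.5 default is an assumption chosen for illustration only.
function activeLabels(scores: LabelScore[], cutoff = 0.5): Set<string> {
  return new Set(
    scores.filter((s) => s.score >= cutoff).map((s) => s.label)
  )
}
```

Applied to the three scores shown in the sample output above, this keeps `IS_URGENT` and `HAS_DEADLINE` and drops `NEEDS_REPLY`.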

	### With transformers (Python)

	```python
	from transformers import pipeline

	classifier = pipeline(
	    "text-classification",
	    model="meetfranz/franz-models",
	    top_k=None,  # return a score for every label, not just the top one
	)

	result = classifier("Re: Urgent: Invoice #4521 due Friday")
	```

	## Model Details

	| Property | Value |
	|---|---|
	| Architecture | BertForSequenceClassification |
	| Base model | microsoft/Multilingual-MiniLM-L12-H384 |
	| Hidden size | 384 |
	| Layers | 12 |
	| Attention heads | 12 |
	| Max sequence length | 512 |
	| Vocab size | 250,037 |
	| Tokenizer | XLMRobertaTokenizer (SentencePiece, unigram) |
	| Problem type | Multi-label classification |
	| Quantization | ONNX INT8 (dynamic) |
	| Model size | ~113 MB (quantized) |

	## Training

	Trained on LLM-generated synthetic email data. No real user emails or personal data were used in training. Labels were bootstrapped via LLM annotation and human-reviewed for quality.

	Fine-tuned with multi-label BCE loss, then exported to ONNX with INT8 dynamic quantization.
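
For readers unfamiliar with the loss, multi-label BCE treats each of the 8 labels as its own yes/no problem and averages the per-label binary cross-entropy. The sketch below shows the standard numerically stable form computed on raw logits; it is written in TypeScript for illustration only and is not the training code, and the function name `bceWithLogits` is hypothetical.

```ts
// Binary cross-entropy on raw logits, averaged over labels.
// Per label: -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))],
// rewritten as max(z, 0) - z * y + log(1 + exp(-|z|)) to avoid overflow.
function bceWithLogits(logits: number[], targets: number[]): number {
  if (logits.length === 0 || logits.length !== targets.length) {
    throw new Error('logits and targets must be non-empty and equally long')
  }
  let total = 0
  for (let i = 0; i < logits.length; i++) {
    const z = logits[i]
    const y = targets[i]
    total += Math.max(z, 0) - z * y + Math.log1p(Math.exp(-Math.abs(z)))
  }
  return total / logits.length
}
```

A logit of 0 corresponds to a probability of 0.5, so it costs ln 2 regardless of the target, while a large positive logit on a positive target costs almost nothing.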

	## How Franz Uses This Model

	This model is Stage 2 in Franz's three-stage email classification funnel:

	1. Stage 1 — Heuristics: Fast rules-based classification for obvious cases
	2. Stage 2 — ML (this model): ONNX inference for ambiguous emails (confidence threshold: 0.75)
	3. Stage 3 — LLM: Local or cloud LLM for emails below the ML confidence threshold

	The model is downloaded on demand when a user first adds an email account to Franz. If the model is unavailable, the app falls through to Stage 3.
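
The funnel described above can be sketched roughly as follows. Everything here except the 0.75 threshold is an assumption made for illustration: the `Stages` interface, the `classify` and `isConfident` names, the rule that a prediction is "confident" when every label score is at least 0.75 or at most 0.25, and the 0.5 cutoff for turning scores into labels. The model card does not say how Franz actually computes confidence.

```ts
interface LabelScore { label: string; score: number }
type Stage = 'heuristics' | 'ml' | 'llm'
interface Decision { stage: Stage; labels: string[] }

// Hypothetical stage signatures. Heuristics and ML return null when they
// cannot decide; for ML that includes the model not being available.
interface Stages {
  heuristics: (email: string) => string[] | null
  ml: (email: string) => Promise<LabelScore[] | null>
  llm: (email: string) => Promise<string[]>
}

const ML_CONFIDENCE = 0.75 // threshold stated for Stage 2

// Assumption: confident means every label is clearly on or clearly off,
// i.e. max(score, 1 - score) >= ML_CONFIDENCE for all labels.
function isConfident(scores: LabelScore[]): boolean {
  return scores.every((s) => Math.max(s.score, 1 - s.score) >= ML_CONFIDENCE)
}

async function classify(email: string, stages: Stages): Promise<Decision> {
  // Stage 1: rules for obvious cases.
  const ruled = stages.heuristics(email)
  if (ruled !== null) return { stage: 'heuristics', labels: ruled }

  // Stage 2: ONNX model, accepted only when confident.
  const scores = await stages.ml(email)
  if (scores !== null && isConfident(scores)) {
    const labels = scores.filter((s) => s.score >= 0.5).map((s) => s.label)
    return { stage: 'ml', labels }
  }

  // Stage 3: LLM fallback, also reached when the model is unavailable.
  return { stage: 'llm', labels: await stages.llm(email) }
}
```

Passing the stages in as functions keeps the sketch independent of how each one is implemented, and makes the fall-through when the model is missing an ordinary `null` return rather than an error path.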

	## License

	MIT