---
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/resolve/main/LICENSE
language:
- en
pipeline_tag: text-classification
tags:
- email-classification
- mlx
- phi-3
- lora
- text-classification
library_name: mlx
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- private
widget:
- text: "Classify this email:\n\nYour order #12345 has been shipped and will arrive in 3-5 business days.\n\nCategory:"
  example_title: "Transactional Email"
- text: "Classify this email:\n\n🎉 Limited Time Offer! Get 50% off all products this weekend only!\n\nCategory:"
  example_title: "Promotional Email"
- text: "Classify this email:\n\nYour password was changed on December 7, 2025. If you didn't make this change, please contact support immediately.\n\nCategory:"
  example_title: "Security Alert"
---

# Email Classifier - Phi-3 Mini Fine-tuned

This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) for email classification tasks. It uses LoRA (Low-Rank Adaptation) for efficient fine-tuning on Apple Silicon using the MLX framework.

## Model Description

- **Base Model**: microsoft/Phi-3-mini-4k-instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Framework**: Apple MLX
- **Task**: Email Classification
- **Categories**: 20 email categories, including promotional, transactional, notification, security, event, educational, newsletter, survey, business, and personal (see Training Data for the full list)

## Intended Use

This model classifies emails into predefined categories to help with inbox organization, email filtering, and workflow automation.

### Direct Use

```python
from mlx_lm import load, generate

# Load the model
model, tokenizer = load("jake-watkins/email-classifier")

# Classify an email
email_content = """
Your subscription to Premium Service will renew on January 1st, 2026.
To cancel or modify your subscription, visit your account settings.
"""

prompt = f"Classify this email:\n\n{email_content}\n\nCategory:"

response = generate(model, tokenizer, prompt=prompt, max_tokens=50, verbose=False)
print(response)
```
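The raw generation may include trailing tokens after the category label. A minimal post-processing sketch (the `extract_category` helper below is illustrative, not part of the model or MLX-LM):

```python
def extract_category(response: str) -> str:
    """Pull the category label from the first non-empty line of the generation.

    Hypothetical helper: assumes the model emits the category on the first
    line after the "Category:" prompt suffix, possibly followed by extra tokens.
    """
    for line in response.strip().splitlines():
        label = line.strip().lower().rstrip(".")
        if label:
            return label
    return "unknown"

print(extract_category("Transactional\n<|end|>"))  # transactional
```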

## Training Data

The model was trained on a private dataset of email examples across 20 categories:
- promotional
- transactional
- notification
- security
- event
- educational
- newsletter
- survey
- business
- personal
- solicitation
- recruitment
- membership
- political
- informative
- account
- press
- memorial
- file
- admission
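The dataset itself is private, and its record format is not published. MLX-LM's LoRA trainer accepts JSONL files with a single `text` field per line, so a plausible sketch of how each email/label pair might be serialized (the prompt template here mirrors the widget examples above; the field names are assumptions) is:

```python
import json

def to_record(email_body: str, category: str) -> str:
    """Hypothetical training-record builder: one JSONL line per labeled email.

    The prompt template and {"text": ...} schema are assumptions, not the
    published format of the private dataset.
    """
    text = f"Classify this email:\n\n{email_body}\n\nCategory: {category}"
    return json.dumps({"text": text})

record = to_record("Your order #12345 has been shipped.", "transactional")
print(record)
```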

## Training Procedure

### Training Hyperparameters

- **Iterations**: 699
- **Learning Rate**: 1e-5
- **Batch Size**: 1
- **Max Sequence Length**: 512 tokens
- **LoRA Layers**: 16
- **Steps per Eval**: 100
- **Validation Batches**: 25

### Framework

Fine-tuned using MLX-LM on Apple Silicon with LoRA adapters for parameter-efficient training.

## Evaluation

The model was evaluated on a held-out test set; stratified sampling was used to maintain the category distribution across the training, validation, and test splits (80/10/10).
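The 80/10/10 stratified split described above can be sketched in plain Python (the card does not specify the tooling used, so this is an illustrative reimplementation, not the actual preprocessing script):

```python
import random
from collections import defaultdict

def stratified_split(items, labels, fracs=(0.8, 0.1, 0.1), seed=42):
    """Split labeled items into train/val/test while preserving per-class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    train, val, test = [], [], []
    for label, members in by_class.items():
        rng.shuffle(members)
        n_train = int(len(members) * fracs[0])
        n_val = int(len(members) * fracs[1])
        train.extend((m, label) for m in members[:n_train])
        val.extend((m, label) for m in members[n_train:n_train + n_val])
        test.extend((m, label) for m in members[n_train + n_val:])
    return train, val, test

# Toy data: 5 categories x 20 emails each (the real dataset is private).
emails = [f"email {i}" for i in range(100)]
labels = ["promotional", "transactional", "security", "newsletter", "personal"] * 20
train, val, test = stratified_split(emails, labels)
print(len(train), len(val), len(test))  # 80 10 10
```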

## Limitations

- **Language**: Primarily trained on English emails
- **Context Length**: Optimized for sequences up to 512 tokens; longer emails are truncated
- **Categories**: Limited to the 20 predefined categories; may not generalize to novel email types
- **Domain**: Performance may vary on highly specialized or domain-specific emails

## Ethical Considerations

This model is intended for email organization and automation purposes. Users should:
- Ensure compliance with privacy regulations when processing email content
- Not use for unauthorized email monitoring or surveillance
- Be aware that classification errors may occur

## Citation

If you use this model, please cite the base model:

```bibtex
@article{abdin2024phi,
  title={Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
  author={Abdin, Marah and others},
  journal={arXiv preprint arXiv:2404.14219},
  year={2024}
}
```

## Model Card Contact

For questions or feedback about this model, please open an issue on the model repository.