# Email Processing ModernBERT Model

A ModernBERT model fine-tuned for email processing tasks.

## Model Capabilities

This model can compute semantic similarity between questions and answers related to:

- Email addresses
- Subject lines

## Recommended Thresholds

Based on extensive testing, the following cosine-similarity thresholds are recommended for deciding whether an answer is relevant to a question:

- For email questions: 0.85
- For subject questions: 0.70
- For other questions: 0.80
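
One way to apply these thresholds is to keep them in a lookup table keyed by question type. The sketch below assumes a simple keyword heuristic (the hypothetical `question_type` helper) to decide whether a question is about an email address, a subject line, or something else. The model itself does not classify questions, so substitute whatever routing your application already uses.

```python
# Recommended cosine-similarity thresholds, copied from the list above.
THRESHOLDS = {"email": 0.85, "subject": 0.70, "other": 0.80}


def question_type(question: str) -> str:
    """Hypothetical keyword heuristic mapping a question to a threshold key.

    Not part of the model: replace with your own routing logic.
    """
    text = question.lower()
    # Check "subject" first: "What is the subject of the email?" mentions both.
    if "subject" in text:
        return "subject"
    if "email" in text or "e-mail" in text:
        return "email"
    return "other"


def is_relevant(similarity: float, question: str) -> bool:
    """Compare a cosine similarity against the threshold for the question's type."""
    return similarity >= THRESHOLDS[question_type(question)]


print(is_relevant(0.75, "What is the subject line?"))   # 0.75 >= 0.70
print(is_relevant(0.75, "What's your email address?"))  # 0.75 < 0.85
```

The same similarity score can therefore be accepted for a subject question but rejected for an email question, which is the point of keeping per-type thresholds.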
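
For best results, combine the similarity threshold with additional content-aware checks, for example confirming that an answer to an email question actually contains an email address.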
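
A minimal sketch of such a check, assuming that for email questions it is enough to require something shaped like an email address in the answer, and for subject questions a non-empty answer. The regular expression and the helper names (`passes_content_check`, `accept`) are illustrative assumptions, not part of the model or of sentence-transformers.

```python
import re

# Hypothetical, deliberately loose pattern: local-part@domain.tld.
# It is not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def passes_content_check(question_type: str, answer: str) -> bool:
    """Illustrative content-aware check layered on top of the similarity score."""
    if question_type == "email":
        # An answer to an email question should contain something email-shaped.
        return EMAIL_PATTERN.search(answer) is not None
    if question_type == "subject":
        # A subject-line answer should at least be non-empty.
        return bool(answer.strip())
    # No extra check for other question types.
    return True


def accept(question_type: str, answer: str, similarity: float, threshold: float) -> bool:
    """Accept an answer only if it clears the threshold AND the content check."""
    return similarity >= threshold and passes_content_check(question_type, answer)
```

With this gate, an answer that scores above 0.85 against an email question but contains no address is still rejected.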

## Usage

```python
from sentence_transformers import SentenceTransformer
import torch

# Load the model
model = SentenceTransformer('sugiv/email-processing-modernbert')

# Encode questions and answers
q_embed = model.encode("What's your email address?", convert_to_tensor=True)
a1_embed = model.encode("My email is user@example.com", convert_to_tensor=True)
a2_embed = model.encode("The weather is nice today", convert_to_tensor=True)

# Calculate similarity
similarity1 = torch.nn.functional.cosine_similarity(q_embed.unsqueeze(0), a1_embed.unsqueeze(0)).item()
similarity2 = torch.nn.functional.cosine_similarity(q_embed.unsqueeze(0), a2_embed.unsqueeze(0)).item()
print(f'Similarity with relevant answer: {similarity1:.4f}')
print(f'Similarity with irrelevant answer: {similarity2:.4f}')

# Apply threshold
threshold = 0.85  # For email questions
print(f'Is relevant: {similarity1 >= threshold}')
print(f'Is irrelevant: {similarity2 < threshold}')
```
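
When scoring several candidate answers for one question, it can be convenient to encode them in a single call. The sketch below takes the encoder as a parameter (for example `model.encode`, which returns one embedding per input string when given a list), so the scoring logic can be tested without downloading the model. `score_answers` and `cosine` are hypothetical helper names, and cosine similarity is computed in plain Python here rather than with `torch`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return float(dot / (norm_u * norm_v))


def score_answers(
    encode: Callable[[List[str]], Sequence[Vector]],
    question: str,
    answers: List[str],
    threshold: float,
) -> List[Tuple[str, float, bool]]:
    """Encode the question and all answers in one call, then threshold each score."""
    vectors = encode([question, *answers])
    # The first embedding is the question; the rest line up with `answers`.
    q_vec, answer_vecs = vectors[0], vectors[1:]
    results = []
    for answer, vec in zip(answers, answer_vecs):
        similarity = cosine(q_vec, vec)
        results.append((answer, similarity, similarity >= threshold))
    return results
```

With the model loaded as in the example above, `score_answers(model.encode, "What's your email address?", ["My email is user@example.com", "The weather is nice today"], threshold=0.85)` returns one `(answer, similarity, is_relevant)` tuple per answer.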

## Training Information

- Base model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- Publication date: 2025-04-24
- Training approach: Fine-tuned on a balanced dataset of email and subject questions
- Framework: sentence-transformers with PyTorch