Email Reply Classifier (IT Outsourcing Outreach)

Fine-tuned DistilBERT that classifies inbound replies to B2B cold-email campaigns for an IT outsourcing company (offshore teams, dedicated developers, DevOps/cloud, AI/data engineers, software outsourcing, Vietnam-based talent, staff augmentation, remote engineering teams) into one of five intents.

Labels

id  label                meaning
0   Information Request  Asks for details (pricing, case studies, deck, CVs, tech stack); no meeting yet.
1   Wrong Person         Not the right contact; refers another person/department.
2   Interested           Positive intent / openness, no direct ask.
3   Meeting Request      Wants to schedule a call / proposes a time / asks availability.
4   Not Interested       Rejects, opts out, unsubscribes, or no current need.

The id2label / label2id maps are stored in config.json.
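For reference, the two maps can be written out from the label table. This is a minimal sketch built from the ids listed in this card, not a copy of the shipped config.json. JSON object keys are always strings, so the sketch also shows the integer ids being restored after a round trip.

```python
import json

# Label maps written out from the table in this card (a sketch, not the shipped file).
id2label = {
    0: "Information Request",
    1: "Wrong Person",
    2: "Interested",
    3: "Meeting Request",
    4: "Not Interested",
}
label2id = {label: idx for idx, label in id2label.items()}

# JSON turns integer keys into strings, so convert them back after loading.
serialized = json.dumps({"id2label": id2label, "label2id": label2id})
loaded = json.loads(serialized)
restored_id2label = {int(k): v for k, v in loaded["id2label"].items()}
```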

Usage

from transformers import pipeline

clf = pipeline("text-classification", model="<your-username>/email-reply-classifier")
clf("Can we schedule a call next week?")
# [{'label': 'Meeting Request', 'score': 0.99}]

Or with the full project (rule pre-classifier + confidence gating + suggested actions): https://github.com/ (see the accompanying email_classifier package).

Intended use

First-pass triage of cold-outreach replies so a sales team can auto-pause sequences and route replies (send materials, book a meeting, find the right contact, stop outreach). Pair with a confidence threshold (e.g. 0.65) to route low-confidence replies to a human.
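The gating described above could look roughly like the sketch below. The action names, the `route_reply` helper, and the follow-up action for Interested are illustrative assumptions, not the API of the email_classifier package. The input is one prediction dict in the shape returned by the text-classification pipeline.

```python
from typing import Dict

# Hypothetical action names; the action for "Interested" is an assumption,
# since the card only names four routing outcomes.
ACTIONS = {
    "Information Request": "send_materials",
    "Meeting Request": "book_meeting",
    "Wrong Person": "find_right_contact",
    "Not Interested": "stop_outreach",
    "Interested": "follow_up",
}


def route_reply(prediction: Dict, threshold: float = 0.65) -> str:
    """Map one prediction to an action, or to human review if confidence is low."""
    if prediction["score"] < threshold:
        return "human_review"
    # Unknown labels also fall back to a human rather than triggering an action.
    return ACTIONS.get(prediction["label"], "human_review")
```

For example, `route_reply({"label": "Meeting Request", "score": 0.99})` returns `"book_meeting"`, while the same label at a score of 0.50 returns `"human_review"`.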

Training data

Trained on a synthetically generated, balanced dataset of 5,000 examples (1,000 per label). It covers short, long, and ambiguous replies, signatures, quoted fragments, typos and broken English, and multi-intent replies labeled by priority rules (Meeting Request > Wrong Person > Information Request > Interested > Not Interested).
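The priority rule for multi-intent replies can be sketched as a simple ordered lookup. The `resolve_label` helper is hypothetical, and the sketch assumes that "Meeting" and "Information" in the priority list refer to the Meeting Request and Information Request labels.

```python
# Priority order from this card, highest first (full label names assumed).
PRIORITY = [
    "Meeting Request",
    "Wrong Person",
    "Information Request",
    "Interested",
    "Not Interested",
]


def resolve_label(intents):
    """Return the highest-priority intent among those detected in one reply."""
    present = set(intents)
    for label in PRIORITY:
        if label in present:
            return label
    raise ValueError("no known intent in reply")
```

A reply that both asks for a deck and proposes a call time would therefore be labeled Meeting Request.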

Evaluation

On a stratified 10% held-out split of the synthetic data: accuracy 1.00, macro-F1 1.00.

⚠️ Important: 1.00 on held-out synthetic data reflects that the templated data is highly separable; it is not a measure of real-world accuracy. Before production use, collect and label real inbound replies (Smartlead, Apollo, Gmail, HubSpot, Instantly), evaluate against them, and fine-tune further. Treat this checkpoint as an MVP baseline.
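When evaluating against real labeled replies, macro-F1 can be computed without extra dependencies. The sketch below is a plain-Python version under the usual definition (per-label F1 = 2TP / (2TP + FP + FN), averaged with equal weight per label). The `macro_f1` helper and the sample data are illustrative only.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A label that never appears in truth or predictions scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative (made-up) gold labels and predictions for four replies.
gold = ["Interested", "Interested", "Not Interested", "Not Interested"]
pred = ["Interested", "Not Interested", "Not Interested", "Not Interested"]
score = macro_f1(gold, pred, ["Interested", "Not Interested"])
```

In this toy case one Interested reply is misread as Not Interested, so the per-label scores are 2/3 and 4/5 and the macro-F1 is 11/15 (about 0.73).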

Limitations & bias

  • Domain-specific to IT-outsourcing outreach; out-of-domain text is unreliable.
  • Synthetic training data underrepresents real nuance (e.g. "budget frozen until next year" may be read as Interested rather than Not Interested).
  • English only.

Framework

DistilBERT base uncased, fine-tuned for 3 epochs (lr 2e-5, batch size 16, max_len 256).

Weights: Safetensors, 67M params, F32.

Model repository: Tom11112000/email-reply-classifier