Email Reply Classifier (IT Outsourcing Outreach)

Fine-tuned DistilBERT that classifies inbound replies to B2B cold-email campaigns for an IT outsourcing company (offshore teams, dedicated developers, DevOps/cloud, AI/data engineers, software outsourcing, Vietnam-based talent, staff augmentation, remote engineering teams) into one of five intents.

Labels

id	label	meaning
0	`Information Request`	Asks for details (pricing, case studies, deck, CVs, tech stack) — no meeting yet.
1	`Wrong Person`	Not the right contact; refers another person/department.
2	`Interested`	Positive intent / openness, no direct ask.
3	`Meeting Request`	Wants to schedule a call / proposes a time / asks availability.
4	`Not Interested`	Rejects, opts out, unsubscribes, or no current need.

The id2label / label2id maps are stored in config.json.
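
As a rough illustration of how these maps can be read back, the sketch below parses a config.json-style fragment. The embedded JSON is a hand-written stand-in built from the table above, not a copy of the real file, and `load_label_maps` is a hypothetical helper name. JSON object keys are always strings, so the ids are cast back to integers.

```python
import json

# Hand-written stand-in for the relevant part of config.json (assumption:
# the real file holds the same five labels listed in the table above).
CONFIG_JSON = """
{
  "id2label": {
    "0": "Information Request",
    "1": "Wrong Person",
    "2": "Interested",
    "3": "Meeting Request",
    "4": "Not Interested"
  }
}
"""


def load_label_maps(config_text):
    """Return (id2label, label2id) with integer ids from config.json text."""
    config = json.loads(config_text)
    # JSON object keys are strings, so convert them back to int ids.
    id2label = {int(i): label for i, label in config["id2label"].items()}
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


id2label, label2id = load_label_maps(CONFIG_JSON)
print(id2label[3])                 # Meeting Request
print(label2id["Not Interested"])  # 4
```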

Usage

from transformers import pipeline

# Repo id as shown on this model's Hub page.
clf = pipeline("text-classification", model="Tom11112000/email-reply-classifier")
clf("Can we schedule a call next week?")
# [{'label': 'Meeting Request', 'score': 0.99}]

Or with the full project (rule pre-classifier + confidence gating + suggested actions): https://github.com/ — see the accompanying email_classifier package.

Intended use

First-pass triage of cold-outreach replies so a sales team can auto-pause sequences and route replies (send materials, book a meeting, find the right contact, stop outreach). Pair with a confidence threshold (e.g. 0.65) to route low-confidence replies to a human.
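
The confidence gate described above might look roughly like the following. This is a minimal sketch, not the logic of the accompanying package: the action names, the `Routing` type, and the `route` function are all hypothetical, and the action chosen for `Interested` is an assumption, since the text names actions only for the other four labels.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.65  # example value from the text; tune on real replies

# Label -> action. Four actions follow the text above; the one for
# "Interested" is an assumption made for this sketch.
ACTIONS = {
    "Information Request": "send_materials",
    "Meeting Request": "book_meeting",
    "Wrong Person": "find_right_contact",
    "Not Interested": "stop_outreach",
    "Interested": "follow_up",  # assumption
}


@dataclass
class Routing:
    label: str
    score: float
    action: str


def route(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Map one pipeline-style prediction dict to an action.

    Low-confidence or unknown labels are sent to a human reviewer.
    """
    label, score = prediction["label"], prediction["score"]
    if score < threshold or label not in ACTIONS:
        return Routing(label, score, "human_review")
    return Routing(label, score, ACTIONS[label])


print(route({"label": "Meeting Request", "score": 0.99}).action)  # book_meeting
print(route({"label": "Interested", "score": 0.52}).action)       # human_review
```

The input shape matches one element of the list returned by the `text-classification` pipeline, so `route(clf(text)[0])` would be the typical call.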

Training data

The model was trained on a balanced, synthetically generated dataset of 5,000 examples (1,000 per label). The data covers short, long, and ambiguous replies, signatures, quoted fragments, typos and broken English, and multi-intent replies. Multi-intent replies are labeled by a priority rule: Meeting Request > Wrong Person > Information Request > Interested > Not Interested.
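
The priority rule for multi-intent replies can be sketched as below. The function name `resolve_label` is hypothetical, the sketch uses the full label names from the Labels table, and it assumes the individual intents in a reply have already been identified by some other step.

```python
# Priority order from the text above, highest first.
PRIORITY = [
    "Meeting Request",
    "Wrong Person",
    "Information Request",
    "Interested",
    "Not Interested",
]
RANK = {label: rank for rank, label in enumerate(PRIORITY)}


def resolve_label(intents):
    """Return the highest-priority label among the intents found in a reply."""
    if not intents:
        raise ValueError("at least one intent is required")
    # Lower rank means higher priority; unknown labels raise KeyError.
    return min(intents, key=RANK.__getitem__)


print(resolve_label(["Information Request", "Meeting Request"]))  # Meeting Request
print(resolve_label(["Interested", "Wrong Person"]))              # Wrong Person
```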

Evaluation

On a stratified 10% held-out split of the synthetic data: accuracy 1.00, macro-F1 1.00.
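
For readers unfamiliar with stratified splitting, here is a minimal standard-library sketch of the idea: each label contributes the same fraction of its examples to the held-out set. It is not the author's split code; the seed and the `stratified_split` helper are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.10, seed=42):
    """Return (train_idx, test_idx) with per-label proportions preserved."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed (arbitrary choice) for repeatability
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Same shape as the dataset described above: 5 labels x 1,000 examples.
labels = [label_id for label_id in range(5) for _ in range(1000)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 4500 500
print(sorted(Counter(labels[i] for i in test_idx).items()))
# [(0, 100), (1, 100), (2, 100), (3, 100), (4, 100)]
```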

⚠️ Important: 1.00 on held-out synthetic data reflects that the templated data is highly separable — it is not a measure of real-world accuracy. Before production use, collect and label real inbound replies (Smartlead, Apollo, Gmail, HubSpot, Instantly), evaluate against them, and fine-tune further. Treat this checkpoint as an MVP baseline.
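
When evaluating against hand-labeled real replies, accuracy and macro-F1 can be computed without extra dependencies, as in the sketch below. The four gold/predicted pairs are made-up toy values that only exercise the functions; they are not results for this model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 over `labels`.

    A label with no true positives, false positives, or false negatives
    contributes 0.0 to the mean.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy values for illustration only: one "Not Interested" reply is
# misread as "Interested".
y_true = ["Interested", "Not Interested", "Not Interested", "Meeting Request"]
y_pred = ["Interested", "Interested", "Not Interested", "Meeting Request"]
present = ["Interested", "Not Interested", "Meeting Request"]

print(accuracy(y_true, y_pred))                    # 0.75
print(round(macro_f1(y_true, y_pred, present), 3)) # 0.778
```

Macro-F1 weights every label equally, so it drops more sharply than accuracy when a rare label is systematically confused with another.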

Limitations & bias

Domain-specific to IT-outsourcing outreach; predictions on out-of-domain text are unreliable.
Synthetic training data underrepresents real nuance (e.g. "budget frozen until next year" may be read as Interested rather than Not Interested).
English only.

Framework

DistilBERT base uncased, fine-tuned for 3 epochs (learning rate 2e-5, batch size 16, maximum sequence length 256 tokens).
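
To give a sense of scale, the sketch below collects these hyperparameters in a small config object and works out the number of optimizer steps. It assumes 4,500 training examples (the 90% left after the held-out split), a single device, and no gradient accumulation; none of these are confirmed by the card, and `FineTuneConfig` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed above."""
    base_model: str = "distilbert/distilbert-base-uncased"
    num_epochs: int = 3
    learning_rate: float = 2e-5
    batch_size: int = 16
    max_length: int = 256


def total_optimizer_steps(num_train_examples, config):
    """Steps for the full run, counting the last partial batch of each epoch."""
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    return steps_per_epoch * config.num_epochs


config = FineTuneConfig()
# Assumption: 4,500 training examples (5,000 minus the 10% held-out split).
print(total_optimizer_steps(4500, config))  # 846
```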

Repository: Tom11112000/email-reply-classifier
Base model: distilbert/distilbert-base-uncased
Model size: 67M params
Tensor type: F32 (Safetensors)