---
library_name: transformers
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- customer-support
- intent-classification
- distilbert
- support-tickets
language:
- en
datasets:
- bitext/Bitext-customer-support-llm-chatbot-training-dataset
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "I want to cancel my subscription immediately"
  example_title: Subscription cancellation
- text: "Where is my package? I've been waiting for 2 weeks"
  example_title: Delivery tracking
- text: "I need a refund for my last purchase"
  example_title: Refund request
- text: "How do I change my account password?"
  example_title: Account management
- text: "I want to speak to a human agent"
  example_title: Contact request
- text: "Can you send me the invoice for order #12345?"
  example_title: Invoice request
model-index:
- name: customer-support-ticket-classifier
  results:
  - task:
      type: text-classification
      name: Customer Support Issue Classification
    dataset:
      name: Bitext Customer Support
      type: bitext/Bitext-customer-support-llm-chatbot-training-dataset
      split: test
    metrics:
    - type: accuracy
      value: 1.0
      name: Accuracy
    - type: f1
      value: 1.0
      name: Macro F1
---
# Customer Support Ticket Classifier
A fine-tuned [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) model that classifies customer support tickets into **11 issue categories**. Designed for automatic routing, triage, and analytics of customer inquiries.
> **🚀 [Try the live demo →](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)**
## Quick Start
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")
result = classifier("I want to cancel my subscription immediately")
print(result)
# [{'label': 'SUBSCRIPTION', 'score': 0.997}]
```
## Categories
The model classifies text into one of 11 customer support issue types:
| Label | Description | Example |
|---|---|---|
| `ACCOUNT` | Account creation, deletion, password, profile changes | *"How do I change my account password?"* |
| `CANCEL` | Cancellation fees, policies, contract termination | *"What is the fee for canceling the contract?"* |
| `CONTACT` | Reaching customer service, speaking to a human agent | *"I want to speak to a human agent"* |
| `DELIVERY` | Delivery options, shipping methods, delivery regions | *"Do you ship to Hungary?"* |
| `FEEDBACK` | Reviews, complaints, submitting feedback | *"I'd like to leave a review for your services"* |
| `INVOICE` | Viewing, requesting, or locating invoices/bills | *"Can you send me the invoice for order #12345?"* |
| `ORDER` | Placing, tracking, modifying, or canceling orders | *"I need help cancelling order #55123"* |
| `PAYMENT` | Payment methods, issues, checkout errors | *"I get an error when I try to check out"* |
| `REFUND` | Refund requests, refund policy, tracking refunds | *"I need a refund for my last purchase"* |
| `SHIPPING` | Shipping address changes, setup, modifications | *"I need to update my shipping address"* |
| `SUBSCRIPTION` | Newsletter signup, unsubscribe, subscription management | *"Help me unsubscribe from your newsletter"* |
## Usage Examples
### Basic Classification
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")
tickets = [
    "I want to cancel my subscription immediately",
    "Where is my package? I've been waiting for 2 weeks",
    "I need a refund for my last purchase",
    "How do I change my account password?",
    "I want to speak to a human agent",
    "Can you send me the invoice for order #12345?",
]
for ticket in tickets:
    result = classifier(ticket)
    print(f" [{result[0]['label']:>12s}] (conf: {result[0]['score']:.3f}) {ticket}")
```
Output:
```
 [SUBSCRIPTION] (conf: 0.997) I want to cancel my subscription immediately
 [    DELIVERY] (conf: 0.997) Where is my package? I've been waiting for 2 weeks
 [      REFUND] (conf: 0.999) I need a refund for my last purchase
 [     ACCOUNT] (conf: 1.000) How do I change my account password?
 [     CONTACT] (conf: 0.999) I want to speak to a human agent
 [     INVOICE] (conf: 0.997) Can you send me the invoice for order #12345?
```
### Top-k Classification with Confidence Scores
```python
from transformers import pipeline
classifier = pipeline(
    "text-classification",
    model="Janvi17/customer-support-ticket-classifier",
    top_k=3,  # return top 3 predictions
)
result = classifier("The payment for my subscription failed")
for pred in result[0]:
    print(f" {pred['label']:>14s}: {pred['score']:.4f}")
#         PAYMENT: 0.9661
#    SUBSCRIPTION: 0.0153
#           ORDER: 0.0068
```
### Using with PyTorch Directly
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "Janvi17/customer-support-ticket-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
text = "I need a refund for my last purchase"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
pred_id = probs.argmax().item()
print(f"Predicted: {model.config.id2label[pred_id]} ({probs[0][pred_id]:.4f})")
# Predicted: REFUND (0.9995)
```
### Production Usage with Confidence Threshold
For production systems, reject low-confidence predictions:
```python
from transformers import pipeline
classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")
CONFIDENCE_THRESHOLD = 0.85
def classify_ticket(text: str) -> dict:
    result = classifier(text)[0]
    if result["score"] < CONFIDENCE_THRESHOLD:
        return {"label": "UNKNOWN", "score": result["score"], "routed_to": "human_review"}
    return {"label": result["label"], "score": result["score"], "routed_to": "auto"}
# High-confidence → auto-routed
print(classify_ticket("I need a refund"))
# {'label': 'REFUND', 'score': 0.999, 'routed_to': 'auto'}
# Low-confidence → human review
print(classify_ticket("asdfghjkl"))
# {'label': 'UNKNOWN', 'score': 0.78, 'routed_to': 'human_review'}
```
## Evaluation Results
### Held-Out Test Set (2,464 samples)
| Metric | Score |
|---|---|
| **Accuracy** | **100.00%** |
| **Macro F1** | **100.00%** |
| **Weighted F1** | **100.00%** |
### Per-Class Performance
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ACCOUNT | 1.0000 | 1.0000 | 1.0000 | 545 |
| CANCEL | 1.0000 | 1.0000 | 1.0000 | 95 |
| CONTACT | 1.0000 | 1.0000 | 1.0000 | 200 |
| DELIVERY | 1.0000 | 1.0000 | 1.0000 | 166 |
| FEEDBACK | 1.0000 | 1.0000 | 1.0000 | 199 |
| INVOICE | 1.0000 | 1.0000 | 1.0000 | 183 |
| ORDER | 1.0000 | 1.0000 | 1.0000 | 317 |
| PAYMENT | 1.0000 | 1.0000 | 1.0000 | 200 |
| REFUND | 1.0000 | 1.0000 | 1.0000 | 262 |
| SHIPPING | 1.0000 | 1.0000 | 1.0000 | 197 |
| SUBSCRIPTION | 1.0000 | 1.0000 | 1.0000 | 100 |
### Confusion Matrix
Perfect diagonal: zero off-diagonal errors on the held-out test set.
### Training Trajectory
| Epoch | Train Loss | Val Loss | Val Accuracy | Val Macro F1 |
|---|---|---|---|---|
| 1 | 0.0229 | 0.0163 | 99.76% | 99.78% |
| 2 | 0.0042 | 0.0106 | 99.68% | 99.68% |
| **3** | **0.0024** | **0.0054** | **99.88%** | **99.88% ✦ best** |
| 4 | 0.0008 | 0.0091 | 99.80% | 99.80% |
| 5 | 0.0007 | 0.0088 | 99.80% | 99.80% |
Best checkpoint selected at epoch 3 (highest validation macro F1). Early stopping was configured with patience=2.
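The selection rule can be replayed against the trajectory table. This is a minimal sketch, assuming the stopping rule counts consecutive epochs without a new best validation macro F1; `select_best_epoch` is a hypothetical helper for illustration, not the training code used for this model.

```python
# Validation macro F1 per epoch, copied from the training trajectory table.
val_macro_f1 = [99.78, 99.68, 99.88, 99.80, 99.80]


def select_best_epoch(scores, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Hypothetical helper: stops once `patience` consecutive epochs
    fail to beat the best score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


best_epoch, stop_epoch = select_best_epoch(val_macro_f1, patience=2)
print(best_epoch, stop_epoch)
```

Under this assumption, epochs 4 and 5 both fail to beat epoch 3, so training would end after epoch 5 with the epoch 3 checkpoint kept.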
### Baselines
| Method | Accuracy | Macro F1 |
|---|---|---|
| Random | 9.9% | 9.1% |
| Majority class | 22.1% | 3.3% |
| TF-IDF + Logistic Regression | 99.7% | 99.7% |
| **This model (DistilBERT)** | **100.0%** | **100.0%** |
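The majority-class row can be cross-checked from the per-class support counts. The sketch below assumes the baseline always predicts the most frequent class and that classes with no predictions contribute an F1 of zero to the macro average; the variable names are illustrative only.

```python
# Test-set support per class, copied from the per-class performance table.
support = {
    "ACCOUNT": 545, "CANCEL": 95, "CONTACT": 200, "DELIVERY": 166,
    "FEEDBACK": 199, "INVOICE": 183, "ORDER": 317, "PAYMENT": 200,
    "REFUND": 262, "SHIPPING": 197, "SUBSCRIPTION": 100,
}

total = sum(support.values())
majority_label = max(support, key=support.get)

# Predicting the majority label everywhere: accuracy equals that class's share.
accuracy = support[majority_label] / total

# Only the majority class has a non-zero F1 (recall 1.0, precision = its share).
precision, recall = accuracy, 1.0
majority_f1 = 2 * precision * recall / (precision + recall)

# Assumption: the other ten classes count as F1 = 0 in the macro average.
macro_f1 = majority_f1 / len(support)

print(f"{majority_label}: accuracy={accuracy:.1%}, macro F1={macro_f1:.1%}")
```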
## Training Details
### Dataset
**[Bitext Customer Support LLM Chatbot Training Dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset)**
- **License:** [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/)
- **Publisher:** [Bitext](https://www.bitext.com/)
- **Size:** 26,872 rows → 24,635 after deduplication
- **Splits used:** 80% train (19,708) / 10% validation (2,463) / 10% test (2,464), stratified
- **Language:** English
- **Format:** Synthetic, template-generated customer support messages with intentional typos, case variations, and paraphrasing. Includes `{{placeholder}}` tokens for entities like order numbers and names.
The dataset contains 11 high-level issue categories and 27 fine-grained intents. This model classifies at the **category level**.
#### Class Distribution (Training Set)
| Category | Count | % |
|---|---|---|
| ACCOUNT | 4,354 | 22.1% |
| ORDER | 2,534 | 12.9% |
| REFUND | 2,098 | 10.6% |
| CONTACT | 1,599 | 8.1% |
| PAYMENT | 1,598 | 8.1% |
| FEEDBACK | 1,598 | 8.1% |
| SHIPPING | 1,576 | 8.0% |
| INVOICE | 1,464 | 7.4% |
| DELIVERY | 1,327 | 6.7% |
| SUBSCRIPTION | 799 | 4.1% |
| CANCEL | 760 | 3.9% |
Imbalance ratio: 5.7× (ACCOUNT vs. CANCEL). Despite this, macro F1 on the held-out test set is 100%, which suggests that all classes are well separated in semantic space.
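The imbalance ratio is the count of the largest class divided by the count of the smallest. A short check using the counts from the class distribution table (variable names are illustrative):

```python
# Training-set counts per category, copied from the class distribution table.
train_counts = {
    "ACCOUNT": 4354, "ORDER": 2534, "REFUND": 2098, "CONTACT": 1599,
    "PAYMENT": 1598, "FEEDBACK": 1598, "SHIPPING": 1576, "INVOICE": 1464,
    "DELIVERY": 1327, "SUBSCRIPTION": 799, "CANCEL": 760,
}

largest = max(train_counts, key=train_counts.get)
smallest = min(train_counts, key=train_counts.get)

# Ratio between the most and least frequent categories.
ratio = train_counts[largest] / train_counts[smallest]

print(f"{largest} vs {smallest}: {ratio:.1f}x")
```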
### Preprocessing
1. Exact-duplicate removal (26,872 → 24,635 samples)
2. Stratified train/val/test split (80/10/10, seed=42)
3. Tokenization with `distilbert-base-uncased` tokenizer, `max_length=128`
4. Dynamic padding via `DataCollatorWithPadding`
No text was truncated: the longest tokenized input is 32 tokens, well under `max_length=128`.
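Steps 1 and 2 can be sketched with the standard library alone. This is an illustration, not the script used to build the splits: it assumes duplicates are identified on the full (text, label) pair, and `dedup_and_split` is a hypothetical helper. The real pipeline may have used a library splitter, so the exact rows in each split will differ.

```python
import random
from collections import defaultdict


def dedup_and_split(rows, seed=42, train_frac=0.8, val_frac=0.1):
    """rows: iterable of (text, label) pairs. Returns (train, val, test) lists."""
    # Exact-duplicate removal, keeping the first occurrence of each pair.
    unique_rows = list(dict.fromkeys(rows))

    # Group by label so each split keeps roughly the same class proportions.
    by_label = defaultdict(list)
    for text, label in unique_rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * train_frac)
        n_val = round(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test


# Toy data: 20 distinct tickets for each of two labels, plus one exact duplicate.
rows = [(f"refund ticket {i}", "REFUND") for i in range(20)]
rows += [(f"order ticket {i}", "ORDER") for i in range(20)]
rows.append(("refund ticket 0", "REFUND"))

train, val, test = dedup_and_split(rows)
print(len(train), len(val), len(test))
```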
### Hyperparameters
| Parameter | Value |
|---|---|
| Base model | `distilbert/distilbert-base-uncased` (67M params) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 5 (best checkpoint at epoch 3) |
| Weight decay | 0.01 |
| Warmup steps | 308 (10% of total) |
| LR scheduler | Cosine |
| Early stopping | Patience = 2 (metric: macro F1) |
| Precision | fp32 (trained on CPU) |
| Seed | 42 |
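The warmup figure follows from the split size, batch size, and epoch count. The sketch below reproduces that arithmetic and the general shape of a linear-warmup, cosine-decay schedule. It assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; `learning_rate` is a hypothetical helper that approximates the schedule rather than reproducing the trainer's implementation.

```python
import math

TRAIN_SAMPLES = 19_708   # training split size from the Dataset section
BATCH_SIZE = 32
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_FRACTION = 0.10

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_FRACTION)


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay towards zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
```

With these inputs the script prints 616 steps per epoch, 3,080 total steps, and 308 warmup steps, matching the table.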
### Framework Versions
- Transformers 5.7.0
- PyTorch 2.11.0
- Datasets 4.8.5
- Tokenizers 0.22.2
## Demo
**[🚀 Try the live Gradio demo](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)**: paste any support ticket and see real-time classification with confidence breakdown.
## Limitations and Risks
### Known Failure Modes
This model was stress-tested with 28 adversarial inputs. Four systematic weaknesses were identified:
#### 1. Negation Blindness 🔴
The model ignores negation. *"Don't refund me, just fix the product"* is classified as REFUND (99.95% confidence). The training data contains no negated intents, and DistilBERT's 6-layer architecture has limited compositional reasoning.
**Mitigation:** Add negated intent examples to training data, or post-process with a negation detector.
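One possible shape for the post-processing option is a keyword-based negation check that diverts flagged tickets to a person. This is a rough sketch only: the cue list, `has_negation`, and `route` are hypothetical, and a rule this simple will also flag harmless phrases such as "I have no invoice".

```python
import re

# Hypothetical cue list; a real deployment would likely need a broader one.
NEGATION_PATTERN = re.compile(
    r"\b(?:not|no|never|without|cannot|\w+n't)\b",
    re.IGNORECASE,
)


def has_negation(text: str) -> bool:
    """Return True if the text contains a simple negation cue."""
    # Treat curly apostrophes like straight ones so "don’t" is caught too.
    normalized = text.replace("\u2019", "'")
    return NEGATION_PATTERN.search(normalized) is not None


def route(text: str, label: str) -> str:
    """Send negated tickets to human review instead of trusting the label."""
    return "human_review" if has_negation(text) else label


print(route("Don't refund me, just fix the product", "REFUND"))
print(route("I need a refund for my last purchase", "REFUND"))
```

The first call returns `human_review` and the second returns `REFUND`. The check is deliberately conservative: it trades some extra manual review for fewer confidently wrong routings.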
#### 2. No Out-of-Distribution Rejection 🔴
The model assigns a label to any input, including gibberish, empty strings, and unrelated text. Examples:
| Input | Prediction | Confidence |
|---|---|---|
| `"asdfghjkl"` | ORDER | 78.1% |
| `""` (empty) | ORDER | 73.7% |
| `"The quick brown fox..."` | CONTACT | 84.9% |
**Mitigation:** Use a confidence threshold (recommended: 0.85) to reject uncertain predictions. See the production usage example above.
#### 3. Heavy Typo Fragility 🟡
While the model handles mild typos well (the training data includes ~34% typo-augmented samples), severely misspelled text can cause misclassification:
| Input | Expected | Predicted | Confidence |
|---|---|---|---|
| *"hwere is my pakage"* | DELIVERY | DELIVERY | 48.1% ⚠️ |
| *"I wnat to spek to a humna"* | CONTACT | ORDER | 81.8% ❌ |
**Mitigation:** Add a spell-correction preprocessing step, or augment training data with heavier typo injection.
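A spell-correction step could be as simple as snapping unknown words to the closest entry in a known vocabulary before classification. The sketch below uses `difflib.get_close_matches` from the standard library. The vocabulary, the 0.75 cutoff, and `correct_typos` are assumptions for illustration; a production system would more likely use a dedicated spell checker with a vocabulary built from real tickets.

```python
import difflib

# Hypothetical vocabulary; in practice it could be built from the training texts.
VOCABULARY = ["where", "is", "my", "package", "want", "to", "speak", "human", "agent"]


def correct_typos(text: str, cutoff: float = 0.75) -> str:
    """Replace each unknown word with its closest vocabulary entry, if any."""
    corrected = []
    for word in text.lower().split():
        if word in VOCABULARY:
            corrected.append(word)
            continue
        # Returns at most one candidate whose similarity ratio is >= cutoff.
        matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(correct_typos("hwere is my pakage"))
```

Lowercasing is harmless here because the base model is uncased. Words with no sufficiently close match are passed through unchanged.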
#### 4. Single-Label on Multi-Intent Tickets 🟡
Real support tickets often span multiple categories. The model picks one:
| Input | Predicted | Also relevant |
|---|---|---|
| *"Cancel my subscription and give me a refund"* | REFUND | CANCEL, SUBSCRIPTION |
| *"Your delivery is terrible, I want to complain"* | DELIVERY | FEEDBACK |
**Mitigation:** For full coverage, return top-k predictions or switch to multi-label classification.
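For the top-k option, one approach is to keep every label whose score clears a secondary threshold. The sketch below works on a list shaped like the pipeline's `top_k` output; `labels_above` and the `min_score` value are hypothetical and would need tuning on real multi-intent tickets.

```python
def labels_above(predictions, min_score):
    """Keep every label whose score is at least min_score, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in ranked if p["score"] >= min_score]


# Scores copied from the top-3 example earlier in this card.
top3 = [
    {"label": "PAYMENT", "score": 0.9661},
    {"label": "SUBSCRIPTION", "score": 0.0153},
    {"label": "ORDER", "score": 0.0068},
]

print(labels_above(top3, min_score=0.01))
```

Because softmax scores sum to 1, secondary labels tend to receive very small scores, so the threshold has to be low. A multi-label head with independent per-class scores avoids that coupling.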
### Dataset Limitations
- **Synthetic data:** The training set is template-generated, not sourced from real customer interactions. Real-world text may contain slang, code-switching, or domain-specific jargon not represented in training.
- **English only:** The model is trained exclusively on English text.
- **Limited vocabulary:** Some categories have as few as 184 unique words (CANCEL), meaning the model relies heavily on keyword matching rather than deep semantic understanding.
- **Placeholder artifacts:** Training data contains `{{Order Number}}`, `{{Person Name}}`, etc. The model has learned to ignore these, but unusual entity formats in real data could affect performance.
### Bias and Fairness
- The synthetic dataset does not represent any specific demographic or dialect distribution.
- Performance on non-standard English (e.g., AAVE, Indian English, ESL patterns) has not been evaluated.
- The model may perform differently across age groups, regions, or communication styles.
### When NOT to Use This Model
- **Safety-critical routing:** Do not use as the sole decision-maker for urgent or safety-related tickets without human review.
- **Non-English text:** The model will produce unreliable predictions on non-English input.
- **Fine-grained intent classification:** This model classifies into 11 broad categories, not 27 fine-grained intents. If you need intent-level predictions (e.g., distinguishing `cancel_order` from `check_cancellation_fee`), retrain with the `intent` column.
## Citation
If you use this model, please cite the training dataset:
```bibtex
@misc{bitext2023customer,
title={Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants},
author={Bitext},
year={2023},
publisher={Hugging Face},
url={https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset}
}
```
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The training dataset is licensed under [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/).