---
library_name: transformers
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - text-classification
  - customer-support
  - intent-classification
  - distilbert
  - support-tickets
language:
  - en
datasets:
  - bitext/Bitext-customer-support-llm-chatbot-training-dataset
metrics:
  - accuracy
  - f1
pipeline_tag: text-classification
widget:
  - text: "I want to cancel my subscription immediately"
    example_title: Subscription cancellation
  - text: "Where is my package? I've been waiting for 2 weeks"
    example_title: Delivery tracking
  - text: "I need a refund for my last purchase"
    example_title: Refund request
  - text: "How do I change my account password?"
    example_title: Account management
  - text: "I want to speak to a human agent"
    example_title: Contact request
  - text: "Can you send me the invoice for order #12345?"
    example_title: Invoice request
model-index:
  - name: customer-support-ticket-classifier
    results:
      - task:
          type: text-classification
          name: Customer Support Issue Classification
        dataset:
          name: Bitext Customer Support
          type: bitext/Bitext-customer-support-llm-chatbot-training-dataset
          split: test
        metrics:
          - type: accuracy
            value: 1.0
            name: Accuracy
          - type: f1
            value: 1.0
            name: Macro F1
---

# Customer Support Ticket Classifier

A fine-tuned [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) model that classifies customer support tickets into **11 issue categories**. It is designed for automatic routing, triage, and analytics of customer inquiries.

> **🚀 [Try the live demo →](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)**

## Quick Start

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

result = classifier("I want to cancel my subscription immediately")
print(result)
# [{'label': 'SUBSCRIPTION', 'score': 0.997}]
```

## Categories

The model classifies text into one of 11 customer support issue types:

| Label | Description | Example |
|---|---|---|
| `ACCOUNT` | Account creation, deletion, password, profile changes | *"How do I change my account password?"* |
| `CANCEL` | Cancellation fees, policies, contract termination | *"What is the fee for canceling the contract?"* |
| `CONTACT` | Reaching customer service, speaking to a human agent | *"I want to speak to a human agent"* |
| `DELIVERY` | Delivery options, shipping methods, delivery regions | *"Do you ship to Hungary?"* |
| `FEEDBACK` | Reviews, complaints, submitting feedback | *"I'd like to leave a review for your services"* |
| `INVOICE` | Viewing, requesting, or locating invoices/bills | *"Can you send me the invoice for order #12345?"* |
| `ORDER` | Placing, tracking, modifying, or canceling orders | *"I need help cancelling order #55123"* |
| `PAYMENT` | Payment methods, issues, checkout errors | *"I get an error when I try to check out"* |
| `REFUND` | Refund requests, refund policy, tracking refunds | *"I need a refund for my last purchase"* |
| `SHIPPING` | Shipping address changes, setup, modifications | *"I need to update my shipping address"* |
| `SUBSCRIPTION` | Newsletter signup, unsubscribe, subscription management | *"Help me unsubscribe from your newsletter"* |

## Usage Examples

### Basic Classification

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

tickets = [
    "I want to cancel my subscription immediately",
    "Where is my package? I've been waiting for 2 weeks",
    "I need a refund for my last purchase",
    "How do I change my account password?",
    "I want to speak to a human agent",
    "Can you send me the invoice for order #12345?",
]

for ticket in tickets:
    result = classifier(ticket)
    print(f"  [{result[0]['label']:>12s}] (conf: {result[0]['score']:.3f}) {ticket}")
```

Output:
```
  [SUBSCRIPTION] (conf: 0.997) I want to cancel my subscription immediately
  [    DELIVERY] (conf: 0.997) Where is my package? I've been waiting for 2 weeks
  [      REFUND] (conf: 0.999) I need a refund for my last purchase
  [     ACCOUNT] (conf: 1.000) How do I change my account password?
  [     CONTACT] (conf: 0.999) I want to speak to a human agent
  [     INVOICE] (conf: 0.997) Can you send me the invoice for order #12345?
```

### Top-k Predictions with Confidence Scores

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Janvi17/customer-support-ticket-classifier",
    top_k=3,  # return top 3 predictions
)

result = classifier("The payment for my subscription failed")
for pred in result[0]:
    print(f"  {pred['label']:>14s}: {pred['score']:.4f}")
#         PAYMENT: 0.9661
#    SUBSCRIPTION: 0.0153
#           ORDER: 0.0068
```

### Using with PyTorch Directly

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Janvi17/customer-support-ticket-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "I need a refund for my last purchase"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

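with torch.no_grad():  # inference only, so skip gradient tracking
    logits = model(**inputs).logits          # one row of 11 class logits
    probs = torch.softmax(logits, dim=-1)    # convert logits to probabilities
    pred_id = probs.argmax(dim=-1).item()    # index of the most likely class

print(f"Predicted: {model.config.id2label[pred_id]} ({probs[0, pred_id].item():.4f})")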
# Predicted: REFUND (0.9995)
```

### Production Usage with Confidence Threshold

For production systems, route low-confidence predictions to human review instead of auto-routing them:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")
CONFIDENCE_THRESHOLD = 0.85

def classify_ticket(text: str) -> dict:
    result = classifier(text)[0]
    if result["score"] < CONFIDENCE_THRESHOLD:
        return {"label": "UNKNOWN", "score": result["score"], "routed_to": "human_review"}
    return {"label": result["label"], "score": result["score"], "routed_to": "auto"}

# High-confidence → auto-routed
print(classify_ticket("I need a refund"))
# {'label': 'REFUND', 'score': 0.999, 'routed_to': 'auto'}

# Low-confidence → human review
print(classify_ticket("asdfghjkl"))
# {'label': 'UNKNOWN', 'score': 0.78, 'routed_to': 'human_review'}
```

## Evaluation Results

### Held-Out Test Set (2,464 samples)

| Metric | Score |
|---|---|
| **Accuracy** | **100.00%** |
| **Macro F1** | **100.00%** |
| **Weighted F1** | **100.00%** |

### Per-Class Performance

| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ACCOUNT | 1.0000 | 1.0000 | 1.0000 | 545 |
| CANCEL | 1.0000 | 1.0000 | 1.0000 | 95 |
| CONTACT | 1.0000 | 1.0000 | 1.0000 | 200 |
| DELIVERY | 1.0000 | 1.0000 | 1.0000 | 166 |
| FEEDBACK | 1.0000 | 1.0000 | 1.0000 | 199 |
| INVOICE | 1.0000 | 1.0000 | 1.0000 | 183 |
| ORDER | 1.0000 | 1.0000 | 1.0000 | 317 |
| PAYMENT | 1.0000 | 1.0000 | 1.0000 | 200 |
| REFUND | 1.0000 | 1.0000 | 1.0000 | 262 |
| SHIPPING | 1.0000 | 1.0000 | 1.0000 | 197 |
| SUBSCRIPTION | 1.0000 | 1.0000 | 1.0000 | 100 |

### Confusion Matrix

The confusion matrix is perfectly diagonal: there are zero off-diagonal errors on the held-out test set.

### Training Trajectory

| Epoch | Train Loss | Val Loss | Val Accuracy | Val Macro F1 |
|---|---|---|---|---|
| 1 | 0.0229 | 0.0163 | 99.76% | 99.78% |
| 2 | 0.0042 | 0.0106 | 99.68% | 99.68% |
| **3** | **0.0024** | **0.0054** | **99.88%** | **99.88% ✦ best** |
| 4 | 0.0008 | 0.0091 | 99.80% | 99.80% |
| 5 | 0.0007 | 0.0088 | 99.80% | 99.80% |

Best checkpoint selected at epoch 3 (highest validation macro F1). Early stopping was configured with patience=2.
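
The checkpoint selection can be replayed from the validation macro F1 column of the table. The snippet below is a minimal sketch of a patience-based stopping rule, not the training code itself; the function name `replay_early_stopping` is hypothetical.

```python
def replay_early_stopping(val_f1_by_epoch, patience=2):
    """Return (best_epoch, last_epoch_run) under a patience-based stopping rule."""
    best_epoch, best_f1 = 0, float("-inf")
    epochs_without_gain = 0
    last_epoch = 0
    for epoch, f1 in enumerate(val_f1_by_epoch, start=1):
        last_epoch = epoch
        if f1 > best_f1:
            # New best checkpoint: remember it and reset the patience counter.
            best_epoch, best_f1 = epoch, f1
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, last_epoch


# Validation macro F1 per epoch, copied from the table above.
val_macro_f1 = [99.78, 99.68, 99.88, 99.80, 99.80]
best_epoch, last_epoch = replay_early_stopping(val_macro_f1, patience=2)
print(best_epoch, last_epoch)  # 3 5
```

With these values the rule keeps epoch 3 and runs out of patience at epoch 5, which coincides with the configured epoch limit.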

### Baselines

| Method | Accuracy | Macro F1 |
|---|---|---|
| Random | 9.9% | 9.1% |
| Majority class | 22.1% | 3.3% |
| TF-IDF + Logistic Regression | 99.7% | 99.7% |
| **This model (DistilBERT)** | **100.0%** | **100.0%** |
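
The majority-class row can be reproduced by hand from the per-class support counts. This sketch assumes the baseline was scored on the held-out test set, which the table does not state explicitly.

```python
# Test-set support per class, copied from the per-class performance table.
support = {
    "ACCOUNT": 545, "CANCEL": 95, "CONTACT": 200, "DELIVERY": 166,
    "FEEDBACK": 199, "INVOICE": 183, "ORDER": 317, "PAYMENT": 200,
    "REFUND": 262, "SHIPPING": 197, "SUBSCRIPTION": 100,
}

total = sum(support.values())
majority = max(support, key=support.get)

# Always predicting the majority class: accuracy equals its share of the data.
accuracy = support[majority] / total

# For the majority class, recall is 1.0 and precision equals the accuracy.
# Every other class is never predicted, so its F1 is 0.
precision, recall = accuracy, 1.0
majority_f1 = 2 * precision * recall / (precision + recall)
macro_f1 = majority_f1 / len(support)

print(f"{majority}: accuracy={accuracy:.1%}, macro F1={macro_f1:.1%}")
# ACCOUNT: accuracy=22.1%, macro F1=3.3%
```

The gap between 22.1% accuracy and 3.3% macro F1 shows why macro F1 is the more informative headline metric for an imbalanced label set.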

## Training Details

### Dataset

**[Bitext Customer Support LLM Chatbot Training Dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset)**

- **License:** [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/)
- **Publisher:** [Bitext](https://www.bitext.com/)
- **Size:** 26,872 rows → 24,635 after deduplication
- **Splits used:** 80% train (19,708) / 10% validation (2,463) / 10% test (2,464), stratified
- **Language:** English
- **Format:** Synthetic, template-generated customer support messages with intentional typos, case variations, and paraphrasing. Includes `{{placeholder}}` tokens for entities like order numbers and names.

The dataset contains 11 high-level issue categories and 27 fine-grained intents. This model classifies at the **category level**.

#### Class Distribution (Training Set)

| Category | Count | % |
|---|---|---|
| ACCOUNT | 4,354 | 22.1% |
| ORDER | 2,534 | 12.9% |
| REFUND | 2,098 | 10.6% |
| CONTACT | 1,599 | 8.1% |
| PAYMENT | 1,598 | 8.1% |
| FEEDBACK | 1,598 | 8.1% |
| SHIPPING | 1,576 | 8.0% |
| INVOICE | 1,464 | 7.4% |
| DELIVERY | 1,327 | 6.7% |
| SUBSCRIPTION | 799 | 4.1% |
| CANCEL | 760 | 3.9% |

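Imbalance ratio: 5.7× (ACCOUNT vs. CANCEL). Despite this, macro F1 on the test set is 100%, which indicates that the classes are well separated in this dataset. The TF-IDF + Logistic Regression baseline reaching 99.7% suggests the separation is largely lexical.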
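
The imbalance ratio is the size of the largest class divided by the size of the smallest. A short check using the counts from the table:

```python
# Training-set counts per category, copied from the table above.
train_counts = {
    "ACCOUNT": 4354, "ORDER": 2534, "REFUND": 2098, "CONTACT": 1599,
    "PAYMENT": 1598, "FEEDBACK": 1598, "SHIPPING": 1576, "INVOICE": 1464,
    "DELIVERY": 1327, "SUBSCRIPTION": 799, "CANCEL": 760,
}

largest = max(train_counts, key=train_counts.get)
smallest = min(train_counts, key=train_counts.get)
ratio = train_counts[largest] / train_counts[smallest]

print(f"{largest} vs {smallest}: {ratio:.1f}x")  # ACCOUNT vs CANCEL: 5.7x
```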

### Preprocessing

1. Exact-duplicate removal (26,872 → 24,635 samples)
2. Stratified train/val/test split (80/10/10, seed=42)
3. Tokenization with `distilbert-base-uncased` tokenizer, `max_length=128`
4. Dynamic padding via `DataCollatorWithPadding`

No text was truncated — the longest tokenized input is 32 tokens.
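
The first two preprocessing steps can be sketched in plain Python. This is an illustration only, not the script used to build the splits: the helpers `dedupe` and `stratified_split` are hypothetical, "exact duplicate" is assumed to mean an identical (text, label) pair, and the toy rows stand in for the real dataset.

```python
import random
from collections import defaultdict


def dedupe(rows):
    """Drop exact duplicate (text, label) pairs, keeping first occurrences."""
    return list(dict.fromkeys(rows))


def stratified_split(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split rows into train/val/test while preserving label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a reproducible order
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Toy data: 20 REFUND rows, 10 INVOICE rows, plus 5 injected duplicates.
rows = [(f"refund request {i}", "REFUND") for i in range(20)]
rows += [(f"invoice question {i}", "INVOICE") for i in range(10)]
rows += rows[:5]

unique_rows = dedupe(rows)
train, val, test = stratified_split(unique_rows)
print(len(unique_rows), len(train), len(val), len(test))  # 30 24 3 3
```

Deduplicating before splitting matters here: if duplicates were removed afterwards, identical messages could land in both the training and test sets and inflate the test scores.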

### Hyperparameters

| Parameter | Value |
|---|---|
| Base model | `distilbert/distilbert-base-uncased` (67M params) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 5 (best checkpoint at epoch 3) |
| Weight decay | 0.01 |
| Warmup steps | 308 (10% of total) |
| LR scheduler | Cosine |
| Early stopping | Patience = 2 (metric: macro F1) |
| Precision | fp32 (trained on CPU) |
| Seed | 42 |
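
The warmup step count follows from the training-set size, batch size, and epoch count. The arithmetic below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in the table.

```python
import math

train_samples = 19_708  # training split size
batch_size = 32
epochs = 5
warmup_fraction = 0.10

steps_per_epoch = math.ceil(train_samples / batch_size)  # 616
total_steps = steps_per_epoch * epochs                   # 3080
warmup_steps = round(total_steps * warmup_fraction)      # 308

print(steps_per_epoch, total_steps, warmup_steps)  # 616 3080 308
```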

### Framework Versions

- Transformers 5.7.0
- PyTorch 2.11.0
- Datasets 4.8.5
- Tokenizers 0.22.2

## Demo

**[🚀 Try the live Gradio demo](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)** — paste any support ticket and see real-time classification with confidence breakdown.

## Limitations and Risks

### Known Failure Modes

This model was stress-tested with 28 adversarial inputs. Four systematic weaknesses were identified:

#### 1. Negation Blindness 🔴

The model ignores negation. *"Don't refund me, just fix the product"* is classified as REFUND (99.95% confidence). The most likely cause is that the training data contains no negated intents; DistilBERT's compact 6-layer architecture may also limit compositional reasoning.

**Mitigation:** Add negated intent examples to training data, or post-process with a negation detector.
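
One possible shape for the post-processing option is a keyword-based negation check that sends flagged tickets to human review. This is a rough sketch rather than a tested component: the cue list and the names `has_negation` and `apply_negation_guard` are assumptions, and a regex like this will over-flag harmless sentences such as *"I have no invoice yet"*.

```python
import re

# Coarse list of English negation cues (illustrative, not exhaustive).
NEGATION_CUES = re.compile(
    r"\b(?:not|no|never|without|cannot|(?:do|does|did|wo|ca|is|are|should)n['’]?t)\b",
    re.IGNORECASE,
)


def has_negation(text: str) -> bool:
    """Return True if the text contains any negation cue."""
    return NEGATION_CUES.search(text) is not None


def apply_negation_guard(text: str, prediction: dict) -> dict:
    """Attach a routing decision; negated tickets go to a human."""
    route = "human_review" if has_negation(text) else "auto"
    return {**prediction, "routed_to": route}


# Label and score taken from the failure case described above.
prediction = {"label": "REFUND", "score": 0.9995}
print(apply_negation_guard("Don't refund me, just fix the product", prediction))
# {'label': 'REFUND', 'score': 0.9995, 'routed_to': 'human_review'}
```

The guard deliberately does not try to flip the label. It only withholds auto-routing, because the model's confidence score is not informative for negated input.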

#### 2. No Out-of-Distribution Rejection 🔴

The model assigns a label to any input, including gibberish, empty strings, and unrelated text. Examples:

| Input | Prediction | Confidence |
|---|---|---|
| `"asdfghjkl"` | ORDER | 78.1% |
| `""` (empty) | ORDER | 73.7% |
| `"The quick brown fox..."` | CONTACT | 84.9% |

**Mitigation:** Use a confidence threshold (recommended: 0.85) to reject uncertain predictions. See the production usage example above.
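
The threshold can be paired with a cheap input check so that empty or letter-free strings never reach the model. The letter check is an addition beyond the mitigation described here, and `classify_with_guards` and `stub_classifier` are hypothetical names. The stub replays the `"asdfghjkl"` row from the table so the sketch runs without downloading the model; in practice, pass the real `pipeline` object instead.

```python
def classify_with_guards(text, classifier, threshold=0.85):
    """Route a ticket, rejecting unusable input before the model is called."""
    # Guard 1: empty or letter-free input is rejected without inference.
    if not any(ch.isalpha() for ch in text):
        return {"label": "UNKNOWN", "score": None, "routed_to": "human_review"}

    # Guard 2: the confidence threshold recommended above.
    top = classifier(text)[0]
    if top["score"] < threshold:
        return {"label": "UNKNOWN", "score": top["score"], "routed_to": "human_review"}
    return {"label": top["label"], "score": top["score"], "routed_to": "auto"}


def stub_classifier(text):
    """Stand-in for the pipeline; returns the gibberish row from the table."""
    return [{"label": "ORDER", "score": 0.781}]


print(classify_with_guards("", stub_classifier))
# {'label': 'UNKNOWN', 'score': None, 'routed_to': 'human_review'}
print(classify_with_guards("asdfghjkl", stub_classifier))
# {'label': 'UNKNOWN', 'score': 0.781, 'routed_to': 'human_review'}
```

Note that the unrelated-text example in the table scores 84.9%, only just under the 0.85 threshold, so the threshold should be tuned on real traffic rather than treated as a guarantee.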

#### 3. Heavy Typo Fragility 🟡

While the model handles mild typos well (the training data includes ~34% typo-augmented samples), severely misspelled text can cause low-confidence or incorrect predictions:

| Input | Expected | Predicted | Confidence |
|---|---|---|---|
| *"hwere is my pakage"* | DELIVERY | DELIVERY | 48.1% ⚠️ |
| *"I wnat to spek to a humna"* | CONTACT | ORDER | 81.8% ❌ |

**Mitigation:** Add a spell-correction preprocessing step, or augment training data with heavier typo injection.
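
A spell-correction step could look like the sketch below, which uses fuzzy matching from Python's standard-library `difflib` against a known vocabulary. The `VOCAB` list, the `correct_typos` helper, and the `cutoff` value are all assumptions for illustration. A real vocabulary would be built from the training data, and a dedicated spell checker would likely do better.

```python
import difflib

# Hypothetical vocabulary; in practice, collect tokens from the training set.
VOCAB = ["where", "is", "my", "package", "order", "refund", "invoice", "delivery"]


def correct_typos(text, vocab=VOCAB, cutoff=0.75):
    """Replace each unknown token with its closest vocabulary word, if any."""
    corrected = []
    for token in text.lower().split():
        if token in vocab:
            corrected.append(token)
            continue
        # Closest vocabulary entry whose similarity ratio is at least `cutoff`.
        match = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


print(correct_typos("hwere is my pakage"))  # where is my package
```

Lowercasing is harmless here because the base model is uncased. Tokens with no sufficiently close match pass through unchanged, so the step cannot make unknown words worse, but it can still "correct" a valid rare word into a wrong common one if the cutoff is set too low.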

#### 4. Single-Label on Multi-Intent Tickets 🟡

Real support tickets often span multiple categories. The model picks one:

| Input | Predicted | Also relevant |
|---|---|---|
| *"Cancel my subscription and give me a refund"* | REFUND | CANCEL, SUBSCRIPTION |
| *"Your delivery is terrible, I want to complain"* | DELIVERY | FEEDBACK |

**Mitigation:** For full coverage, return top-k predictions or switch to multi-label classification.
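
For the top-k option, one simple policy is to keep the top label and add any runner-up whose score clears a secondary threshold. The helper `select_labels`, the `secondary_min` value, and the scores in the example are made up for illustration; they are not model output.

```python
def select_labels(top_k_preds, secondary_min=0.10):
    """Return the top label plus runner-ups scoring at least `secondary_min`."""
    ranked = sorted(top_k_preds, key=lambda p: p["score"], reverse=True)
    labels = [ranked[0]["label"]]
    labels += [p["label"] for p in ranked[1:] if p["score"] >= secondary_min]
    return labels


# Illustrative scores only, shaped like one ticket's top_k=3 pipeline output.
example_preds = [
    {"label": "REFUND", "score": 0.62},
    {"label": "SUBSCRIPTION", "score": 0.25},
    {"label": "CANCEL", "score": 0.08},
]
print(select_labels(example_preds))  # ['REFUND', 'SUBSCRIPTION']
```

Because softmax scores sum to 1, a secondary label only surfaces when the model splits its probability mass. When the model is highly confident in a single label, this policy returns one category, which is why true multi-label training is the more robust fix.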

### Dataset Limitations

- **Synthetic data:** The training set is template-generated, not sourced from real customer interactions. Real-world text may contain slang, code-switching, or domain-specific jargon not represented in training.
- **English only:** The model is trained exclusively on English text.
- **Limited vocabulary:** Some categories have as few as 184 unique words (CANCEL), which suggests the model relies heavily on keyword matching rather than deep semantic understanding.
- **Placeholder artifacts:** Training data contains `{{Order Number}}`, `{{Person Name}}`, etc. The model has learned to ignore these, but unusual entity formats in real data could affect performance.

### Bias and Fairness

- The synthetic dataset does not represent any specific demographic or dialect distribution.
- Performance on non-standard English (e.g., AAVE, Indian English, ESL patterns) has not been evaluated.
- The model may perform differently across age groups, regions, or communication styles.

### When NOT to Use This Model

- **Safety-critical routing:** Do not use as the sole decision-maker for urgent or safety-related tickets without human review.
- **Non-English text:** The model will produce unreliable predictions on non-English input.
- **Fine-grained intent classification:** This model classifies into 11 broad categories, not 27 fine-grained intents. If you need intent-level predictions (e.g., distinguishing `cancel_order` from `check_cancellation_fee`), retrain with the `intent` column.

## Citation

If you use this model, please cite the training dataset:

```bibtex
@misc{bitext2023customer,
  title={Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants},
  author={Bitext},
  year={2023},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The training dataset is licensed under [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/).