---
library_name: transformers
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- customer-support
- intent-classification
- distilbert
- support-tickets
language:
- en
datasets:
- bitext/Bitext-customer-support-llm-chatbot-training-dataset
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "I want to cancel my subscription immediately"
  example_title: Subscription cancellation
- text: "Where is my package? I've been waiting for 2 weeks"
  example_title: Delivery tracking
- text: "I need a refund for my last purchase"
  example_title: Refund request
- text: "How do I change my account password?"
  example_title: Account management
- text: "I want to speak to a human agent"
  example_title: Contact request
- text: "Can you send me the invoice for order #12345?"
  example_title: Invoice request
model-index:
- name: customer-support-ticket-classifier
  results:
  - task:
      type: text-classification
      name: Customer Support Issue Classification
    dataset:
      name: Bitext Customer Support
      type: bitext/Bitext-customer-support-llm-chatbot-training-dataset
      split: test
    metrics:
    - type: accuracy
      value: 1.0
      name: Accuracy
    - type: f1
      value: 1.0
      name: Macro F1
---

# Customer Support Ticket Classifier

A fine-tuned [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) model that classifies customer support tickets into **11 issue categories**. Designed for automatic routing, triage, and analytics of customer inquiries.

> **🚀 [Try the live demo →](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)**

## Quick Start

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

result = classifier("I want to cancel my subscription immediately")
print(result)
# [{'label': 'SUBSCRIPTION', 'score': 0.997}]
```

## Categories

The model classifies text into one of 11 customer support issue types:

| Label | Description | Example |
|---|---|---|
| `ACCOUNT` | Account creation, deletion, password, profile changes | *"How do I change my account password?"* |
| `CANCEL` | Cancellation fees, policies, contract termination | *"What is the fee for canceling the contract?"* |
| `CONTACT` | Reaching customer service, speaking to a human agent | *"I want to speak to a human agent"* |
| `DELIVERY` | Delivery options, shipping methods, delivery regions | *"Do you ship to Hungary?"* |
| `FEEDBACK` | Reviews, complaints, submitting feedback | *"I'd like to leave a review for your services"* |
| `INVOICE` | Viewing, requesting, or locating invoices/bills | *"Can you send me the invoice for order #12345?"* |
| `ORDER` | Placing, tracking, modifying, or canceling orders | *"I need help cancelling order #55123"* |
| `PAYMENT` | Payment methods, issues, checkout errors | *"I get an error when I try to check out"* |
| `REFUND` | Refund requests, refund policy, tracking refunds | *"I need a refund for my last purchase"* |
| `SHIPPING` | Shipping address changes, setup, modifications | *"I need to update my shipping address"* |
| `SUBSCRIPTION` | Newsletter signup, unsubscribe, subscription management | *"Help me unsubscribe from your newsletter"* |

## Usage Examples

### Basic Classification

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

tickets = [
    "I want to cancel my subscription immediately",
    "Where is my package? I've been waiting for 2 weeks",
    "I need a refund for my last purchase",
    "How do I change my account password?",
    "I want to speak to a human agent",
    "Can you send me the invoice for order #12345?",
]

for ticket in tickets:
    result = classifier(ticket)
    print(f"  [{result[0]['label']:>12s}] (conf: {result[0]['score']:.3f}) {ticket}")
```

Output:

```
  [SUBSCRIPTION] (conf: 0.997) I want to cancel my subscription immediately
  [    DELIVERY] (conf: 0.997) Where is my package? I've been waiting for 2 weeks
  [      REFUND] (conf: 0.999) I need a refund for my last purchase
  [     ACCOUNT] (conf: 1.000) How do I change my account password?
  [     CONTACT] (conf: 0.999) I want to speak to a human agent
  [     INVOICE] (conf: 0.997) Can you send me the invoice for order #12345?
```

### Top-k Predictions with Confidence Scores

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Janvi17/customer-support-ticket-classifier",
    top_k=3,  # return top 3 predictions
)

result = classifier("The payment for my subscription failed")
for pred in result[0]:
    print(f"  {pred['label']:>14s}: {pred['score']:.4f}")
#         PAYMENT: 0.9661
#    SUBSCRIPTION: 0.0153
#           ORDER: 0.0068
```

### Using with PyTorch Directly

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Janvi17/customer-support-ticket-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "I need a refund for my last purchase"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
pred_id = probs.argmax().item()
print(f"Predicted: {model.config.id2label[pred_id]} ({probs[0][pred_id]:.4f})")
# Predicted: REFUND (0.9995)
```

### Production Usage with Confidence Threshold

For production systems, reject low-confidence predictions:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

CONFIDENCE_THRESHOLD = 0.85

def classify_ticket(text: str) -> dict:
    result = classifier(text)[0]
    if result["score"] < CONFIDENCE_THRESHOLD:
        return {"label": "UNKNOWN", "score": result["score"], "routed_to": "human_review"}
    return {"label": result["label"], "score": result["score"], "routed_to": "auto"}

# High-confidence → auto-routed
print(classify_ticket("I need a refund"))
# {'label': 'REFUND', 'score': 0.999, 'routed_to': 'auto'}

# Low-confidence → human review
print(classify_ticket("asdfghjkl"))
# {'label': 'UNKNOWN', 'score': 0.78, 'routed_to': 'human_review'}
```

## Evaluation Results

### Held-Out Test Set (2,464 samples)

| Metric | Score |
|---|---|
| **Accuracy** | **100.00%** |
| **Macro F1** | **100.00%** |
| **Weighted F1** | **100.00%** |

### Per-Class Performance

| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ACCOUNT | 1.0000 | 1.0000 | 1.0000 | 545 |
| CANCEL | 1.0000 | 1.0000 | 1.0000 | 95 |
| CONTACT | 1.0000 | 1.0000 | 1.0000 | 200 |
| DELIVERY | 1.0000 | 1.0000 | 1.0000 | 166 |
| FEEDBACK | 1.0000 | 1.0000 | 1.0000 | 199 |
| INVOICE | 1.0000 | 1.0000 | 1.0000 | 183 |
| ORDER | 1.0000 | 1.0000 | 1.0000 | 317 |
| PAYMENT | 1.0000 | 1.0000 | 1.0000 | 200 |
| REFUND | 1.0000 | 1.0000 | 1.0000 | 262 |
| SHIPPING | 1.0000 | 1.0000 | 1.0000 | 197 |
| SUBSCRIPTION | 1.0000 | 1.0000 | 1.0000 | 100 |

### Confusion Matrix

The confusion matrix is perfectly diagonal: there are no off-diagonal errors on the held-out test set.

### Training Trajectory

| Epoch | Train Loss | Val Loss | Val Accuracy | Val Macro F1 |
|---|---|---|---|---|
| 1 | 0.0229 | 0.0163 | 99.76% | 99.78% |
| 2 | 0.0042 | 0.0106 | 99.68% | 99.68% |
| **3** | **0.0024** | **0.0054** | **99.88%** | **99.88% ✦ best** |
| 4 | 0.0008 | 0.0091 | 99.80% | 99.80% |
| 5 | 0.0007 | 0.0088 | 99.80% | 99.80% |

Best checkpoint selected at epoch 3 (highest validation macro F1).
Early stopping was configured with patience=2.

### Baselines

| Method | Accuracy | Macro F1 |
|---|---|---|
| Random | 9.9% | 9.1% |
| Majority class | 22.1% | 3.3% |
| TF-IDF + Logistic Regression | 99.7% | 99.7% |
| **This model (DistilBERT)** | **100.0%** | **100.0%** |

## Training Details

### Dataset

**[Bitext Customer Support LLM Chatbot Training Dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset)**

- **License:** [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/)
- **Publisher:** [Bitext](https://www.bitext.com/)
- **Size:** 26,872 rows → 24,635 after deduplication
- **Splits used:** 80% train (19,708) / 10% validation (2,463) / 10% test (2,464), stratified
- **Language:** English
- **Format:** Synthetic, template-generated customer support messages with intentional typos, case variations, and paraphrasing. Includes `{{placeholder}}` tokens for entities like order numbers and names.

The dataset contains 11 high-level issue categories and 27 fine-grained intents. This model classifies at the **category level**.

#### Class Distribution (Training Set)

| Category | Count | % |
|---|---|---|
| ACCOUNT | 4,354 | 22.1% |
| ORDER | 2,534 | 12.9% |
| REFUND | 2,098 | 10.6% |
| CONTACT | 1,599 | 8.1% |
| PAYMENT | 1,598 | 8.1% |
| FEEDBACK | 1,598 | 8.1% |
| SHIPPING | 1,576 | 8.0% |
| INVOICE | 1,464 | 7.4% |
| DELIVERY | 1,327 | 6.7% |
| SUBSCRIPTION | 799 | 4.1% |
| CANCEL | 760 | 3.9% |

Imbalance ratio: 5.7× (ACCOUNT vs CANCEL). Despite this, macro F1 on the held-out test set is 100%, which suggests the categories are easy to separate in this dataset.

### Preprocessing

1. Exact-duplicate removal (26,872 → 24,635 samples)
2. Stratified train/val/test split (80/10/10, seed=42)
3. Tokenization with `distilbert-base-uncased` tokenizer, `max_length=128`
4. Dynamic padding via `DataCollatorWithPadding`

No text was truncated: the longest tokenized input is 32 tokens.

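The first two preprocessing steps can be sketched with the standard library alone. This is a minimal illustration, not the author's training script: the function name `dedupe_and_split` and the per-class rounding of split sizes are assumptions, and a library splitter may produce slightly different counts.

```python
import random
from collections import defaultdict


def dedupe_and_split(rows, seed=42, fractions=(0.8, 0.1, 0.1)):
    """Hypothetical helper. rows is a list of (text, label) tuples.

    Returns (train, val, test) lists with roughly the same class mix.
    """
    # Step 1: exact-duplicate removal, keeping the first occurrence in order.
    unique = list(dict.fromkeys(rows))

    # Step 2: group by label so that each class is split separately.
    by_label = defaultdict(list)
    for text, label in unique:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy data: 100 ACCOUNT rows, 50 REFUND rows, plus 20 exact duplicates.
rows = [(f"account msg {i}", "ACCOUNT") for i in range(100)]
rows += [(f"refund msg {i}", "REFUND") for i in range(50)]
rows += rows[:20]

train, val, test = dedupe_and_split(rows)
print(len(train), len(val), len(test))  # 120 15 15
```

Splitting each class separately is what keeps a small category such as CANCEL represented in all three splits at about the same rate as in the full dataset.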
### Hyperparameters

| Parameter | Value |
|---|---|
| Base model | `distilbert/distilbert-base-uncased` (67M params) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 5 (best checkpoint at epoch 3) |
| Weight decay | 0.01 |
| Warmup steps | 308 (10% of total) |
| LR scheduler | Cosine |
| Early stopping | Patience = 2 (metric: macro F1) |
| Precision | fp32 (trained on CPU) |
| Seed | 42 |

### Framework Versions

- Transformers 5.7.0
- PyTorch 2.11.0
- Datasets 4.8.5
- Tokenizers 0.22.2

## Demo

**[🚀 Try the live Gradio demo](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)**: paste any support ticket and see real-time classification with a confidence breakdown.

## Limitations and Risks

### Known Failure Modes

This model was stress-tested with 28 adversarial inputs. Four systematic weaknesses were identified:

#### 1. Negation Blindness 🔴

The model ignores negation. *"Don't refund me, just fix the product"* is classified as REFUND (99.95% confidence). The training data contains no negated intents, so the model has had no signal to learn from, and the compact 6-layer DistilBERT architecture may further limit compositional reasoning.

**Mitigation:** Add negated intent examples to training data, or post-process with a negation detector.

#### 2. No Out-of-Distribution Rejection 🔴

The model assigns a label to any input, including gibberish, empty strings, and unrelated text. Examples:

| Input | Prediction | Confidence |
|---|---|---|
| `"asdfghjkl"` | ORDER | 78.1% |
| `""` (empty) | ORDER | 73.7% |
| `"The quick brown fox..."` | CONTACT | 84.9% |

**Mitigation:** Use a confidence threshold (recommended: 0.85) to reject uncertain predictions. See the production usage example above.

#### 3. Heavy Typo Fragility 🟡

While the model handles mild typos well (the training data includes ~34% typo-augmented samples), severely misspelled text can lower confidence or cause misclassification:

| Input | Expected | Predicted | Confidence |
|---|---|---|---|
| *"hwere is my pakage"* | DELIVERY | DELIVERY | 48.1% ⚠️ |
| *"I wnat to spek to a humna"* | CONTACT | ORDER | 81.8% ❌ |

**Mitigation:** Add a spell-correction preprocessing step, or augment training data with heavier typo injection.

#### 4. Single-Label on Multi-Intent Tickets 🟡

Real support tickets often span multiple categories. The model picks one:

| Input | Predicted | Also relevant |
|---|---|---|
| *"Cancel my subscription and give me a refund"* | REFUND | CANCEL, SUBSCRIPTION |
| *"Your delivery is terrible, I want to complain"* | DELIVERY | FEEDBACK |

**Mitigation:** For full coverage, return top-k predictions or switch to multi-label classification.

### Dataset Limitations

- **Synthetic data:** The training set is template-generated, not sourced from real customer interactions. Real-world text may contain slang, code-switching, or domain-specific jargon not represented in training.
- **English only:** The model is trained exclusively on English text.
- **Limited vocabulary:** Some categories have as few as 184 unique words (CANCEL), which suggests the model may rely heavily on keyword matching rather than deeper semantic understanding.
- **Placeholder artifacts:** Training data contains `{{Order Number}}`, `{{Person Name}}`, etc. The model appears to ignore these, but unusual entity formats in real data could affect performance.

### Bias and Fairness

- The synthetic dataset does not represent any specific demographic or dialect distribution.
- Performance on non-standard English (e.g., AAVE, Indian English, ESL patterns) has not been evaluated.
- The model may perform differently across age groups, regions, or communication styles.

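Two of the mitigations listed under Known Failure Modes, rejecting empty or low-confidence input and surfacing runner-up labels for multi-intent tickets, can be combined in one post-processing step. The sketch below is a hypothetical helper rather than part of the released model: `route_ticket` and the `secondary_min` cutoff of 0.10 are assumptions that would need tuning on real tickets, while the 0.85 threshold is the value recommended above.

```python
def route_ticket(text, predictions, threshold=0.85, secondary_min=0.10):
    """Hypothetical post-processing of top-k predictions.

    predictions: list of {"label": str, "score": float} dicts, the shape
    the text-classification pipeline returns when top_k > 1.
    secondary_min is an illustrative cutoff, not a value from this card.
    """
    # The model labels even empty input, so reject it before reading scores.
    if not text.strip() or not predictions:
        return {"label": "UNKNOWN", "also": [], "routed_to": "human_review"}

    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    top = ranked[0]
    # Runner-up labels with non-trivial probability hint at a second intent.
    also = [p["label"] for p in ranked[1:] if p["score"] >= secondary_min]

    if top["score"] < threshold:
        return {"label": "UNKNOWN", "also": also, "routed_to": "human_review"}
    return {"label": top["label"], "also": also, "routed_to": "auto"}


# Scores taken from the top-k example earlier in this card.
preds = [
    {"label": "PAYMENT", "score": 0.9661},
    {"label": "SUBSCRIPTION", "score": 0.0153},
    {"label": "ORDER", "score": 0.0068},
]
print(route_ticket("The payment for my subscription failed", preds))
# {'label': 'PAYMENT', 'also': [], 'routed_to': 'auto'}
```

Because the model is trained with a single-label softmax, runner-up scores are usually small, so this only approximates multi-intent detection. Negated requests are not addressed by this step, since they are misclassified with high confidence.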
### When NOT to Use This Model

- **Safety-critical routing:** Do not use as the sole decision-maker for urgent or safety-related tickets without human review.
- **Non-English text:** The model will produce unreliable predictions on non-English input.
- **Fine-grained intent classification:** This model classifies into 11 broad categories, not 27 fine-grained intents. If you need intent-level predictions (e.g., distinguishing `cancel_order` from `check_cancellation_fee`), retrain with the `intent` column.

## Citation

If you use this model, please cite the training dataset:

```bibtex
@misc{bitext2023customer,
  title={Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants},
  author={Bitext},
  year={2023},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The training dataset is licensed under [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/).