
---
library_name: transformers
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- customer-support
- intent-classification
- distilbert
- support-tickets
language:
- en
datasets:
- bitext/Bitext-customer-support-llm-chatbot-training-dataset
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "I want to cancel my subscription immediately"
  example_title: Subscription cancellation
- text: "Where is my package? I've been waiting for 2 weeks"
  example_title: Delivery tracking
- text: "I need a refund for my last purchase"
  example_title: Refund request
- text: "How do I change my account password?"
  example_title: Account management
- text: "I want to speak to a human agent"
  example_title: Contact request
- text: "Can you send me the invoice for order #12345?"
  example_title: Invoice request
model-index:
- name: customer-support-ticket-classifier
  results:
  - task:
      type: text-classification
      name: Customer Support Issue Classification
    dataset:
      name: Bitext Customer Support
      type: bitext/Bitext-customer-support-llm-chatbot-training-dataset
      split: test
    metrics:
    - type: accuracy
      value: 1.0
      name: Accuracy
    - type: f1
      value: 1.0
      name: Macro F1
---

	# Customer Support Ticket Classifier

	A fine-tuned [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) model that classifies customer support tickets into 11 issue categories. Designed for automatic routing, triage, and analytics of customer inquiries.

	> 🚀 [Try the live demo →](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo)

	## Quick Start

	```python
	from transformers import pipeline

	classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

	result = classifier("I want to cancel my subscription immediately")
	print(result)
	# [{'label': 'SUBSCRIPTION', 'score': 0.997}]
	```

	## Categories

	The model classifies text into one of 11 customer support issue types:

| Label | Description | Example |
|---|---|---|
| `ACCOUNT` | Account creation, deletion, password, profile changes | "How do I change my account password?" |
| `CANCEL` | Cancellation fees, policies, contract termination | "What is the fee for canceling the contract?" |
| `CONTACT` | Reaching customer service, speaking to a human agent | "I want to speak to a human agent" |
| `DELIVERY` | Delivery options, shipping methods, delivery regions | "Do you ship to Hungary?" |
| `FEEDBACK` | Reviews, complaints, submitting feedback | "I'd like to leave a review for your services" |
| `INVOICE` | Viewing, requesting, or locating invoices/bills | "Can you send me the invoice for order #12345?" |
| `ORDER` | Placing, tracking, modifying, or canceling orders | "I need help cancelling order #55123" |
| `PAYMENT` | Payment methods, issues, checkout errors | "I get an error when I try to check out" |
| `REFUND` | Refund requests, refund policy, tracking refunds | "I need a refund for my last purchase" |
| `SHIPPING` | Shipping address changes, setup, modifications | "I need to update my shipping address" |
| `SUBSCRIPTION` | Newsletter signup, unsubscribe, subscription management | "Help me unsubscribe from your newsletter" |
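The card positions the model for automatic routing. Here is a minimal sketch of a label-to-queue lookup for the 11 labels. The queue names and the `ROUTING_TABLE` dict are hypothetical. The `human_review` fallback mirrors the production example further down. Adapt all of these to your own support organization.

```python
# Hypothetical label -> queue mapping; queue names are illustrative only.
ROUTING_TABLE = {
    "ACCOUNT": "account_team",
    "CANCEL": "billing_team",
    "CONTACT": "live_agents",
    "DELIVERY": "logistics_team",
    "FEEDBACK": "customer_experience",
    "INVOICE": "billing_team",
    "ORDER": "orders_team",
    "PAYMENT": "billing_team",
    "REFUND": "billing_team",
    "SHIPPING": "logistics_team",
    "SUBSCRIPTION": "marketing_team",
}

# Anything outside the table (e.g. an "UNKNOWN" low-confidence result) goes to a person.
FALLBACK_QUEUE = "human_review"


def route(label: str) -> str:
    """Return the queue for a predicted label, or the fallback queue for unmapped labels."""
    return ROUTING_TABLE.get(label, FALLBACK_QUEUE)


for label in ["REFUND", "CONTACT", "UNKNOWN"]:
    print(f"{label:>8s} -> {route(label)}")
```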

	## Usage Examples

	### Basic Classification

	```python
	from transformers import pipeline

	classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")

tickets = [
    "I want to cancel my subscription immediately",
    "Where is my package? I've been waiting for 2 weeks",
    "I need a refund for my last purchase",
    "How do I change my account password?",
    "I want to speak to a human agent",
    "Can you send me the invoice for order #12345?",
]

for ticket in tickets:
    result = classifier(ticket)
    print(f" [{result[0]['label']:>12s}] (conf: {result[0]['score']:.3f}) {ticket}")
	```

	Output:
	```
 [SUBSCRIPTION] (conf: 0.997) I want to cancel my subscription immediately
 [    DELIVERY] (conf: 0.997) Where is my package? I've been waiting for 2 weeks
 [      REFUND] (conf: 0.999) I need a refund for my last purchase
 [     ACCOUNT] (conf: 1.000) How do I change my account password?
 [     CONTACT] (conf: 0.999) I want to speak to a human agent
 [     INVOICE] (conf: 0.997) Can you send me the invoice for order #12345?
	```

### Top-k Predictions with Confidence Scores

	```python
	from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Janvi17/customer-support-ticket-classifier",
    top_k=3,  # return top 3 predictions
)

result = classifier("The payment for my subscription failed")
for pred in result[0]:
    print(f" {pred['label']:>14s}: {pred['score']:.4f}")
	# PAYMENT: 0.9661
	# SUBSCRIPTION: 0.0153
	# ORDER: 0.0068
	```

	### Using with PyTorch Directly

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	model_name = "Janvi17/customer-support-ticket-classifier"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	text = "I need a refund for my last purchase"
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    pred_id = probs.argmax().item()

	print(f"Predicted: {model.config.id2label[pred_id]} ({probs[0][pred_id]:.4f})")
	# Predicted: REFUND (0.9995)
	```

	### Production Usage with Confidence Threshold

	For production systems, reject low-confidence predictions:

	```python
	from transformers import pipeline

	classifier = pipeline("text-classification", model="Janvi17/customer-support-ticket-classifier")
	CONFIDENCE_THRESHOLD = 0.85

def classify_ticket(text: str) -> dict:
    result = classifier(text)[0]
    if result["score"] < CONFIDENCE_THRESHOLD:
        return {"label": "UNKNOWN", "score": result["score"], "routed_to": "human_review"}
    return {"label": result["label"], "score": result["score"], "routed_to": "auto"}

	# High-confidence → auto-routed
	print(classify_ticket("I need a refund"))
	# {'label': 'REFUND', 'score': 0.999, 'routed_to': 'auto'}

	# Low-confidence → human review
	print(classify_ticket("asdfghjkl"))
	# {'label': 'UNKNOWN', 'score': 0.78, 'routed_to': 'human_review'}
	```

	## Evaluation Results

	### Held-Out Test Set (2,464 samples)

| Metric | Score |
|---|---|
| Accuracy | 100.00% |
| Macro F1 | 100.00% |
| Weighted F1 | 100.00% |

	### Per-Class Performance

| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ACCOUNT | 1.0000 | 1.0000 | 1.0000 | 545 |
| CANCEL | 1.0000 | 1.0000 | 1.0000 | 95 |
| CONTACT | 1.0000 | 1.0000 | 1.0000 | 200 |
| DELIVERY | 1.0000 | 1.0000 | 1.0000 | 166 |
| FEEDBACK | 1.0000 | 1.0000 | 1.0000 | 199 |
| INVOICE | 1.0000 | 1.0000 | 1.0000 | 183 |
| ORDER | 1.0000 | 1.0000 | 1.0000 | 317 |
| PAYMENT | 1.0000 | 1.0000 | 1.0000 | 200 |
| REFUND | 1.0000 | 1.0000 | 1.0000 | 262 |
| SHIPPING | 1.0000 | 1.0000 | 1.0000 | 197 |
| SUBSCRIPTION | 1.0000 | 1.0000 | 1.0000 | 100 |

	### Confusion Matrix

The confusion matrix is perfectly diagonal. Every one of the 2,464 held-out test samples is assigned its true label, with zero off-diagonal errors.

	### Training Trajectory

| Epoch | Train Loss | Val Loss | Val Accuracy | Val Macro F1 |
|---|---|---|---|---|
| 1 | 0.0229 | 0.0163 | 99.76% | 99.78% |
| 2 | 0.0042 | 0.0106 | 99.68% | 99.68% |
| 3 | 0.0024 | 0.0054 | 99.88% | 99.88% ✦ best |
| 4 | 0.0008 | 0.0091 | 99.80% | 99.80% |
| 5 | 0.0007 | 0.0088 | 99.80% | 99.80% |

	Best checkpoint selected at epoch 3 (highest validation macro F1). Early stopping was configured with patience=2.
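The sketch below replays a simple early-stopping rule over the validation macro F1 values from the table, to show how patience=2 plays out. It assumes a strict "better than the best so far" test with no minimum delta. We believe this matches how `transformers.EarlyStoppingCallback` behaves with its default threshold. The `early_stopping` function itself is hypothetical.

```python
def early_stopping(scores, patience=2):
    """Replay early stopping over per-epoch validation scores (higher is better).

    Returns (best_epoch, last_epoch_run), both 1-indexed.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:  # strict improvement resets the patience counter
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


# Validation macro F1 (%) per epoch, from the Training Trajectory table
val_macro_f1 = [99.78, 99.68, 99.88, 99.80, 99.80]
print(early_stopping(val_macro_f1, patience=2))  # (3, 5)
```

With a 5-epoch budget, the patience counter reaches 2 exactly at epoch 5. On this run, early stopping and the epoch limit therefore end training at the same point.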

	### Baselines

| Method | Accuracy | Macro F1 |
|---|---|---|
| Random | 9.9% | 9.1% |
| Majority class | 22.1% | 3.3% |
| TF-IDF + Logistic Regression | 99.7% | 99.7% |
| This model (DistilBERT) | 100.0% | 100.0% |
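The Majority class row can be checked by hand. This sketch assumes the baseline was scored on the held-out test split and uses the support counts from the Per-Class Performance table. Always predicting ACCOUNT gives that class recall 1.0 and precision 545/2464. Every other class is never predicted, so its F1 is 0. The macro average is therefore the ACCOUNT F1 divided by 11. The helper names are hypothetical.

```python
def per_class_f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one class from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(support: dict[str, int]) -> tuple[float, float]:
    """Accuracy and macro F1 of always predicting the most frequent class."""
    total = sum(support.values())
    majority = max(support, key=support.get)
    f1_scores = []
    for label, n in support.items():
        if label == majority:
            # Every sample is predicted as this class: all n are hits, the rest are false positives.
            f1_scores.append(per_class_f1(tp=n, fp=total - n, fn=0))
        else:
            # This class is never predicted, so all of its samples are missed.
            f1_scores.append(per_class_f1(tp=0, fp=0, fn=n))
    accuracy = support[majority] / total
    macro_f1 = sum(f1_scores) / len(f1_scores)
    return accuracy, macro_f1


# Test-set support per class, from the Per-Class Performance table
test_support = {
    "ACCOUNT": 545, "CANCEL": 95, "CONTACT": 200, "DELIVERY": 166,
    "FEEDBACK": 199, "INVOICE": 183, "ORDER": 317, "PAYMENT": 200,
    "REFUND": 262, "SHIPPING": 197, "SUBSCRIPTION": 100,
}
acc, macro = majority_baseline(test_support)
print(f"accuracy={acc:.1%}  macro_f1={macro:.1%}")  # accuracy=22.1%  macro_f1=3.3%
```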

	## Training Details

	### Dataset

	[Bitext Customer Support LLM Chatbot Training Dataset](https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset)

	- License: [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/)
	- Publisher: [Bitext](https://www.bitext.com/)
	- Size: 26,872 rows → 24,635 after deduplication
	- Splits used: 80% train (19,708) / 10% validation (2,463) / 10% test (2,464), stratified
	- Language: English
	- Format: Synthetic, template-generated customer support messages with intentional typos, case variations, and paraphrasing. Includes `{{placeholder}}` tokens for entities like order numbers and names.

	The dataset contains 11 high-level issue categories and 27 fine-grained intents. This model classifies at the category level.

	#### Class Distribution (Training Set)

| Category | Count | % |
|---|---|---|
| ACCOUNT | 4,354 | 22.1% |
| ORDER | 2,534 | 12.9% |
| REFUND | 2,098 | 10.6% |
| CONTACT | 1,599 | 8.1% |
| PAYMENT | 1,598 | 8.1% |
| FEEDBACK | 1,598 | 8.1% |
| SHIPPING | 1,576 | 8.0% |
| INVOICE | 1,464 | 7.4% |
| DELIVERY | 1,327 | 6.7% |
| SUBSCRIPTION | 799 | 4.1% |
| CANCEL | 760 | 3.9% |

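Imbalance ratio: 5.7× (ACCOUNT vs CANCEL). Despite this imbalance, test macro F1 is perfect. This suggests the categories are easy to separate in this synthetic dataset: even the TF-IDF + Logistic Regression baseline reaches 99.7% macro F1.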

	### Preprocessing

	1. Exact-duplicate removal (26,872 → 24,635 samples)
	2. Stratified train/val/test split (80/10/10, seed=42)
	3. Tokenization with `distilbert-base-uncased` tokenizer, `max_length=128`
	4. Dynamic padding via `DataCollatorWithPadding`

	No text was truncated — the longest tokenized input is 32 tokens.
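The first two preprocessing steps can be sketched in plain Python. This is an illustration, not the author's script. It treats each row as a hypothetical `(text, label)` pair. The card does not say which library performed the split, so this sketch will not reproduce the exact 19,708 / 2,463 / 2,464 partition even with seed 42.

```python
import random
from collections import defaultdict

Row = tuple[str, str]  # (text, label); a simplifying assumption about the row format


def drop_exact_duplicates(rows: list[Row]) -> list[Row]:
    """Remove exact duplicate rows, keeping the first occurrence in original order."""
    seen: set[Row] = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def stratified_split(rows: list[Row], train_frac=0.8, val_frac=0.1, seed=42):
    """Split rows into train/val/test so each label keeps roughly the same proportions."""
    by_label: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted so iteration order (and thus output) is reproducible
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        n_val = round(len(group) * val_frac)
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # the remainder (~10%) becomes the test split
    return train, val, test


rows = [(f"refund request {i}", "REFUND") for i in range(100)]
rows += [(f"cancel fee {i}", "CANCEL") for i in range(50)]
rows += rows[:10]  # inject 10 exact duplicates
unique = drop_exact_duplicates(rows)
train, val, test = stratified_split(unique)
print(len(unique), len(train), len(val), len(test))  # 150 120 15 15
```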

	### Hyperparameters

| Parameter | Value |
|---|---|
| Base model | `distilbert/distilbert-base-uncased` (67M params) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 5 (best checkpoint at epoch 3) |
| Weight decay | 0.01 |
| Warmup steps | 308 (10% of total) |
| LR scheduler | Cosine |
| Early stopping | Patience = 2 (metric: macro F1) |
| Precision | fp32 (trained on CPU) |
| Seed | 42 |
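The warmup figure follows from the other hyperparameters. Assume a single device, no gradient accumulation, and a final partial batch counted as a step. Then 19,708 training rows at batch size 32 give 616 steps per epoch and 3,080 steps over 5 epochs, of which 10% is 308.

The sketch below computes those numbers and the learning rate at a given step. It uses linear warmup followed by a half-cosine decay to zero, which we understand to be the shape `transformers.get_cosine_schedule_with_warmup` produces with its default settings. `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


train_size, batch_size, epochs = 19_708, 32, 5
steps_per_epoch = math.ceil(train_size / batch_size)  # the last partial batch still counts as a step
total_steps = steps_per_epoch * epochs
warmup_steps = total_steps // 10  # 10% of total

print(steps_per_epoch, total_steps, warmup_steps)  # 616 3080 308
for step in (0, 154, 308, 1694, 3080):
    print(step, f"{lr_at_step(step, 2e-5, warmup_steps, total_steps):.2e}")
```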

	### Framework Versions

	- Transformers 5.7.0
	- PyTorch 2.11.0
	- Datasets 4.8.5
	- Tokenizers 0.22.2

	## Demo

	[🚀 Try the live Gradio demo](https://huggingface.co/spaces/Janvi17/customer-support-ticket-classifier-demo) — paste any support ticket and see real-time classification with confidence breakdown.

	## Limitations and Risks

	### Known Failure Modes

	This model was stress-tested with 28 adversarial inputs. Four systematic weaknesses were identified:

	#### 1. Negation Blindness 🔴

The model does not account for negation. "Don't refund me, just fix the product" is classified as REFUND with 99.95% confidence. A likely cause is that the training data contains no negated intents. DistilBERT's shallow 6-layer architecture may also limit its compositional reasoning.

	Mitigation: Add negated intent examples to training data, or post-process with a negation detector.
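A negation detector can be as simple as a keyword check. The sketch below is a hypothetical, deliberately naive version: the `NEGATION_PATTERN` cue list and the `routing_decision` helper are assumptions. It will over-flag some tickets. For example, "I have no access to my account" contains "no" but is a plain ACCOUNT request. Rather than trying to invert the label, it sends any negated ticket to human review, reusing the 0.85 threshold from the production example.

```python
import re

# Hypothetical, non-exhaustive negation cues: standalone negators plus n't contractions,
# accepting a straight apostrophe, a curly apostrophe, or none ("dont").
NEGATION_PATTERN = re.compile(
    r"\b(?:not|no|never|cannot|(?:do|does|did|is|are|was|were|wo|ca|should|would)n['’]?t)\b",
    re.IGNORECASE,
)


def has_negation(text: str) -> bool:
    """Return True if the text contains one of the simple negation cues."""
    return NEGATION_PATTERN.search(text) is not None


def routing_decision(text: str, score: float, threshold: float = 0.85) -> str:
    """Send negated or low-confidence tickets to a human instead of auto-routing them."""
    if score < threshold or has_negation(text):
        return "human_review"
    return "auto"


print(routing_decision("Don't refund me, just fix the product", 0.9995))  # human_review
print(routing_decision("I need a refund", 0.999))  # auto
```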

	#### 2. No Out-of-Distribution Rejection 🔴

	The model assigns a label to any input, including gibberish, empty strings, and unrelated text. Examples:

| Input | Prediction | Confidence |
|---|---|---|
| `"asdfghjkl"` | ORDER | 78.1% |
| `""` (empty) | ORDER | 73.7% |
| `"The quick brown fox..."` | CONTACT | 84.9% |

	Mitigation: Use a confidence threshold (recommended: 0.85) to reject uncertain predictions. See the production usage example above.
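The card does not say how 0.85 was chosen. One way to tune such a threshold is to sweep candidate values over a labeled set of real or adversarial tickets. You then pick the lowest value whose accepted predictions meet an accuracy target. On the in-distribution Bitext validation split every prediction is already correct, so that split alone cannot reveal a useful threshold. The scores below are made up for illustration, and `coverage_and_accuracy` and `pick_threshold` are hypothetical helpers.

```python
def coverage_and_accuracy(predictions, threshold):
    """Return (coverage, selective_accuracy) for a list of (score, is_correct) pairs.

    coverage: fraction of tickets whose score clears the threshold (auto-routed).
    selective_accuracy: accuracy on just those tickets (NaN if none clear it).
    """
    kept = [correct for score, correct in predictions if score >= threshold]
    coverage = len(kept) / len(predictions)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


def pick_threshold(predictions, target_accuracy=0.99, candidates=None):
    """Lowest candidate threshold whose selective accuracy meets the target, or None."""
    if candidates is None:
        candidates = [i / 100 for i in range(50, 100)]  # 0.50, 0.51, ..., 0.99
    for threshold in sorted(candidates):
        _, accuracy = coverage_and_accuracy(predictions, threshold)
        if accuracy >= target_accuracy:  # NaN compares False, so empty selections are skipped
            return threshold
    return None


# Hypothetical (max softmax score, prediction was correct) pairs from a labeled review set
sample = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, True), (0.86, True),
    (0.84, False), (0.80, False), (0.78, False), (0.70, True),
]
threshold = pick_threshold(sample, target_accuracy=1.0)
coverage, accuracy = coverage_and_accuracy(sample, threshold)
print(threshold, round(coverage, 3), accuracy)  # 0.85 0.556 1.0
```

Raising the threshold trades coverage (tickets routed automatically) for accuracy on the tickets that are routed.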

	#### 3. Heavy Typo Fragility 🟡

The model handles mild typos well (the training data includes ~34% typo-augmented samples), but heavily misspelled text can lead to low-confidence or incorrect predictions:

| Input | Expected | Predicted | Confidence |
|---|---|---|---|
| "hwere is my pakage" | DELIVERY | DELIVERY | 48.1% ⚠️ |
| "I wnat to spek to a humna" | CONTACT | ORDER | 81.8% ❌ |

	Mitigation: Add a spell-correction preprocessing step, or augment training data with heavier typo injection.
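Heavier typo injection can be approximated with random character-level edits. The sketch below is a hypothetical augmentation, not the method Bitext used for its typo variants (the card does not describe that method). With probability `rate`, each letter is swapped with the next character, dropped, or replaced by a random lowercase letter. Augmented copies would keep the label of the original ticket. A suitable `rate` is an empirical choice.

```python
import random
import string


def inject_typos(text: str, rate: float, rng: random.Random) -> str:
    """Apply random swap / drop / replace edits to letters with per-letter probability `rate`."""
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("swap", "drop", "replace"))
            if op == "swap" and i + 1 < len(text):
                out.append(text[i + 1])  # transpose this letter with the following character
                out.append(c)
                i += 2
                continue
            if op == "drop":
                i += 1  # skip the letter entirely
                continue
            if op == "replace":
                c = rng.choice(string.ascii_lowercase)
        out.append(c)
        i += 1
    return "".join(out)


rng = random.Random(0)
clean = "where is my package"
for _ in range(3):
    print(inject_typos(clean, rate=0.15, rng=rng))
```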

	#### 4. Single-Label on Multi-Intent Tickets 🟡

	Real support tickets often span multiple categories. The model picks one:

| Input | Predicted | Also relevant |
|---|---|---|
| "Cancel my subscription and give me a refund" | REFUND | CANCEL, SUBSCRIPTION |
| "Your delivery is terrible, I want to complain" | DELIVERY | FEEDBACK |

	Mitigation: For full coverage, return top-k predictions or switch to multi-label classification.
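A lightweight version of the top-k option keeps the top label plus any runner-up whose score clears a secondary cutoff. The sketch below works on the list of `{"label", "score"}` dicts that the text-classification pipeline returns when `top_k` is set. The `select_labels` helper, its cutoffs, and the example scores are hypothetical.

Softmax scores sum to 1 across the 11 labels. A confident single prediction (such as PAYMENT at 0.9661 above) therefore leaves little probability mass for the other labels. This heuristic only surfaces secondary intents the model is already unsure about. True multi-label classification would require retraining with independent sigmoid outputs, which in transformers is typically selected with `problem_type="multi_label_classification"`.

```python
def select_labels(predictions, min_score=0.2, max_labels=3):
    """Return the top label plus any runner-up (up to max_labels total) scoring >= min_score.

    predictions: list of {"label": str, "score": float} dicts, e.g. pipeline output with top_k.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    selected = [ranked[0]["label"]] if ranked else []
    for p in ranked[1:max_labels]:
        if p["score"] >= min_score:
            selected.append(p["label"])
    return selected


# Hypothetical scores for "Cancel my subscription and give me a refund"
preds = [
    {"label": "REFUND", "score": 0.62},
    {"label": "SUBSCRIPTION", "score": 0.31},
    {"label": "CANCEL", "score": 0.05},
]
print(select_labels(preds))  # ['REFUND', 'SUBSCRIPTION']
```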

	### Dataset Limitations

	- Synthetic data: The training set is template-generated, not sourced from real customer interactions. Real-world text may contain slang, code-switching, or domain-specific jargon not represented in training.
	- English only: The model is trained exclusively on English text.
- Limited vocabulary: Some categories use as few as 184 unique words (CANCEL). This suggests the model may rely heavily on keyword matching rather than deeper semantic understanding.
	- Placeholder artifacts: Training data contains `{{Order Number}}`, `{{Person Name}}`, etc. The model has learned to ignore these, but unusual entity formats in real data could affect performance.
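If real tickets use entity formats that differ from the training placeholders, one option to evaluate is mapping them onto the placeholder style before classification. The sketch below is hypothetical. It has a single rule that rewrites `#`-prefixed order numbers to `{{Order Number}}`, the token named above. Validate any extra rules, and whether normalization helps at all, on real tickets. The card's own example "Can you send me the invoice for order #12345?" is already classified correctly without it.

```python
import re

# Hypothetical rules mapping real-world entity formats onto training-style placeholders.
PLACEHOLDER_RULES = [
    (re.compile(r"#\s?\d{3,}"), "{{Order Number}}"),  # e.g. "#12345" or "# 55123"
]


def normalize_entities(text: str) -> str:
    """Replace recognized entity spans with the placeholder tokens seen in training."""
    for pattern, placeholder in PLACEHOLDER_RULES:
        text = pattern.sub(placeholder, text)
    return text


print(normalize_entities("Can you send me the invoice for order #12345?"))
# Can you send me the invoice for order {{Order Number}}?
```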

	### Bias and Fairness

	- The synthetic dataset does not represent any specific demographic or dialect distribution.
	- Performance on non-standard English (e.g., AAVE, Indian English, ESL patterns) has not been evaluated.
	- The model may perform differently across age groups, regions, or communication styles.

	### When NOT to Use This Model

	- Safety-critical routing: Do not use as the sole decision-maker for urgent or safety-related tickets without human review.
	- Non-English text: The model will produce unreliable predictions on non-English input.
	- Fine-grained intent classification: This model classifies into 11 broad categories, not 27 fine-grained intents. If you need intent-level predictions (e.g., distinguishing `cancel_order` from `check_cancellation_fee`), retrain with the `intent` column.

	## Citation

	If you use this model, please cite the training dataset:

	```bibtex
@misc{bitext2023customer,
  title={Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants},
  author={Bitext},
  year={2023},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset}
}
	```

	## License

	This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). The training dataset is licensed under [CDLA-Sharing-1.0](https://cdla.dev/sharing-1-0/).