---
language:
- en
license: apache-2.0
tags:
- text-classification
- customer-support
- distilbert
- transformers
- mlops
datasets:
- thoughtvector/customer-support-on-twitter
metrics:
- accuracy
- f1
model-index:
- name: ticket-classifier
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Customer Support on Twitter
      type: thoughtvector/customer-support-on-twitter
    metrics:
    - type: accuracy
      value: 0.99
      name: Test Accuracy
    - type: f1
      value: 0.989
      name: Macro F1
---

# Customer Support Ticket Classifier

A fine-tuned **DistilBERT** model that classifies customer support tickets into five categories.

## Model Description

This model is a fine-tuned version of `distilbert-base-uncased` trained on real customer support tweets from the [Customer Support on Twitter](https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter) dataset.

Developed as part of the **MLDLOps Course Project** at IIT Rajasthan by Abhimanyu Gupta (B22BB001).

## Labels

| ID | Label |
|----|-------|
| 0 | Billing inquiry |
| 1 | Cancellation request |
| 2 | Product inquiry |
| 3 | Refund request |
| 4 | Technical issue |
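
The table above can be written as the usual `id2label` / `label2id` pair. This is a minimal sketch, assuming the hosted model's config stores exactly these strings (the usage output further down suggests so, but that is not verified here). The helper name `decode` is hypothetical.

```python
# Label mapping copied from the table above.
# Assumption: the model's config.json uses the same strings under id2label.
ID2LABEL = {
    0: "Billing inquiry",
    1: "Cancellation request",
    2: "Product inquiry",
    3: "Refund request",
    4: "Technical issue",
}

# Reverse mapping, useful when encoding string labels for training.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(class_id: int) -> str:
    """Return the human-readable label for a predicted class ID."""
    return ID2LABEL[class_id]
```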

## Performance

| Metric | Value |
|--------|-------|
| Test Accuracy | **99.0%** |
| Macro F1 | **0.989** |
| Training Time | ~4.5 min (T4 GPU) |
| Inference Latency | ~60 ms (CPU) |
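
Macro F1 is the unweighted mean of the per-class F1 scores, so each of the five classes counts equally. The card does not say how the reported value was computed. The sketch below is a plain-Python illustration of the definition, not the project's evaluation code, and the function name `macro_f1` is hypothetical. In practice, `sklearn.metrics.f1_score(..., average="macro")` is the usual route.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Convention chosen here: a class with no support and no
        # predictions scores 0.0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy data, not taken from the model's test set.
y_true = [0, 1, 2, 3, 4, 0]
y_pred = [0, 1, 2, 3, 4, 1]
print(round(macro_f1(y_true, y_pred), 4))
```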

## Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="abhimanyu345/ticket-classifier"
)

result = classifier("I was charged twice for my subscription this month")
print(result)
# Example output: [{'label': 'Billing inquiry', 'score': 0.9996}]
```

## Training Details

- **Base model:** distilbert-base-uncased
- **Learning rate:** 3e-5
- **Batch size:** 32
- **Epochs:** 4
- **Max sequence length:** 128
- **Training platform:** Google Colab T4 GPU
- **Experiment tracking:** [WandB Project](https://api.wandb.ai/links/abhimanyu001-prom-iit-rajasthan/yttp7n7v)
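
These settings imply a fairly short training run. The sketch below works out the number of optimizer steps, assuming a 17,500-example training set (70% of the 25,000 balanced examples), a single device, no gradient accumulation, and a final partial batch that is kept. None of these details are stated in this card.

```python
import math

# Hyperparameters from the list above.
BATCH_SIZE = 32
EPOCHS = 4

# Assumption: training set is 70% of the 25,000 balanced examples.
TOTAL_EXAMPLES = 25_000
TRAIN_PERCENT = 70

train_examples = TOTAL_EXAMPLES * TRAIN_PERCENT // 100

# Assumption: the last partial batch of each epoch is kept (not dropped).
steps_per_epoch = math.ceil(train_examples / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(train_examples, steps_per_epoch, total_steps)
```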

## Dataset

- **Source:** Customer Support on Twitter dataset (2.8M tweets)
- **After filtering:** 658,787 labeled examples
- **After balancing:** 25,000 examples (5,000 per class)
- **Split:** 70% train / 15% validation / 15% test
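
The balancing and splitting steps above can be sketched as follows. This is an illustration under assumptions, not the project's preprocessing code: it downsamples each class at random and splits per class (a stratified split), and the card does not say whether the actual split was stratified. The function name `balance_and_split` and the seed are hypothetical. With `per_class=5000` it yields 17,500 / 3,750 / 3,750 examples.

```python
import random
from collections import defaultdict


def balance_and_split(examples, per_class, seed=42):
    """Downsample to `per_class` items per label, then split 70/15/15.

    `examples` is a list of (text, label_id) pairs.
    Returns (train, val, test) lists of the same pairs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    n_train = per_class * 70 // 100
    n_val = per_class * 15 // 100

    train, val, test = [], [], []
    for label, items in sorted(by_label.items()):
        if len(items) < per_class:
            raise ValueError(f"label {label} has only {len(items)} examples")
        sample = rng.sample(items, per_class)  # random downsampling
        train += sample[:n_train]
        val += sample[n_train:n_train + n_val]
        test += sample[n_train + n_val:]       # remainder goes to test

    # Shuffle so that classes are interleaved within each split.
    for part in (train, val, test):
        rng.shuffle(part)
    return train, val, test


# Toy data: 5 classes with 30 examples each, downsampled to 20 per class.
toy = [(f"tweet {i}", i % 5) for i in range(150)]
train, val, test = balance_and_split(toy, per_class=20)
print(len(train), len(val), len(test))
```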

## MLOps Pipeline

The project includes a full production pipeline:

- **DVC** — data versioning
- **WandB** — experiment tracking
- **FastAPI** — model serving
- **Docker** — containerization
- **Prometheus** — metrics monitoring
- **Evidently AI** — drift detection
- **GitHub Actions** — CI/CD

**GitHub Repository:** https://github.com/abhimanyu345/ticket-classifier

## Citation

```bibtex
@misc{gupta2026ticketclassifier,
  author = {Abhimanyu Gupta},
  title = {Customer Support Ticket Classifier with MLOps Pipeline},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/abhimanyu345/ticket-classifier}}
}
```