---
language: en
tags:
  - text-classification
  - distilbert
  - ticket-classification
  - pytorch
license: mit
datasets:
  - Defect_ticket_v2  # Custom/private dataset name
model_name: DistilBERT Ticket Classifier
metrics:
  - accuracy
---

# DistilBERT Ticket Classifier (Distil_Bert_V3)

## Model Overview
This is a fine-tuned **DistilBERT** model (`distilbert-base-cased`) that classifies defect tickets and assigns them to the appropriate team based on their text content. The ticket data from `Defect_ticket_v2.csv` was cleaned by filling missing values in the **Description**, **Comment**, and **Summary** fields. The model predicts one of 5 team labels, each linked to a team email for automated routing.

- **Model Type**: DistilBERT for Sequence Classification
- **Framework**: PyTorch
- **Repository**: [ZAM-ITI-110/Distil_Bert_V3](https://huggingface.co/ZAM-ITI-110/Distil_Bert_V3)
- **License**: MIT (see YAML metadata above)
- **Created**: February 2025
- **Creator**: AUNGHLAINGTUN/Student ID6319250G NYP

## Intended Use
This model is intended for:
- Automating ticket assignment in IT support or defect tracking systems.
- Reducing manual triage time by predicting the responsible team based on ticket text.

### Use Case
- **Input**: A ticket with fields `Description`, `Comment`, and `Summary` (e.g., "Urgent server crash reported in production").
- **Output**: A team label (0-4) mapped to a team email (e.g., `team1@example.com`).
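
The label-to-email step can be sketched as a plain lookup. This is only an illustration: the addresses follow the `team1@example.com` pattern from the example above and are placeholders, since the real mapping is not listed in this card. The names `TEAM_EMAILS` and `email_for_label` are hypothetical.

```python
# Placeholder mapping from predicted label (0-4) to a team email.
# The real addresses are not published in this card; these are assumptions.
TEAM_EMAILS = {label: f"team{label + 1}@example.com" for label in range(5)}


def email_for_label(label: int) -> str:
    """Return the (placeholder) team email for a predicted label."""
    if label not in TEAM_EMAILS:
        raise ValueError(f"Unknown team label: {label!r}")
    return TEAM_EMAILS[label]


if __name__ == "__main__":
    print(email_for_label(0))
```

Rejecting unknown labels explicitly keeps a misconfigured mapping from silently routing a ticket to the wrong inbox.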

### Out of Scope
- Not designed for multi-label classification or sentiment analysis.
- May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).

## Training Data
- **Dataset**: `Defect_ticket_v2.csv` (private dataset)
- **Size**: Approximately 5,000 samples (70% train: ~3,504, 15% validation: ~750, 15% test: ~750).
- **Features**: Combined text from `Description`, `Comment`, and `Summary` columns.
- **Labels**: 5 unique team labels (encoded as 0-4), derived from the `Assigned Team` column.
- **Preprocessing**: Missing values filled with empty strings; text truncated/padded to 512 tokens.

Note: The dataset is not publicly available due to privacy constraints.
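
The feature construction described above might look roughly like the sketch below. It assumes the three fields are joined with a single space in the order listed, which this card does not confirm; `combine_ticket_text` is a hypothetical helper, not the author's code.

```python
from typing import Mapping, Optional

# Field order and the space separator are assumptions for illustration.
TEXT_FIELDS = ("Description", "Comment", "Summary")


def combine_ticket_text(ticket: Mapping[str, Optional[object]]) -> str:
    """Join the ticket's text fields, treating missing values as empty strings."""
    parts = []
    for field in TEXT_FIELDS:
        value = ticket.get(field)
        # None and NaN (NaN != NaN) both count as missing.
        if value is None or value != value:
            value = ""
        parts.append(str(value).strip())
    # Skip empty parts so missing fields do not leave stray spaces.
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    example = {"Description": "Server crash", "Comment": None, "Summary": "Urgent"}
    print(combine_ticket_text(example))
```

The combined string would then be passed to the tokenizer with `truncation=True`, `padding="max_length"`, and `max_length=512` to match the truncation and padding described above.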

## Training Procedure
- **Base Model**: `distilbert-base-cased`
- **Fine-Tuning**: 
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
  - Scheduler: Linear with 10% warmup steps
- **Hardware**: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
- **Mixed Precision**: Enabled via PyTorch AMP for efficiency.
- **Loss Function**: CrossEntropyLoss
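
To make the schedule concrete, the sketch below works out the step counts implied by these settings and the shape of a linear warmup followed by linear decay. It assumes the approximate training set size of 3,504 quoted in this card, one optimizer step per batch (no gradient accumulation), and warmup rounded down to a whole step; `lr_multiplier` is a hypothetical helper.

```python
import math

TRAIN_SAMPLES = 3504   # approximate figure from this card
BATCH_SIZE = 8
EPOCHS = 5
PEAK_LR = 3e-5

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = total_steps // 10  # 10% warmup, rounded down (assumption)


def lr_multiplier(step: int) -> float:
    """Linear ramp from 0 to 1 during warmup, then linear decay back to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print(steps_per_epoch, total_steps, warmup_steps)
    for step in (0, warmup_steps, total_steps):
        print(step, PEAK_LR * lr_multiplier(step))
```

Under these assumptions the learning rate reaches its peak of 3e-5 about halfway through the first epoch and decays to zero at the final step.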

### Training Metrics
| Epoch | Train Loss | Validation Loss | Validation Accuracy |
|-------|------------|-----------------|---------------------|
| 1     | 0.4021     | 0.0038          | 100%                |
| 2     | 0.0031     | 0.0011          | 100%                |
| 3     | 0.0013     | 0.0006          | 100%                |
| 4     | 0.0008     | 0.0004          | 100%                |
| 5     | 0.0007     | 0.0004          | 100%                |

- **Test Accuracy**: 100% (on ~750 test samples).

## Evaluation
- **Performance**: Achieved 100% accuracy on both validation and test sets, indicating excellent fit to the provided data.
- **Caveats**:
  - Perfect accuracy may suggest an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicates across splits).
  - Real-world performance on new, unseen tickets should be validated.
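
One way to probe the leakage caveat is to look for evaluation tickets whose text also appears in the training split. The sketch below is a minimal check on lists of strings; the normalization (lowercasing and collapsing whitespace) is an assumption, and `cross_split_duplicates` is a hypothetical helper, not part of the original pipeline.

```python
from typing import Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def cross_split_duplicates(train_texts: Iterable[str], eval_texts: Iterable[str]) -> List[str]:
    """Return evaluation texts whose normalized form also occurs in the training split."""
    seen = {normalize(text) for text in train_texts}
    return [text for text in eval_texts if normalize(text) in seen]


if __name__ == "__main__":
    train = ["Server crash in production", "Login page times out"]
    test = ["server crash  in production", "Payment API returns 500"]
    leaked = cross_split_duplicates(train, test)
    print(f"{len(leaked)} of {len(test)} test tickets also appear in train")
```

A high overlap would suggest that the reported accuracy partly reflects memorized tickets rather than generalization.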

## How to Use
- Predicts the appropriate team and email for up to 6 ticket descriptions.
- Click 'Predict' to classify an individual ticket, or click 'Send Tickets' to process all of them at once.

### Installation
```bash
pip install transformers torch