---
language: en
tags:
- text-classification
- distilbert
- ticket-classification
- pytorch
license: mit
datasets:
- Defect_ticket_v2
model_name: DistilBERT Ticket Classifier
metrics:
- accuracy
---

# DistilBERT Ticket Classifier (Distil_Bert_V3)

## Model Overview

This is a fine-tuned **DistilBERT** model (`distilbert-base-cased`) that classifies defect tickets and assigns each one to the appropriate team based on its text. It uses the ticket **Description**, **Comment**, and **Summary** fields from `Defect_ticket_v2.csv` (with missing values filled in during preprocessing) and predicts one of 5 team labels, each linked to a team email for automated routing.

- **Model Type**: DistilBERT for Sequence Classification
- **Framework**: PyTorch
- **Repository**: [ZAM-ITI-110/Distil_Bert_V3](https://huggingface.co/ZAM-ITI-110/Distil_Bert_V3)
- **License**: MIT (see YAML metadata above)
- **Created**: February 2025
- **Creator**: AUNGHLAINGTUN / Student ID 6319250G, NYP

## Intended Use

This model is intended for:

- Automating ticket assignment in IT support or defect tracking systems.
- Reducing manual triage time by predicting the responsible team from the ticket text.

### Use Case

- **Input**: A ticket with `Description`, `Comment`, and `Summary` fields (e.g., "Urgent server crash reported in production").
- **Output**: A team label (0-4) mapped to a team email (e.g., `team1@example.com`).
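
The step from a predicted label to a team email can be sketched as follows. This is a minimal illustration rather than the repository's code: it assumes the model returns one logit per team label (0-4), and the `TEAM_EMAILS` mapping is a hypothetical placeholder, since the card only gives `team1@example.com` as an example address.

```python
import math

# Hypothetical label-to-email mapping: the card only shows `team1@example.com`
# as an example, so every address here is a placeholder.
TEAM_EMAILS = {
    0: "team1@example.com",
    1: "team2@example.com",
    2: "team3@example.com",
    3: "team4@example.com",
    4: "team5@example.com",
}


def softmax(logits):
    """Convert raw logits into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_ticket(logits):
    """Return (label, email, confidence) for one ticket's 5 logits."""
    if len(logits) != len(TEAM_EMAILS):
        raise ValueError(f"expected {len(TEAM_EMAILS)} logits, got {len(logits)}")
    probs = softmax(logits)
    # The predicted label is the index of the highest probability (argmax).
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, TEAM_EMAILS[label], probs[label]


if __name__ == "__main__":
    label, email, confidence = route_ticket([0.1, 2.5, -1.0, 0.3, 0.0])
    print(label, email, round(confidence, 3))
```

Because softmax preserves ordering, taking the argmax of the raw logits gives the same label; the softmax is only needed when a confidence score is wanted alongside the routed email.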

### Out of Scope

- Not designed for multi-label classification or sentiment analysis.
- May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).

## Training Data

- **Dataset**: `Defect_ticket_v2.csv` (private dataset)
- **Size**: Approximately 5,000 samples (70% train: ~3,504; 15% validation: ~750; 15% test: ~750).
- **Features**: Combined text from the `Description`, `Comment`, and `Summary` columns.
- **Labels**: 5 unique team labels (encoded as 0-4), derived from the `Assigned Team` column.
- **Preprocessing**: Missing values filled with empty strings; text truncated/padded to 512 tokens.
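
The preprocessing described above can be sketched in plain Python. This is a minimal sketch under assumptions the card does not confirm: the three fields are joined with a single space in the order `Description`, `Comment`, `Summary`; label IDs are assigned to team names in sorted order (the same ordering scikit-learn's `LabelEncoder` uses); and the 70/15/15 split is a simple seeded random shuffle (the seed of 42 is hypothetical). Truncation and padding to 512 tokens would be handled separately by the DistilBERT tokenizer and are not shown.

```python
import random

# Assumed field order and separator; the card lists the columns but not how
# they were joined.
TEXT_FIELDS = ("Description", "Comment", "Summary")


def combine_fields(row):
    """Join one ticket's text fields, treating missing values as empty strings."""
    parts = []
    for field in TEXT_FIELDS:
        value = row.get(field)
        # A missing key or None is filled with "", as the card describes.
        text = "" if value is None else str(value).strip()
        if text:
            parts.append(text)
    # Skipping empty parts avoids doubled spaces where a field was missing.
    return " ".join(parts)


def encode_labels(team_names):
    """Map each unique team name to an ID 0..n-1, assigned in sorted order."""
    label2id = {name: idx for idx, name in enumerate(sorted(set(team_names)))}
    return [label2id[name] for name in team_names], label2id


def split_70_15_15(items, seed=42):
    """Shuffle, then split into 70% train, 15% validation, and the rest test.

    A plain random split; the card does not say whether it was stratified.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100  # integer maths avoids float rounding
    n_val = len(shuffled) * 15 // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

For a dataset of roughly 5,000 tickets, this split gives roughly 3,500 / 750 / 750 tickets, in line with the sizes listed above.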

Note: The dataset is not publicly available due to privacy constraints.

## Training Procedure

- **Base Model**: `distilbert-base-cased`
- **Fine-Tuning**:
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
  - Scheduler: Linear with 10% warmup steps
- **Hardware**: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
- **Mixed Precision**: Enabled via PyTorch AMP for efficiency.
- **Loss Function**: CrossEntropyLoss
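
The learning-rate schedule implied by these settings can be worked out with a short sketch. It assumes one optimizer step per batch, the ~3,504 training samples listed under Training Data, and warmup equal to 10% of total steps (rounded up). The formula follows the linear warmup-then-decay rule used by `get_linear_schedule_with_warmup` in `transformers`, although the card does not state which implementation was used. Under these assumptions there are 438 steps per epoch, 2,190 steps in total, and 219 warmup steps.

```python
import math

TRAIN_SAMPLES = 3504   # approximate train-split size from the card
BATCH_SIZE = 8
EPOCHS = 5
PEAK_LR = 3e-5
WARMUP_PCT = 10        # "10% warmup steps"


def schedule_lengths(samples=TRAIN_SAMPLES, batch_size=BATCH_SIZE,
                     epochs=EPOCHS, warmup_pct=WARMUP_PCT):
    """Return (steps_per_epoch, total_steps, warmup_steps)."""
    steps_per_epoch = math.ceil(samples / batch_size)  # a last partial batch counts
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_pct / 100)
    return steps_per_epoch, total_steps, warmup_steps


def lr_at(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    per_epoch, total, warmup = schedule_lengths()
    print(per_epoch, total, warmup)  # 438 2190 219
    for step in (0, warmup // 2, warmup, total // 2, total):
        print(step, f"{lr_at(step, total, warmup):.2e}")
```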

### Training Metrics

| Epoch | Train Loss | Validation Loss | Validation Accuracy |
|-------|------------|-----------------|---------------------|
| 1     | 0.4021     | 0.0038          | 100%                |
| 2     | 0.0031     | 0.0011          | 100%                |
| 3     | 0.0013     | 0.0006          | 100%                |
| 4     | 0.0008     | 0.0004          | 100%                |
| 5     | 0.0007     | 0.0004          | 100%                |

- **Test Accuracy**: 100% (on ~750 test samples).

## Evaluation

- **Performance**: Achieved 100% accuracy on both the validation and test sets, indicating an excellent fit to the provided data.
- **Caveats**:
  - Perfect accuracy may indicate an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicate tickets across splits).
  - Real-world performance on new, unseen tickets should be validated.
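
One way to probe the data-leakage caveat is to count tickets whose text appears in more than one split. The sketch below is a hypothetical check, not part of the original training pipeline; it only catches exact duplicates after lowercasing and collapsing whitespace, so near-duplicates (e.g., the same ticket with a different ID or timestamp) would need fuzzier matching.

```python
import re


def normalize(text):
    """Lowercase and collapse runs of whitespace so trivial variants match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def cross_split_duplicates(train_texts, eval_texts):
    """Return the normalized eval texts that also occur in the training split."""
    seen = {normalize(t) for t in train_texts}
    return sorted({normalize(t) for t in eval_texts} & seen)


if __name__ == "__main__":
    train = ["Server crash in production", "Login page fails to load"]
    test = ["server  crash in PRODUCTION", "Printer jam on floor 3"]
    overlap = cross_split_duplicates(train, test)
    print(f"{len(overlap)} test ticket(s) also appear in train: {overlap}")
```

If the overlap is non-empty, deduplicating before splitting and re-running the evaluation would give a more trustworthy estimate of test accuracy.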

## How to Use

- The interface predicts the appropriate team and email for up to 6 ticket descriptions.
- Click 'Predict' to classify each ticket individually, or 'Send Tickets' to process all of them.

### Installation

```bash
pip install transformers torch