---
language: en
tags:
- text-classification
- distilbert
- ticket-classification
- pytorch
license: mit # Adjust as needed (e.g., apache-2.0, unspecified)
datasets:
- Defect_ticket_v2 # Custom/private dataset name
model_name: DistilBERT Ticket Classifier
metrics:
- accuracy
---

# DistilBERT Ticket Classifier (Distil_Bert_V3)

## Model Overview

This is a fine-tuned **DistilBERT** model (`distilbert-base-cased`) that classifies defect tickets and assigns them to the appropriate team based on their text content. Before training, missing values in the ticket **Description**, **Comment**, and **Summary** fields of `Defect_ticket_V2.csv` were filled in. The model takes these fields as input and predicts one of 5 team labels, each linked to a team email for automated routing.

- **Model Type**: DistilBERT for Sequence Classification
- **Framework**: PyTorch
- **Repository**: [ZAM-ITI-110/Distil_Bert_V3](https://huggingface.co/ZAM-ITI-110/Distil_Bert_V3)
- **License**: MIT (see YAML metadata above)
- **Created**: February 2025
- **Creator**: AUNGHLAINGTUN/Student ID6319250G NYP

## Intended Use

This model is intended for:

- Automating ticket assignment in IT support or defect tracking systems.
- Reducing manual triage time by predicting the responsible team based on ticket text.

### Use Case

- **Input**: A ticket with fields `Description`, `Comment`, and `Summary` (e.g., "Urgent server crash reported in production").
- **Output**: A team label (0-4) mapped to a team email (e.g., `team1@example.com`).

### Out of Scope

- Not designed for multi-label classification or sentiment analysis.
- May not generalize well to tickets outside the training domain (e.g., non-technical support tickets).

## Training Data

- **Dataset**: `Defect_ticket_v2.csv` (private dataset)
- **Size**: Approximately 5,000 samples (70% train: ~3,504, 15% validation: ~750, 15% test: ~750).
- **Features**: Combined text from `Description`, `Comment`, and `Summary` columns.
- **Labels**: 5 unique team labels (encoded as 0-4), derived from the `Assigned Team` column.
- **Preprocessing**: Missing values filled with empty strings; text truncated/padded to 512 tokens.

Note: The dataset is not publicly available due to privacy constraints.

## Training Procedure

- **Base Model**: `distilbert-base-cased`
- **Fine-Tuning**:
  - Epochs: 5
  - Batch Size: 8
  - Optimizer: AdamW (learning rate: 3e-5, weight decay: 0.01)
  - Scheduler: Linear with 10% warmup steps
- **Hardware**: Trained on Google Colab with a T4 GPU (~31 seconds/epoch).
- **Mixed Precision**: Enabled via PyTorch AMP for efficiency.
- **Loss Function**: CrossEntropyLoss

### Training Metrics

| Epoch | Train Loss | Validation Loss | Validation Accuracy |
|-------|------------|-----------------|---------------------|
| 1     | 0.4021     | 0.0038          | 100%                |
| 2     | 0.0031     | 0.0011          | 100%                |
| 3     | 0.0013     | 0.0006          | 100%                |
| 4     | 0.0008     | 0.0004          | 100%                |
| 5     | 0.0007     | 0.0004          | 100%                |

- **Test Accuracy**: 100% (on ~750 test samples).

## Evaluation

- **Performance**: Achieved 100% accuracy on both the validation and test sets, indicating an excellent fit to the provided data.
- **Caveats**:
  - Perfect accuracy may suggest an easy classification task, limited dataset diversity, or potential data leakage (e.g., duplicates across splits).
  - Real-world performance on new, unseen tickets should be validated.

## How to Use

- Predicts the appropriate team and email for up to 6 ticket descriptions.
- Click 'Predict' for each ticket, then click 'Send Tickets' to process all tickets.

### Installation

```bash
pip install transformers torch