🎫 IT Support Ticket Classifier

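A production-grade IT support ticket classification system that combines sentence-transformers embeddings with a scikit-learn logistic regression classifier. It assigns each ticket to one of five categories (Hardware, Software, Network, Security, Account) and reaches a weighted F1 of 0.9924 (99.2%) on a held-out test set of 264 tickets.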

Model Details

Property	Value
Embedding model	sentence-transformers/all-MiniLM-L6-v2
Classifier	LogisticRegression (scikit-learn)
Embedding dimensions	384
F1 score (weighted, test set)	0.9924
Training samples	1,056
Test samples	264
Total dataset	1,320 IT support tickets
Experiment tracking	MLflow

Labels

Label	Description	Samples (full dataset)
Hardware	Physical device issues	360
Software	Application and OS issues	279
Network	Connectivity and VPN issues	233
Security	Threats, phishing, malware	237
Account	Login, permissions, access	211

How It Works

Input text
    │
    ▼
sentence-transformers encoder (all-MiniLM-L6-v2)
    │
    ▼
384-dimensional embedding
    │
    ▼
LogisticRegression classifier
    │
    ▼
label + confidence score
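The last stage of the diagram can be sketched in plain NumPy. This assumes the multinomial (softmax) formulation that scikit-learn's LogisticRegression uses for multiclass problems with its default lbfgs solver; under that assumption the trained model's classifier.coef_ (shape 5 × 384), classifier.intercept_ (shape 5) and classifier.classes_ play the roles of the arguments below. The random weights are hypothetical placeholders, not the trained model.

```python
import numpy as np

# scikit-learn stores classes_ in sorted order, so the five labels appear alphabetically
CLASSES = np.array(["Account", "Hardware", "Network", "Security", "Software"])


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def classify_embedding(embedding: np.ndarray, coef: np.ndarray,
                       intercept: np.ndarray, classes: np.ndarray = CLASSES) -> dict:
    """Map one 384-d embedding to a label and a confidence score.

    embedding: shape (384,); coef: shape (5, 384); intercept: shape (5,)
    """
    logits = coef @ embedding + intercept  # one linear score per class
    proba = softmax(logits)                # class probabilities that sum to 1
    best = int(np.argmax(proba))           # most probable class
    return {"label": str(classes[best]), "confidence": round(float(proba[best]), 4)}


# Hypothetical weights and embedding, for illustration only
rng = np.random.default_rng(0)
coef = rng.normal(scale=0.1, size=(5, 384))
intercept = np.zeros(5)
embedding = rng.normal(size=384)
print(classify_embedding(embedding, coef, intercept))
```

The confidence score is simply the largest softmax probability, which is why it is always at least 1/5 with five classes.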

Usage

from sentence_transformers import SentenceTransformer
import joblib
import numpy as np

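# Load the encoder (same model as in training) and the trained classifier
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
classifier = joblib.load("classifier.joblib")

def predict(text: str) -> dict:
    # Encode the ticket into a (1, 384) embedding
    embedding = encoder.encode([text])
    # Class probabilities; columns follow the order of classifier.classes_
    proba = classifier.predict_proba(embedding)[0]
    best = int(np.argmax(proba))
    # Cast NumPy types to plain Python so the result is JSON-serialisable
    return {
        "label": str(classifier.classes_[best]),
        "confidence": round(float(proba[best]), 4),
    }

# Example
result = predict("My laptop screen is flickering and won't turn on")
print(result)
# Example output (exact confidence depends on the trained model):
# {'label': 'Hardware', 'confidence': 0.971}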
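When many tickets need classifying at once, encoding them in a single call is usually faster than calling predict() in a loop. A minimal sketch, assuming encoder and classifier objects like the ones loaded in the snippet above; predict_batch is a hypothetical helper, not part of the repository.

```python
from typing import Any, Dict, List, Sequence

import numpy as np


def predict_batch(
    texts: Sequence[str], encoder: Any, classifier: Any, batch_size: int = 64
) -> List[Dict[str, Any]]:
    """Classify many tickets with one encode() call and one predict_proba() call."""
    # (n, 384) matrix: one embedding row per ticket
    embeddings = np.asarray(encoder.encode(list(texts), batch_size=batch_size))
    # (n, n_classes) matrix: columns follow classifier.classes_
    proba = classifier.predict_proba(embeddings)
    best = proba.argmax(axis=1)
    # One result dict per ticket, in input order
    return [
        {"label": str(classifier.classes_[i]), "confidence": round(float(row[i]), 4)}
        for i, row in zip(best, proba)
    ]
```

For example, predict_batch(["VPN keeps dropping every few minutes", "Received a suspicious phishing email"], encoder, classifier) returns one label/confidence dict per ticket. Because the function only relies on encode(), predict_proba() and classes_, it can be exercised with lightweight stand-ins in unit tests.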

Example Predictions

Input	Predicted Label	Confidence
"My laptop screen won't turn on"	Hardware	0.97
"I forgot my password and can't login"	Account	0.96
"VPN keeps dropping every few minutes"	Network	0.94
"Received a suspicious phishing email"	Security	0.98
"Microsoft Office crashes on startup"	Software	0.95

Production API

The model is served by a FastAPI backend. The backend and the surrounding project include:

API key authentication
Structured JSON logging
Per-request tracing IDs
/health observability endpoint
Async request handling
Latency tracking middleware
GitHub Actions CI/CD pipeline
Docker containerisation
Streamlit UI for single and batch predictions
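The card does not show the backend code. As a rough illustration of three items from the list above (API key checking, per-request tracing IDs, and latency tracking emitted as structured JSON logs), here is a framework-agnostic sketch using only the standard library. JsonFormatter, api_key_valid, handle_request and the ticket_api logger name are all hypothetical; the project's actual FastAPI middleware may work differently.

```python
import hmac
import json
import logging
import time
import uuid
from typing import Any, Callable, Optional, Tuple


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def api_key_valid(provided: Optional[str], expected: str) -> bool:
    """Constant-time comparison so response timing does not leak the key."""
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


def handle_request(path: str, logger: logging.Logger,
                   handler: Callable[..., Any], *args: Any) -> Tuple[str, Any]:
    """Run handler under a fresh tracing ID and log its latency as JSON."""
    request_id = uuid.uuid4().hex
    start = time.perf_counter()
    status = "error"  # overwritten only if the handler returns normally
    try:
        result = handler(*args)
        status = "ok"
        return request_id, result
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "request completed",
            extra={"fields": {"request_id": request_id, "path": path,
                              "status": status, "latency_ms": round(latency_ms, 2)}},
        )


# Wire the JSON formatter to a stream handler (hypothetical logger name)
logger = logging.getLogger("ticket_api")
stream = logging.StreamHandler()
stream.setFormatter(JsonFormatter())
logger.addHandler(stream)
logger.setLevel(logging.INFO)
```

In a web framework, the same pattern would typically live in middleware: reject the request when api_key_valid() fails, otherwise wrap the call to predict() with handle_request() and return the request ID to the client so log lines can be correlated.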

Training

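from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import joblib
import mlflow

# texts: list[str] of ticket descriptions; labels: list[str] of categories
# (loading the 1,320-ticket dataset is not shown here)

mlflow.set_experiment("ticket-classifier")

with mlflow.start_run():
    # Embed every ticket once; the same encoder must be used at inference time
    encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = encoder.encode(texts, batch_size=64)

    # 80/20 split, stratified so each label keeps its share in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.2,
        random_state=42, stratify=labels
    )

    clf = LogisticRegression(max_iter=1000, C=1.0)
    clf.fit(X_train, y_train)
    mlflow.log_params({"C": 1.0, "max_iter": 1000, "test_size": 0.2})

    f1 = f1_score(y_test, clf.predict(X_test), average="weighted")
    mlflow.log_metric("f1_weighted", f1)
    # Reported result: f1_weighted = 0.9924

    # Save the classifier that the Usage snippet loads
    joblib.dump(clf, "classifier.joblib")
    mlflow.log_artifact("classifier.joblib")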
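The 0.9924 figure is a weighted F1: F1 is computed separately for each class and then averaged with weights equal to each class's number of test tickets, so larger classes such as Hardware count for more. The NumPy sketch below spells out that definition; weighted_f1 is a hypothetical helper meant to mirror f1_score(..., average="weighted") on inputs like these, not to replace it.

```python
import numpy as np


def weighted_f1(y_true, y_pred) -> float:
    """Per-class F1 averaged with weights equal to each class's true support."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Labels that only appear in y_pred have zero support, so they add nothing
    classes, support = np.unique(y_true, return_counts=True)
    f1_per_class = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because c occurs in y_true
        f1_per_class.append(2 * tp / (2 * tp + fp + fn))
    return float(np.average(f1_per_class, weights=support))


# Small hypothetical example: one Hardware ticket is misrouted to Network
y_true = ["Hardware", "Hardware", "Network", "Account"]
y_pred = ["Hardware", "Network", "Network", "Account"]
print(weighted_f1(y_true, y_pred))
```

Here Hardware and Network each score F1 = 2/3 and Account scores 1, so the support-weighted average is (2 × 2/3 + 2/3 + 1) / 4 = 0.75.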

Dataset

The dataset contains 1,320 synthetic IT support tickets with realistic class imbalance (from 211 Account tickets to 360 Hardware tickets) and deliberate cross-category ambiguity. Some tickets intentionally overlap between the Security/Account and Network/Software categories, so the classifier cannot trivially reach a perfect score.
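To check whether the remaining test errors fall on these deliberately ambiguous pairs, misclassifications can be tallied by unordered label pair. A minimal plain-Python sketch: in practice y_true and y_pred would be y_test and clf.predict(X_test) from the training snippet, while the short lists below are hypothetical and confused_pairs is an illustrative helper.

```python
from collections import Counter
from typing import Iterable


def confused_pairs(y_true: Iterable[str], y_pred: Iterable[str]) -> Counter:
    """Count misclassifications per unordered label pair, e.g. ("Account", "Security")."""
    counts: Counter = Counter()
    for true, pred in zip(y_true, y_pred):
        if true != pred:
            # Sort the pair so Account->Security and Security->Account share one key
            counts[tuple(sorted((true, pred)))] += 1
    return counts


# Hypothetical labels and predictions, for illustration only
y_true = ["Security", "Account", "Network", "Software", "Hardware"]
y_pred = ["Account", "Security", "Software", "Software", "Hardware"]
print(confused_pairs(y_true, y_pred).most_common())
```

If the dataset design works as intended, the largest counts on the real test split should involve the Security/Account and Network/Software pairs.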

Full Project

GitHub: https://github.com/Akhila854/ticket-classifier
Author: Akhila Arekal Ravi
LinkedIn: https://www.linkedin.com/in/akhila-arekal-ravi-51846b205
