🎫 IT Support Ticket Classifier

A production-grade IT support ticket classification system built on sentence-transformers embeddings and a scikit-learn Logistic Regression classifier. It assigns each ticket to one of 5 categories and reaches 99.2% weighted F1 on the 264-ticket held-out test set.

Model Details

| Property | Value |
|---|---|
| Embedding model | sentence-transformers/all-MiniLM-L6-v2 |
| Classifier | LogisticRegression (scikit-learn) |
| Embedding dimensions | 384 |
| F1 Score (weighted) | 0.9924 |
| Training samples | 1,056 |
| Test samples | 264 |
| Total dataset | 1,320 IT support tickets |
| Experiment tracking | MLflow |

Categories

| Label | Description | Samples (full dataset) |
|---|---|---|
| Hardware | Physical device issues | 360 |
| Software | Application and OS issues | 279 |
| Network | Connectivity and VPN issues | 233 |
| Security | Threats, phishing, malware | 237 |
| Account | Login, permissions, access | 211 |
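
The per-category counts above add up to 1,320, which is the size of the full dataset rather than the 1,056-ticket training split. A minimal sketch, assuming those counts, of each class's share and of the majority-class baseline that any classifier on this data has to beat:

```python
# Per-category counts copied from the Categories table.
counts = {
    "Hardware": 360,
    "Software": 279,
    "Network": 233,
    "Security": 237,
    "Account": 211,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the largest class.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

# Ratio between the largest and smallest class as a rough imbalance measure.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in shares.items():
    print(f"{label:<9}{counts[label]:>5}  {share:.1%}")
print(f"total={total}  majority={majority_label}  "
      f"baseline={majority_baseline:.3f}  ratio={imbalance_ratio:.2f}")
```

Always predicting Hardware would be correct for roughly 27% of tickets, and the largest class is about 1.7 times the size of the smallest, so the imbalance is moderate.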

How It Works

Input text
    │
    ▼
sentence-transformers (all-MiniLM-L6-v2)
384-dimensional embedding
    │
    ▼
LogisticRegression classifier
    │
    ▼
label + confidence score
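
The last two stages of the diagram can be illustrated without any dependencies. This is a minimal sketch, assuming the classifier behaves like a multinomial (softmax) logistic regression, which is what scikit-learn fits for multiclass problems with its default solver. The toy weights, biases, and 2-dimensional embedding are hypothetical stand-ins for the real 384-dimensional, 5-class model.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    embedding: Sequence[float],
    weights: Sequence[Sequence[float]],  # one row of coefficients per class
    biases: Sequence[float],             # one intercept per class
    labels: Sequence[str],
) -> Tuple[str, float]:
    """Score each class linearly, then return the most probable label."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical toy parameters: 2 dimensions and 3 classes instead of 384 and 5.
toy_labels = ["Hardware", "Network", "Account"]
toy_weights = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
toy_biases = [0.0, 0.0, 0.0]

label, confidence = classify([1.0, 0.2], toy_weights, toy_biases, toy_labels)
print(label, round(confidence, 4))
```

In a fitted scikit-learn model, the corresponding parameters are exposed as `coef_` and `intercept_`, and the "confidence score" is the largest softmax probability.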

Usage

from sentence_transformers import SentenceTransformer
import joblib
import numpy as np

# Load the embedding model and the trained scikit-learn classifier
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
classifier = joblib.load("classifier.joblib")

def predict(text: str) -> dict:
    """Classify a single ticket and return its label and confidence."""
    embedding = encoder.encode([text])  # shape: (1, 384)
    label = classifier.predict(embedding)[0]
    proba = classifier.predict_proba(embedding)[0]
    confidence = float(np.max(proba))
    # Cast the NumPy string to a built-in str so the result is JSON-serialisable
    return {"label": str(label), "confidence": round(confidence, 4)}

# Example
result = predict("My laptop screen is flickering and won't turn on")
print(result)
# {'label': 'Hardware', 'confidence': 0.971}
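
For batch predictions, the same steps can be applied to a list of tickets in one call. The sketch below is one possible shape for such a helper, not the project's actual batch code: `predict_batch`, the `needs_review` flag, and the `review_threshold` default of 0.6 are all hypothetical. The encoder and classifier are passed in as arguments and only need an `encode` method and `predict_proba` plus `classes_`, respectively.

```python
from typing import Any, Dict, List, Sequence


def predict_batch(
    texts: Sequence[str],
    encoder: Any,      # e.g. a SentenceTransformer instance
    classifier: Any,   # e.g. the LogisticRegression loaded from classifier.joblib
    review_threshold: float = 0.6,  # hypothetical cut-off, not from this model card
) -> List[Dict[str, Any]]:
    """Classify many tickets at once and flag low-confidence predictions."""
    if not texts:
        return []

    embeddings = encoder.encode(list(texts))
    probabilities = classifier.predict_proba(embeddings)

    results = []
    for text, row in zip(texts, probabilities):
        # Index of the highest probability in this row.
        best = max(range(len(row)), key=lambda i: row[i])
        confidence = float(row[best])
        results.append({
            "text": text,
            "label": str(classifier.classes_[best]),
            "confidence": round(confidence, 4),
            "needs_review": confidence < review_threshold,
        })
    return results
```

Calling `predict_batch(tickets, encoder, classifier)` with the objects loaded in the snippet above returns one dictionary per ticket. The review flag is one way to surface tickets that fall between overlapping categories.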

Example Predictions

| Input | Predicted Label | Confidence |
|---|---|---|
| "My laptop screen won't turn on" | Hardware | 0.97 |
| "I forgot my password and can't login" | Account | 0.96 |
| "VPN keeps dropping every few minutes" | Network | 0.94 |
| "Received a suspicious phishing email" | Security | 0.98 |
| "Microsoft Office crashes on startup" | Software | 0.95 |

Production API

This model is served via a production FastAPI backend with:

  • API key authentication
  • Structured JSON logging
  • Per-request tracing IDs
  • /health observability endpoint
  • Async request handling
  • Latency tracking middleware
  • GitHub Actions CI/CD pipeline
  • Docker containerisation
  • Streamlit UI for single and batch predictions
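
The backend code itself is not part of this model card. As a framework-agnostic illustration of three of the items above (structured JSON logging, per-request tracing IDs, and latency tracking), here is a minimal standard-library sketch. `JsonFormatter`, `build_logger`, `traced_request`, and the field names are hypothetical and are not taken from the actual service.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical extra fields, attached via `extra=` in traced_request.
            "request_id": getattr(record, "request_id", None),
            "latency_ms": getattr(record, "latency_ms", None),
        }
        return json.dumps(payload)


def build_logger(name: str = "ticket-api") -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


@contextmanager
def traced_request(logger: logging.Logger):
    """Assign a tracing ID to a request and log how long it took."""
    request_id = uuid.uuid4().hex
    start = time.perf_counter()
    try:
        yield request_id
    finally:
        latency_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info(
            "request finished",
            extra={"request_id": request_id, "latency_ms": latency_ms},
        )
```

Wrapping the body of a request handler in `with traced_request(logger) as request_id:` emits one JSON line per request. In a FastAPI application the same logic would typically sit in an HTTP middleware so that every route is covered.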

Training

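from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
import joblib
import mlflow

# `texts` (list of ticket strings) and `labels` (list of category names)
# are assumed to be loaded from the dataset before this point.

mlflow.set_experiment("ticket-classifier")

with mlflow.start_run():
    # Embed every ticket once; the result has shape (n_tickets, 384)
    encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embeddings = encoder.encode(texts, batch_size=64)

    # Stratified 80/20 split: 1,056 training and 264 test tickets
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.2,
        random_state=42, stratify=labels
    )

    clf = LogisticRegression(max_iter=1000, C=1.0)
    clf.fit(X_train, y_train)

    f1 = f1_score(y_test, clf.predict(X_test), average="weighted")
    mlflow.log_metric("f1_weighted", f1)
    # F1: 0.9924

    # Assumed final step: save the classifier under the file name
    # that the Usage snippet loads
    joblib.dump(clf, "classifier.joblib")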
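
The weighted F1 logged above averages the per-class F1 scores, weighting each class by its number of true test samples (its support). Below is a dependency-free sketch of that calculation, which should agree with `f1_score(..., average="weighted")` on the same inputs. The four-ticket example is made up for illustration and is not drawn from the dataset.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_true / total  # weight by class support
    return score


# Made-up example: one Hardware ticket is misclassified as Network.
y_true = ["Hardware", "Hardware", "Network", "Account"]
y_pred = ["Hardware", "Network", "Network", "Account"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because larger classes carry more weight, a weighted F1 can hide weak performance on a small class, so it is worth checking per-class scores alongside it.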

Dataset

1,320 synthetic IT support tickets with realistic class imbalance and cross-category ambiguity. The dataset was deliberately designed to prevent perfect scores by including tickets that overlap between the Security/Account and Network/Software categories.

Full Project
