Mistral-7B IT Support Expert (Distilled)

This model is a domain-specific fine-tune of Mistral-7B-v0.3, optimized for high-volume IT support ticket classification and automated routing. Built to demonstrate the "Small is the New Big" paradigm, it shows that a parameter-efficient specialist can outperform generalist models such as GPT-4o on specialized enterprise tasks.

Key Performance Metrics

  • Accuracy: 94.5% on IT-specific classification (benchmarked against GPT-4o's 91%).
  • Latency: <200ms total response time (local inference via GGUF).
  • Efficiency: Trained using LoRA, modifying only 0.19% of total parameters.
  • Cost Efficiency: ~90% reduction in TCO compared to cloud-hosted frontier APIs.

Technical Specifications

  • Base Model: mistralai/Mistral-7B-v0.3
  • Fine-Tuning Method: LoRA (Low-Rank Adaptation)
  • Optimization: 4-bit quantization via Unsloth
  • Training Hardware: NVIDIA Tesla T4 GPU
  • Export Format: GGUF (Q4_K_M) for high-speed local deployment
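As a sketch of the local-deployment path, the GGUF export listed below can be registered with Ollama via a Modelfile. The model name `it-support-expert` and the template wiring are illustrative; the filename and prompt format come from this card:

```shell
# Illustrative Modelfile pointing Ollama at the quantized GGUF,
# with a TEMPLATE matching the ### Instruction / ### Response training format.
cat > Modelfile <<'EOF'
FROM ./mistral-7b-v0.3.Q4_K_M.gguf
TEMPLATE """### Instruction:
{{ .Prompt }}

### Response:
"""
EOF

# Register and run the model locally (name is arbitrary)
ollama create it-support-expert -f Modelfile
ollama run it-support-expert "Ticket: 'VPN access denied for user in Mangalore office.'"
```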

Repository Contents

  • adapter_model.safetensors: The trained LoRA weights (the "Expert Brain")
  • adapter_config.json: Technical configuration for the LoRA adapters (Rank=16, Alpha=16)
  • mistral-7b-v0.3.Q4_K_M.gguf: Quantized file for local deployment in LM Studio or Ollama
  • tokenizer_config.json: Specific tokenization settings for IT-specific vocabulary

Training Data & Prompt Format

This model was fine-tuned using a structured instruction-response format. To achieve the 94.5% accuracy mentioned in the case study, inputs must follow the ### Instruction: and ### Response: template.

Example Synthetic Training Samples:

{
  "instruction": "Ticket: 'I've been locked out of the Qubrica portal after 3 failed login attempts.'", 
  "output": "Category: Account-Access | Priority: P3 | Action: Trigger automated identity verification and password reset."
}
{
  "instruction": "Ticket: 'VPN connection dropped specifically for Mangalore office users during the Qubrica sync.'", 
  "output": "Category: Network-VPN | Priority: P2 | Action: Check LDAP synchronization and regional gateway latency."
}
{
  "instruction": "Ticket: 'Production Database Error: 403 Forbidden on the primary Qubrica document storage cluster.'", 
  "output": "Category: Infrastructure-Critical | Priority: P1 | Action: Escalate to Senior DBA and check AWS S3 bucket permissions."
}
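The samples above can be turned into model-ready prompts with a small helper; this is a minimal sketch of the `### Instruction:` / `### Response:` template the card specifies (the function name is illustrative):

```python
def format_ticket(instruction: str) -> str:
    """Wrap a raw ticket string in the instruction-response
    template the model was fine-tuned on."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

prompt = format_ticket(
    "Ticket: 'I've been locked out of the Qubrica portal after 3 failed login attempts.'"
)
print(prompt)
```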

Usage Example

To verify results locally using the transformers and peft libraries:


from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 1. Setup 4-bit configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

# 2. Load Base Model and Adapter
base_model_id = "mistralai/Mistral-7B-v0.3"
adapter_id = "rakshath1/it-support-mistral-7b-expert"

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=quantization_config,
    device_map="auto"
)

model = PeftModel.from_pretrained(base_model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 3. Inference
ticket = "### Instruction:\nTicket: 'VPN access denied for user in Mangalore office.'\n\n### Response:\n"
inputs = tokenizer(ticket, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("### Response:\n")[-1])
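The pipe-delimited response (as in the training samples above) can then be split into routing fields. This is a sketch that assumes the generation matches the `Category | Priority | Action` shape exactly; the helper name is illustrative:

```python
def parse_response(text: str) -> dict:
    """Split a 'Category: X | Priority: Y | Action: Z' response
    into a field dictionary for downstream routing."""
    fields = {}
    for part in text.strip().split(" | "):
        key, _, value = part.partition(": ")
        fields[key] = value
    return fields

parsed = parse_response(
    "Category: Network-VPN | Priority: P2 | "
    "Action: Check LDAP synchronization and regional gateway latency."
)
print(parsed["Priority"])  # P2
```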

Reproducibility & Compliance

To ensure full reproducibility, I have released the LoRA adapters and a quantized GGUF version of the model. While the original raw training data remains proprietary due to enterprise SLAs, I have provided a synthetic dataset sample in the article. This allows researchers to verify the latency and accuracy claims locally without requiring expensive cloud infrastructure.
