---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- lora
- qlora
- mistral
- json
- structured-output
- customer-support
- text-generation
license: apache-2.0
datasets:
- custom
pipeline_tag: text-generation
---

# Mistral 7B — JSON Support Ticket Classifier (QLoRA Adapter)

A QLoRA fine-tuned adapter for **Mistral 7B Instruct v0.3** that converts free-text customer support messages into structured JSON with intent classification, priority assignment, entity extraction, and clarification detection.

## What It Does

Given a customer message like:

> *"Hi, I want a refund because my wireless earbuds are defective. Order id: ORD-39256"*

The model outputs:

```json
{
  "intent": "refund",
  "priority": "high",
  "entities": {
    "order_id": "ORD-39256",
    "product": "wireless earbuds"
  },
  "needs_clarification": false,
  "clarifying_question": null
}
```

When information is missing, it knows to ask:

```json
{
  "intent": "shipping",
  "priority": "medium",
  "entities": {
    "order_id": null,
    "product": null
  },
  "needs_clarification": true,
  "clarifying_question": "Can you share your order ID and the delivery address ZIP code so I can check the shipment status?"
}
```

## Output Schema

| Field | Type | Description |
|-------|------|-------------|
| `intent` | string | One of: `refund`, `cancel`, `shipping`, `exchange`, `complaint`, `inquiry` |
| `priority` | string | `low`, `medium`, or `high` |
| `entities.order_id` | string \| null | Extracted order ID if present |
| `entities.product` | string \| null | Extracted product name if present |
| `needs_clarification` | boolean | Whether the model needs more info to proceed |
| `clarifying_question` | string \| null | Follow-up question if clarification is needed |

## Usage

### Load and Run Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "aashnakunk/mistral-7b-json-support")
model.eval()
model.config.use_cache = True  # re-enable KV cache (disabled during training)

# Build the prompt
system = """You are a support automation assistant. Return ONLY a single JSON object that matches this schema exactly, with these keys in this order:
1) intent
2) priority
3) entities (with keys: order_id, product)
4) needs_clarification
5) clarifying_question"""

user_message = "Hi, I want a refund because my wireless earbuds are defective. Order id: ORD-39256"
prompt = f"[INST] {system}\n\n{user_message} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Important: Inference Settings

- Set `model.config.use_cache = True` for generation (it is disabled during training)
- Call `model.eval()` to disable dropout
- Use `do_sample=False` for deterministic JSON output
- `repetition_penalty=1.2` helps prevent degenerate repetition

## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | `mistralai/Mistral-7B-Instruct-v0.3` |
| **Method** | QLoRA (4-bit quantization + LoRA adapters) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
| **Trainable parameters** | 13.6M / 3.77B (0.36%) |
| **Training examples** | 6,000 |
| **Epochs** | 1 |
| **Batch size** | 1 (with gradient accumulation = 8, effective batch = 8) |
| **Learning rate** | 2e-4 |
| **Optimizer** | `paged_adamw_8bit` |
| **Precision** | fp16 mixed precision |
| **Hardware** | NVIDIA Tesla T4 (15 GB) |
| **Training time** | ~2.75 hours |
| **Final training loss** | 0.109 |

### Loss Curve

Training converged smoothly over 750 steps:

- **Step 10:** 1.088 (learning JSON structure)
- **Step 30:** 0.203 (rapid improvement)
- **Step 100:** 0.127 (stabilizing)
- **Step 750:** 0.109 (converged)

## Adapter Size

~50 MB — only the LoRA adapter weights are stored, not the full 7B model.
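
### Parsing the Output

Even with deterministic decoding, the raw generation can occasionally include stray text around the JSON object. A minimal post-processing sketch is shown below; the `parse_ticket` helper is illustrative and not shipped with the adapter:

```python
import json

# Allowed values taken from the Output Schema table above
INTENTS = {"refund", "cancel", "shipping", "exchange", "complaint", "inquiry"}
PRIORITIES = {"low", "medium", "high"}

def parse_ticket(response: str) -> dict:
    """Extract the first JSON object from a model response and validate it
    against the schema in this card. Raises ValueError on malformed output."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    data = json.loads(response[start:end + 1])
    if data.get("intent") not in INTENTS:
        raise ValueError(f"unknown intent: {data.get('intent')!r}")
    if data.get("priority") not in PRIORITIES:
        raise ValueError(f"unknown priority: {data.get('priority')!r}")
    if not isinstance(data.get("needs_clarification"), bool):
        raise ValueError("needs_clarification must be a boolean")
    return data

# Example with a response that has chatter around the JSON object
sample = ('Sure! {"intent": "refund", "priority": "high", '
          '"entities": {"order_id": "ORD-39256", "product": "wireless earbuds"}, '
          '"needs_clarification": false, "clarifying_question": null}')
ticket = parse_ticket(sample)
print(ticket["intent"], ticket["priority"])  # → refund high
```

When `needs_clarification` is `true`, the caller can surface `clarifying_question` back to the customer instead of routing the ticket.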
## Limitations

- Designed specifically for customer support ticket classification — may not generalize to other JSON extraction tasks without further fine-tuning
- Relies on the system prompt format shown above for best results
- Entity extraction is limited to `order_id` and `product` fields
- Trained on synthetic support data — real-world edge cases may need additional examples

## Framework Versions

- **Transformers**: 4.x
- **PEFT**: latest
- **PyTorch**: 2.9.0+cu128
- **BitsAndBytes**: latest
- **TRL**: latest
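
## Prompt Construction Helper

Because the adapter relies on the exact system prompt format used in training (see Limitations), it can help to centralize prompt building. The sketch below is a hypothetical convenience wrapper around the `[INST]` template from the Usage section, not part of the released adapter:

```python
# Training-time system prompt, reproduced from the Usage section above
SYSTEM = (
    "You are a support automation assistant. Return ONLY a single JSON object "
    "that matches this schema exactly, with these keys in this order:\n"
    "1) intent\n"
    "2) priority\n"
    "3) entities (with keys: order_id, product)\n"
    "4) needs_clarification\n"
    "5) clarifying_question"
)

def build_prompt(user_message: str) -> str:
    """Wrap a customer message in the Mistral [INST] template together with
    the system prompt the adapter was trained on."""
    return f"[INST] {SYSTEM}\n\n{user_message} [/INST]"

prompt = build_prompt("Where is my package? Order ORD-12345 hasn't arrived.")
print(prompt.startswith("[INST]"), prompt.endswith("[/INST]"))  # → True True
```

Keeping the template in one place avoids accidental drift from the training format, which this card warns can degrade output quality.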