---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
- lora
- qlora
- mistral
- json
- structured-output
- customer-support
- text-generation
license: apache-2.0
datasets:
- custom
pipeline_tag: text-generation
---

# Mistral 7B — JSON Support Ticket Classifier (QLoRA Adapter)

A QLoRA fine-tuned adapter for **Mistral 7B Instruct v0.3** that converts free-text customer support messages into structured JSON with intent classification, priority assignment, entity extraction, and clarification detection.

## What It Does

Given a customer message like:

> *"Hi, I want a refund because my wireless earbuds are defective. Order id: ORD-39256"*

The model outputs:

```json
{
  "intent": "refund",
  "priority": "high",
  "entities": {
    "order_id": "ORD-39256",
    "product": "wireless earbuds"
  },
  "needs_clarification": false,
  "clarifying_question": null
}
```

When information is missing, it knows to ask:

```json
{
  "intent": "shipping",
  "priority": "medium",
  "entities": {
    "order_id": null,
    "product": null
  },
  "needs_clarification": true,
  "clarifying_question": "Can you share your order ID and the delivery address ZIP code so I can check the shipment status?"
}
```

## Output Schema

| Field | Type | Description |
|-------|------|-------------|
| `intent` | string | One of: `refund`, `cancel`, `shipping`, `exchange`, `complaint`, `inquiry` |
| `priority` | string | `low`, `medium`, or `high` |
| `entities.order_id` | string \| null | Extracted order ID if present |
| `entities.product` | string \| null | Extracted product name if present |
| `needs_clarification` | boolean | Whether the model needs more info to proceed |
| `clarifying_question` | string \| null | Follow-up question if clarification is needed |

## Usage

### Load and Run Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "aashnakunk/mistral-7b-json-support")
model.eval()
model.config.use_cache = True  # re-enable KV cache (disabled during training)

# Build the prompt
system = """You are a support automation assistant. Return ONLY a single JSON object that matches this schema exactly, with these keys in this order:
1) intent
2) priority
3) entities (with keys: order_id, product)
4) needs_clarification
5) clarifying_question"""

user_message = "Hi, I want a refund because my wireless earbuds are defective. Order id: ORD-39256"
prompt = f"[INST] {system}\n\n{user_message} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Important: Inference Settings

- Set `model.config.use_cache = True` for generation (it is disabled during training)
- Call `model.eval()` to disable dropout
- Use `do_sample=False` for deterministic JSON output
- `repetition_penalty=1.2` helps prevent degenerate repetition

## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | `mistralai/Mistral-7B-Instruct-v0.3` |
| **Method** | QLoRA (4-bit quantization + LoRA adapters) |
| **LoRA rank (r)** | 16 |
| **LoRA alpha** | 32 |
| **LoRA dropout** | 0.05 |
| **Target modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
| **Trainable parameters** | 13.6M / 3.77B (0.36%) |
| **Training examples** | 6,000 |
| **Epochs** | 1 |
| **Batch size** | 1 (with gradient accumulation = 8, effective batch = 8) |
| **Learning rate** | 2e-4 |
| **Optimizer** | `paged_adamw_8bit` |
| **Precision** | fp16 mixed precision |
| **Hardware** | NVIDIA Tesla T4 (15 GB) |
| **Training time** | ~2.75 hours |
| **Final training loss** | 0.109 |

### Loss Curve

Training converged smoothly over 750 steps:

- **Step 10:** 1.088 (learning JSON structure)
- **Step 30:** 0.203 (rapid improvement)
- **Step 100:** 0.127 (stabilizing)
- **Step 750:** 0.109 (converged)

## Adapter Size

~50 MB — only the LoRA adapter weights are stored, not the full 7B model.
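
### Parsing the Output

Even with deterministic decoding, the raw generation can occasionally include stray text around the JSON object. A minimal post-processing sketch is shown below; the `parse_ticket` helper is illustrative and not shipped with the adapter:

```python
import json

# Allowed values taken from the Output Schema table above
INTENTS = {"refund", "cancel", "shipping", "exchange", "complaint", "inquiry"}
PRIORITIES = {"low", "medium", "high"}

def parse_ticket(response: str) -> dict:
    """Extract the first JSON object from a model response and validate it
    against the schema in this card. Raises ValueError on malformed output."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in response")
    data = json.loads(response[start:end + 1])
    if data.get("intent") not in INTENTS:
        raise ValueError(f"unknown intent: {data.get('intent')!r}")
    if data.get("priority") not in PRIORITIES:
        raise ValueError(f"unknown priority: {data.get('priority')!r}")
    if not isinstance(data.get("needs_clarification"), bool):
        raise ValueError("needs_clarification must be a boolean")
    return data

# Example with a response that has chatter around the JSON object
sample = ('Sure! {"intent": "refund", "priority": "high", '
          '"entities": {"order_id": "ORD-39256", "product": "wireless earbuds"}, '
          '"needs_clarification": false, "clarifying_question": null}')
ticket = parse_ticket(sample)
print(ticket["intent"], ticket["priority"])  # → refund high
```

When `needs_clarification` is `true`, the caller can surface `clarifying_question` back to the customer instead of routing the ticket.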
## Limitations

- Designed specifically for customer support ticket classification — may not generalize to other JSON extraction tasks without further fine-tuning
- Relies on the system prompt format shown above for best results
- Entity extraction is limited to `order_id` and `product` fields
- Trained on synthetic support data — real-world edge cases may need additional examples

## Framework Versions

- **Transformers**: 4.x
- **PEFT**: latest
- **PyTorch**: 2.9.0+cu128
- **BitsAndBytes**: latest
- **TRL**: latest
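
## Prompt Construction Helper

Because the adapter relies on the exact system prompt format used in training (see Limitations), it can help to centralize prompt building. The sketch below is a hypothetical convenience wrapper around the `[INST]` template from the Usage section, not part of the released adapter:

```python
# Training-time system prompt, reproduced from the Usage section above
SYSTEM = (
    "You are a support automation assistant. Return ONLY a single JSON object "
    "that matches this schema exactly, with these keys in this order:\n"
    "1) intent\n"
    "2) priority\n"
    "3) entities (with keys: order_id, product)\n"
    "4) needs_clarification\n"
    "5) clarifying_question"
)

def build_prompt(user_message: str) -> str:
    """Wrap a customer message in the Mistral [INST] template together with
    the system prompt the adapter was trained on."""
    return f"[INST] {SYSTEM}\n\n{user_message} [/INST]"

prompt = build_prompt("Where is my package? Order ORD-12345 hasn't arrived.")
print(prompt.startswith("[INST]"), prompt.endswith("[/INST]"))  # → True True
```

Keeping the template in one place avoids accidental drift from the training format, which this card warns can degrade output quality.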