Phi-4-Mini-Instruct-customerservice-context-summary

This model is a QLoRA fine-tuned version of microsoft/Phi-4-mini-instruct trained to generate context summaries from multi-turn customer-service conversations in the banking domain.

Model Description

This is a QLoRA (Quantized Low-Rank Adaptation) fine-tuned version of microsoft/Phi-4-mini-instruct optimized for context summarization of multi-turn customer-service conversations. Given a multi-turn banking conversation, the model produces a single coherent, factual paragraph capturing all key information needed for an agent to accurately answer the client's next question.

Base Model: microsoft/Phi-4-mini-instruct
Parameters: ~3.8 billion
Fine-tuning Method: QLoRA (4-bit quantization + LoRA)
Domain: Customer Service (Banking)
Task: Multi-Turn Conversation Context Summarization

Intended Uses & Limitations

Intended Uses

  • Generating context summaries from multi-turn banking customer-service conversations
  • Supporting context-summarized QA pipelines as a lightweight, on-premise summarizer
  • Replacing LLM-based summarization in resource-constrained or privacy-sensitive deployments
  • Research into SLM capabilities for conversational context compression

Limitations

  • Primarily trained on banking domain data; may require adaptation for other sectors
  • Performance based on synthetic data; real-world variability may differ
  • Maximum sequence length: 1024 tokens
  • Summaries are intended to be consumed by a downstream QA agent, not as standalone outputs
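
Because of the 1024-token limit, long conversations must be trimmed before inference. A minimal sketch of one way to do this, dropping the oldest turns first (this truncation strategy and the `reserve` budget for the prompt template are illustrative assumptions, not part of the training pipeline; `count_tokens` would typically be `lambda s: len(tokenizer(s).input_ids)` with the base model's tokenizer):

```python
def fit_history(turns, count_tokens, max_tokens=1024, reserve=256):
    """Drop the oldest turns until the joined history fits the token budget.

    `reserve` leaves headroom for the system instruction and prompt template;
    the exact value is an illustrative assumption.
    """
    budget = max_tokens - reserve
    kept = list(turns)
    while kept and count_tokens("\n".join(kept)) > budget:
        kept.pop(0)  # discard the oldest turn first
    return "\n".join(kept)
```

Dropping from the front preserves the most recent turns, which carry the state the summary must capture.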

Training Data

Dataset: Synthetic multi-turn customer-service context summarization dataset
Source: 50,000 unique conversations randomly selected from an original multi-turn customer-service conversation dataset
Public Dataset: Lakshan2003/customer-support-context-summary-50k

Split        Instances
Train        35,000 (70%)
Validation    5,000 (10%)
Test         10,000 (20%)
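
The exact splitting procedure is not specified in the card; one illustrative way to produce a deterministic 70/10/20 split with the stated counts is:

```python
import random

def split_indices(n, seed=42):
    """Shuffle indices 0..n-1 and cut them into 70/10/20 train/val/test.

    The seed and the shuffling scheme are illustrative assumptions; only
    the split proportions come from the model card.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * 0.70)
    n_val = int(n * 0.10)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For 50,000 conversations this yields the 35,000 / 5,000 / 10,000 counts shown above.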

Training Procedure

Training Configuration

  • Framework: Unsloth + Hugging Face Transformers
  • Fine-tuning Method: QLoRA (4-bit quantization)
  • Hardware: NVIDIA A100 40GB GPU
  • Training Time: ~2.5–7 hours

Training Hyperparameters

  • Max Sequence Length: 1024 tokens
  • Quantization: 4-bit precision
  • LoRA Rank (r): 8
  • LoRA Alpha: 16
  • LoRA Dropout: 0.1
  • LoRA Target Modules: Query, key, value, and output projection layers
  • Epochs: 3
  • Optimizer: AdamW 8-bit
  • Learning Rate: 2e-5
  • Batch Size: 8 per device
  • Gradient Accumulation Steps: 2 (effective batch size: 16)
  • LR Scheduler: Cosine
  • Warmup Ratio: 0.05
  • Model Selection: Best checkpoint based on lowest validation loss
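
The hyperparameters above map onto a peft/transformers configuration roughly as follows. This is a sketch, not the exact training script: the target-module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the output directory are assumptions to be verified against the checkpoint.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings transcribed from the card; module names are assumed.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_8bit",              # 8-bit AdamW via bitsandbytes
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # best checkpoint by validation loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    output_dir="outputs",            # illustrative path
)
```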

Prompt Template

ft_prompt = """<|system|>
{instruction}<|end|>
<|user|>
Conversation History:
{history}<|end|>
<|assistant|>
"""
 
EOS_TOKEN = tokenizer.eos_token
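
For supervised fine-tuning, each training string is the filled-in template with the target summary and EOS token appended. A minimal sketch, repeating the template above for self-containment (the dataset column names `instruction`, `history`, and `summary` are assumptions; in practice `eos_token` is `tokenizer.eos_token`):

```python
ft_prompt = """<|system|>
{instruction}<|end|>
<|user|>
Conversation History:
{history}<|end|>
<|assistant|>
"""

def format_example(example, eos_token):
    """Build one supervised training string: prompt + target summary + EOS."""
    prompt = ft_prompt.format(
        instruction=example["instruction"], history=example["history"]
    )
    return prompt + example["summary"] + eos_token
```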

System Instruction:

You are a professional conversation summarization assistant.
Goal:
Produce a clear, concise and factual summary of the conversation so far so that
YOU, the same customer service agent handling this client, can accurately answer
their next question.
Include only information explicitly stated:
- Client's issue or request and current status (explicitly mention client and
  agent names if present in the conversation)
- Verification steps completed or pending
- Exact names, account numbers or identifiers, dates, amounts and actions taken
  or agreed
- Commitments, deadlines and follow-up actions
- Current state of the conversation
Exclude:
Greetings, filler dialogue, speculation, assumptions, or invented details.
Style:
Neutral and professional. Vary sentence structure and phrasing to avoid repetition.
Output:
One coherent, detailed paragraph summarizing the conversation context.

Inference Parameters:

generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "do_sample": True,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
}

Usage Example

Installation

pip install unsloth transformers peft torch

Loading the Model

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
 
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")
model = PeftModel.from_pretrained(base_model, "Lakshan2003/Phi-4-mini-instruct-customerservice-context-summary")
 
# Merge adapter (optional, for deployment)
model = model.merge_and_unload()
model.eval()

Inference

instruction = """You are a professional conversation summarization assistant.
Goal:
Produce a clear, concise and factual summary of the conversation so far so that
YOU, the same customer service agent handling this client, can accurately answer
their next question.
Include only information explicitly stated:
- Client's issue or request and current status (explicitly mention client and
  agent names if present in the conversation)
- Verification steps completed or pending
- Exact names, account numbers or identifiers, dates, amounts and actions taken
  or agreed
- Commitments, deadlines and follow-up actions
- Current state of the conversation
Exclude:
Greetings, filler dialogue, speculation, assumptions, or invented details.
Style:
Neutral and professional. Vary sentence structure and phrasing to avoid repetition.
Output:
One coherent, detailed paragraph summarizing the conversation context."""
 
history = """Agent: Thank you for calling Optimal Financial Partners, my name is Almira. How can I help you today?
Client: Hi, I'm Kathrine. I noticed some unexpected charges on my account statement and I'm not sure what they are.
Agent: I'm sorry to hear that, Kathrine. As a valued customer, you have the right to dispute any unauthorized or incorrect charges on your account. I'd be happy to investigate any charges you believe are incorrect.
Client: That's great to know. What if I'm not satisfied with the outcome of the investigation?"""
 
input_text = ft_prompt.format(instruction=instruction, history=history)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=1024).to(model.device)
 
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        top_p=0.8,
        top_k=20,
        min_p=0.0,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
 
input_length = inputs.input_ids.shape[1]
summary = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True).strip()
print(summary)

Example Output:

Kathrine has contacted Almira from Optimal Financial Partners regarding unexpected charges on her statement and her rights as a consumer. Almira confirmed that as a customer, Kathrine has the right to dispute any unauthorized or incorrect charges. Almira offered to investigate any charges Kathrine believes are incorrect. No specific charges, amounts, or account identifiers have been mentioned, and no verification steps have been completed or are pending at this time. The conversation is currently focused on explaining consumer rights and the process for disputing charges.

Framework Versions

  • PEFT: 0.14.0
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Unsloth: version not pinned (latest at time of training)

Citation

If you use this model, please cite:

@article{cooray2026small,
  title={Can Small Language Models Handle Context-Summarized Multi-Turn Customer-Service QA? A Synthetic Data-Driven Comparative Evaluation},
  author={Cooray, Lakshan and Sumanathilaka, Deshan and Raju, Pattigadapa Venkatesh},
  journal={arXiv preprint arXiv:2602.00665},
  year={2026}
}

Model Card Contact

Author: Lakshan Cooray
Institution: Informatics Institute of Technology, Colombo, Sri Lanka
Email: lakshan.20221470@iit.ac.lk

License

This model inherits the license of the base microsoft/Phi-4-mini-instruct model; refer to that model's license for the applicable terms.

Ethical Considerations

  • Model trained on synthetic banking data to preserve privacy
  • Should be used with human oversight in production environments
  • Summaries are intended to support downstream QA agents, not for direct client-facing output
  • May require domain adaptation for non-banking customer service
  • Performance may vary on real-world data with different distributions