Phi-4-Mini-Instruct-customerservice-context-summary

This model is a QLoRA fine-tuned version of microsoft/Phi-4-mini-instruct trained to generate context summaries from multi-turn customer-service conversations in the banking domain.

Model Description

This is a QLoRA (Quantized Low-Rank Adaptation) fine-tuned version of microsoft/Phi-4-mini-instruct optimized for context summarization of multi-turn customer-service conversations. Given a multi-turn banking conversation, the model produces a single coherent, factual paragraph capturing all key information needed for an agent to accurately answer the client's next question.

Base Model: microsoft/Phi-4-mini-instruct
Parameters: ~3.8 billion
Fine-tuning Method: QLoRA (4-bit quantization + LoRA)
Domain: Customer Service (Banking)
Task: Multi-Turn Conversation Context Summarization

Intended Uses & Limitations

Intended Uses

  • Generating context summaries from multi-turn banking customer-service conversations
  • Supporting context-summarized QA pipelines as a lightweight, on-premise summarizer
  • Replacing LLM-based summarization in resource-constrained or privacy-sensitive deployments
  • Research into SLM capabilities for conversational context compression

Limitations

  • Primarily trained on banking domain data; may require adaptation for other sectors
  • Performance based on synthetic data; real-world variability may differ
  • Maximum sequence length: 1024 tokens
  • Summaries are intended to be consumed by a downstream QA agent, not as standalone outputs
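
Because of the 1024-token limit, long conversations must be trimmed before inference. A minimal sketch of one way to do this, dropping the oldest turns first (this truncation strategy and the `reserve` budget for the prompt template are illustrative assumptions, not part of the training pipeline; `count_tokens` would typically be `lambda s: len(tokenizer(s).input_ids)` with the base model's tokenizer):

```python
def fit_history(turns, count_tokens, max_tokens=1024, reserve=256):
    """Drop the oldest turns until the joined history fits the token budget.

    `reserve` leaves headroom for the system instruction and prompt template;
    the exact value is an illustrative assumption.
    """
    budget = max_tokens - reserve
    kept = list(turns)
    while kept and count_tokens("\n".join(kept)) > budget:
        kept.pop(0)  # discard the oldest turn first
    return "\n".join(kept)
```

Dropping from the front preserves the most recent turns, which carry the state the summary must capture.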

Training Data

Dataset: Synthetic multi-turn customer-service context summarization dataset
Source: 50,000 unique conversations randomly selected from an original multi-turn customer-service conversation dataset
Public Dataset: Lakshan2003/customer-support-context-summary-50k

Split        Instances
Train        35,000 (70%)
Validation    5,000 (10%)
Test         10,000 (20%)
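
The exact splitting procedure is not specified in the card; one illustrative way to produce a deterministic 70/10/20 split with the stated counts is:

```python
import random

def split_indices(n, seed=42):
    """Shuffle indices 0..n-1 and cut them into 70/10/20 train/val/test.

    The seed and the shuffling scheme are illustrative assumptions; only
    the split proportions come from the model card.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * 0.70)
    n_val = int(n * 0.10)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For 50,000 conversations this yields the 35,000 / 5,000 / 10,000 counts shown above.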

Training Procedure

Training Configuration

  • Framework: Unsloth + Hugging Face Transformers
  • Fine-tuning Method: QLoRA (4-bit quantization)
  • Hardware: NVIDIA A100 40GB GPU
  • Training Time: ~2.5–7 hours

Training Hyperparameters

  • Max Sequence Length: 1024 tokens
  • Quantization: 4-bit precision
  • LoRA Rank (r): 8
  • LoRA Alpha: 16
  • LoRA Dropout: 0.1
  • LoRA Target Modules: Query, key, value, and output projection layers
  • Epochs: 3
  • Optimizer: AdamW 8-bit
  • Learning Rate: 2e-5
  • Batch Size: 8 per device
  • Gradient Accumulation Steps: 2 (effective batch size: 16)
  • LR Scheduler: Cosine
  • Warmup Ratio: 0.05
  • Model Selection: Best checkpoint based on lowest validation loss
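
The hyperparameters above map onto a peft/transformers configuration roughly as follows. This is a sketch, not the exact training script: the target-module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the output directory are assumptions to be verified against the checkpoint.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings transcribed from the card; module names are assumed.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_8bit",              # 8-bit AdamW via bitsandbytes
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # best checkpoint by validation loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    output_dir="outputs",            # illustrative path
)
```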

Prompt Template

ft_prompt = """<|system|>
{instruction}<|end|>
<|user|>
Conversation History:
{history}<|end|>
<|assistant|>
"""
 
EOS_TOKEN = tokenizer.eos_token
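
For supervised fine-tuning, each training string is the filled-in template with the target summary and EOS token appended. A minimal sketch, repeating the template above for self-containment (the dataset column names `instruction`, `history`, and `summary` are assumptions; in practice `eos_token` is `tokenizer.eos_token`):

```python
ft_prompt = """<|system|>
{instruction}<|end|>
<|user|>
Conversation History:
{history}<|end|>
<|assistant|>
"""

def format_example(example, eos_token):
    """Build one supervised training string: prompt + target summary + EOS."""
    prompt = ft_prompt.format(
        instruction=example["instruction"], history=example["history"]
    )
    return prompt + example["summary"] + eos_token
```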

System Instruction:

You are a professional conversation summarization assistant.
Goal:
Produce a clear, concise and factual summary of the conversation so far so that
YOU, the same customer service agent handling this client, can accurately answer
their next question.
Include only information explicitly stated:
- Client's issue or request and current status (explicitly mention client and
  agent names if present in the conversation)
- Verification steps completed or pending
- Exact names, account numbers or identifiers, dates, amounts and actions taken
  or agreed
- Commitments, deadlines and follow-up actions
- Current state of the conversation
Exclude:
Greetings, filler dialogue, speculation, assumptions, or invented details.
Style:
Neutral and professional. Vary sentence structure and phrasing to avoid repetition.
Output:
One coherent, detailed paragraph summarizing the conversation context.

Inference Parameters:

generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "do_sample": True,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
}

Usage Example

Installation

pip install unsloth transformers peft torch

Loading the Model

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
 
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct",
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-instruct")
model = PeftModel.from_pretrained(base_model, "Lakshan2003/Phi-4-mini-instruct-customerservice-context-summary")
 
# Merge adapter (optional, for deployment)
model = model.merge_and_unload()
model.eval()

Inference

instruction = """You are a professional conversation summarization assistant.
Goal:
Produce a clear, concise and factual summary of the conversation so far so that
YOU, the same customer service agent handling this client, can accurately answer
their next question.
Include only information explicitly stated:
- Client's issue or request and current status (explicitly mention client and
  agent names if present in the conversation)
- Verification steps completed or pending
- Exact names, account numbers or identifiers, dates, amounts and actions taken
  or agreed
- Commitments, deadlines and follow-up actions
- Current state of the conversation
Exclude:
Greetings, filler dialogue, speculation, assumptions, or invented details.
Style:
Neutral and professional. Vary sentence structure and phrasing to avoid repetition.
Output:
One coherent, detailed paragraph summarizing the conversation context."""
 
history = """Agent: Thank you for calling Optimal Financial Partners, my name is Almira. How can I help you today?
Client: Hi, I'm Kathrine. I noticed some unexpected charges on my account statement and I'm not sure what they are.
Agent: I'm sorry to hear that, Kathrine. As a valued customer, you have the right to dispute any unauthorized or incorrect charges on your account. I'd be happy to investigate any charges you believe are incorrect.
Client: That's great to know. What if I'm not satisfied with the outcome of the investigation?"""
 
input_text = ft_prompt.format(instruction=instruction, history=history)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=1024).to(model.device)
 
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
        top_p=0.8,
        top_k=20,
        min_p=0.0,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
 
input_length = inputs.input_ids.shape[1]
summary = tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True).strip()
print(summary)

Example Output:

Kathrine has contacted Almira from Optimal Financial Partners regarding unexpected charges on her statement and her rights as a consumer. Almira confirmed that as a customer, Kathrine has the right to dispute any unauthorized or incorrect charges. Almira offered to investigate any charges Kathrine believes are incorrect. No specific charges, amounts, or account identifiers have been mentioned, and no verification steps have been completed or are pending at this time. The conversation is currently focused on explaining consumer rights and the process for disputing charges.

Framework Versions

  • PEFT: 0.14.0
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Unsloth: version not pinned (latest at time of training)

Citation

If you use this model, please cite:

@article{cooray2026small,
  title={Can Small Language Models Handle Context-Summarized Multi-Turn Customer-Service QA? A Synthetic Data-Driven Comparative Evaluation},
  author={Cooray, Lakshan and Sumanathilaka, Deshan and Raju, Pattigadapa Venkatesh},
  journal={arXiv preprint arXiv:2602.00665},
  year={2026}
}

Model Card Contact

Author: Lakshan Cooray
Institution: Informatics Institute of Technology, Colombo, Sri Lanka
Email: lakshan.20221470@iit.ac.lk

License

This model inherits the license of the base microsoft/Phi-4-mini-instruct model; refer to that model's license for the applicable terms.

Ethical Considerations

  • Model trained on synthetic banking data to preserve privacy
  • Should be used with human oversight in production environments
  • Summaries are intended to support downstream QA agents, not for direct client-facing output
  • May require domain adaptation for non-banking customer service
  • Performance may vary on real-world data with different distributions