---
license: mit
tags:
  - causal-lm
  - instruction-following
  - lora
  - qlora
  - sentiment-analysis
  - quantized
language: en
library_name: transformers
base_model: microsoft/phi-2
metrics:
  - accuracy
---

# Phi-2 QLoRA Fine-Tuned Model

**Model:** `mishrabp/phi2-custom-response-qlora-adapter`
**Base Model:** [`microsoft/phi-2`](https://huggingface.co/microsoft/phi-2)
**Fine-Tuning Method:** QLoRA (4-bit quantized LoRA)
**Task:** Instruction-following / Customer Support Responses

---

## Model Description

This repository contains a **Phi-2 language model fine-tuned using QLoRA** on a synthetic dataset of customer support instructions and responses. Fine-tuning uses **4-bit quantized LoRA adapters** for memory-efficient training and can run on a GPU or a CPU (slower on CPU).

The model is designed for **instruction-following tasks** such as customer support, FAQs, and other dialog generation tasks.

---

## Training Data

The fine-tuning dataset is synthetic and consists of 3,000 instruction-response pairs.

**Example:**

```text
Instruction: "Customer asks about refund window #1"
Response: "Our refund window is 30 days from delivery."
```

Here is the dataset that was used for fine-tuning:
https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv

You can replace the dataset with your own CSV/JSON file to train on real-world data, as shown in the sketch below.
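Below is a minimal sketch of loading the published CSV (or your own file) with the `datasets` library. The column names `instruction` and `response` are an assumption, not values confirmed by this card; adjust them to match your data.

```python
from datasets import load_dataset

# Load the published CSV (or point data_files at your own CSV/JSON file).
# NOTE: the column names "instruction" and "response" are assumptions --
# change them to whatever your file actually uses.
dataset = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv",
)["train"]

def to_prompt(example):
    # Format each pair in the same "### Instruction / ### Response" style
    # used in the inference example further down this card.
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n"
                f"### Response:\n{example['response']}"
    }

dataset = dataset.map(to_prompt)
print(dataset[0]["text"])
```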
---

## Intended Use

* Generate responses to instructions in customer support scenarios.
* Small-scale instruction-following experiments.
* Educational or research purposes.

---

## How to Use

### Load the Fine-Tuned Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# -----------------------------
# Load fine-tuned model from HF
# -----------------------------
model_name = "mishrabp/phi2-custom-response-qlora-adapter"

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, model_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# -----------------------------
# Sample evaluation dataset
# -----------------------------
eval_data = [
    {"instruction": "Customer asks about refund window",
     "reference": "Our refund window is 30 days from delivery."},
    {"instruction": "Order arrived late",
     "reference": "Sorry for the delay. A delivery credit has been applied."},
    {"instruction": "Wrong item received",
     "reference": "We’ll ship the correct item and provide a return label."},
]

# -----------------------------
# Evaluation loop
# -----------------------------
for i, example in enumerate(eval_data, 1):
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print(f"Example {i}")
    print("Instruction:", example["instruction"])
    print("Generated Response:", generated.split("### Response:")[-1].strip())
    print("Reference Response:", example["reference"])
    print("-" * 50)

# -----------------------------
# Optional: compute a simple sentence-level BLEU score
# -----------------------------
from nltk.translate.bleu_score import sentence_bleu

bleu_scores = []
for example in eval_data:
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=50)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True).split("### Response:")[-1].strip()

    reference_tokens = example["reference"].split()
    generated_tokens = generated.split()
    bleu = sentence_bleu([reference_tokens], generated_tokens)
    bleu_scores.append(bleu)

print("Average BLEU score:", sum(bleu_scores) / len(bleu_scores))
```

---

## Training Script

The training script performs the following steps (a minimal sketch appears after the Parameters section below):

1. Loads the **Phi-2 base model**.
2. Creates a **synthetic dataset** of instruction-response pairs.
3. Tokenizes and formats the dataset for causal language modeling.
4. Applies a **LoRA adapter**.
5. Trains using **QLoRA** if a GPU is available, otherwise full-precision LoRA on CPU.
6. Saves the adapter and tokenizer to `./phi2-qlora`.
7. Pushes the adapter and tokenizer to the Hugging Face Hub.

### Requirements

```bash
pip install torch transformers peft datasets huggingface_hub python-dotenv
```

---

## Parameters

* `r=8`, `lora_alpha=16`, `lora_dropout=0.05`
* `target_modules=["q_proj", "v_proj"]` (adjust for different base models)
* Learning rate: `2e-4`
* Batch size:
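The training script itself is not reproduced in this card. The sketch below shows one way the steps and parameters above could be wired together with `transformers`, `peft`, and `bitsandbytes` (4-bit loading additionally requires the `bitsandbytes` package, which is not in the requirements list above). It loads the published CSV instead of regenerating the synthetic data, and the batch size, epoch count, sequence length, and column names are assumptions rather than values taken from this card.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "microsoft/phi-2"
use_gpu = torch.cuda.is_available()

tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Step 1: load the base model -- 4-bit (QLoRA) on GPU, full precision on CPU.
quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
) if use_gpu else None

model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=quant_cfg,
    device_map="auto" if use_gpu else None,
)
if use_gpu:
    model = prepare_model_for_kbit_training(model)

# Step 4: apply a LoRA adapter with the parameters listed above.
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

# Steps 2-3: load and tokenize the instruction-response pairs.
# Column names are assumptions -- adjust to match the CSV.
dataset = load_dataset(
    "csv",
    data_files="https://huggingface.co/datasets/mishrabp/customer-support-responses/resolve/main/train.csv",
)["train"]

def tokenize(example):
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Response:\n{example['response']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

# Step 5: train. Batch size and epoch count are assumptions.
args = TrainingArguments(
    output_dir="./phi2-qlora",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    logging_steps=10,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Steps 6-7: save the adapter + tokenizer locally, then push to the Hub
# (pushing requires a Hugging Face token, e.g. via `huggingface-cli login`).
model.save_pretrained("./phi2-qlora")
tokenizer.save_pretrained("./phi2-qlora")
model.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
tokenizer.push_to_hub("mishrabp/phi2-custom-response-qlora-adapter")
```

On CPU the quantization config is skipped, so the run falls back to full-precision LoRA, matching step 5 above.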