---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
tags:
- phi3
- lora
- alpaca
- instruction-tuning
- fine-tuned
datasets:
- tatsu-lab/alpaca
language:
- en
pipeline_tag: text-generation
---

# Phi-3 Mini 4K Instruct - Alpaca LoRA Fine-tuned

This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) using LoRA (Low-Rank Adaptation) on the [Alpaca dataset](https://huggingface.co/datasets/tatsu-lab/alpaca).

## Model Details

- **Base Model**: microsoft/Phi-3-mini-4k-instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: tatsu-lab/alpaca (52,002 instruction-following examples)
- **Training Duration**: ~1.24 hours
- **Final Training Loss**: 1.0445
- **Average Training Loss**: 1.0311

## Training Configuration

- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.05
- **Target Modules**: qkv_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate**: 1e-5
- **Batch Size**: 2 per device (with 8 gradient accumulation steps)
- **Epochs**: 1
- **Precision**: bfloat16
- **Gradient Checkpointing**: Enabled

A sketch of how these values map onto `peft` and `transformers` objects is included in the appendix at the end of this card.

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "johnlam90/phi3-mini-4k-instruct-alpaca-lora")
model.eval()

# Format the prompt in the Alpaca style used during fine-tuning
prompt = "Give three tips for staying healthy."
formatted_prompt = f'''### Instruction:
{prompt}

### Response:
'''

# Generate (move inputs onto the model's device)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.split("### Response:")[1].strip())
```

## Performance

Generation was validated with the following stability measures and checks:

- ✅ NaN clamp protection for stable generation (see the NaN-safe generation sketch in the appendix)
- ✅ Proper bfloat16 precision handling
- ✅ Consistent and coherent responses across multiple test prompts
- ✅ No numerical instabilities during training or inference

## Training Details

This model was fine-tuned with careful attention to:

1. **Data Formatting**: Proper Alpaca instruction/input/output structure (a template sketch is included in the appendix)
2. **Numerical Stability**: bfloat16 precision and conservative hyperparameters
3. **Memory Efficiency**: Gradient checkpointing and optimized batch sizes
4. **Safety Measures**: NaN protection and proper token handling

## License

This model is released under the MIT license, consistent with the base model's licensing terms.
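
## Appendix: Implementation Sketches

### Training configuration

The training script itself is not published with this card. As a reference, here is a minimal, untested sketch of how the hyperparameters listed under Training Configuration map onto `peft` and `transformers` objects; the `output_dir` value and the omitted dataset/trainer wiring are illustrative assumptions.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the base model in bfloat16, matching the card's precision setting.
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# LoRA settings copied from the "Training Configuration" section.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# Optimizer and schedule settings from the same section; output_dir is assumed.
training_args = TrainingArguments(
    output_dir="phi3-alpaca-lora",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    bf16=True,
    gradient_checkpointing=True,
)
```

With a per-device batch size of 2 and 8 accumulation steps, each optimizer step sees an effective batch of 16 examples per device.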
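
### Alpaca prompt template

Alpaca records come in two shapes, with and without an `input` field. The card confirms the instruction/input/output structure but not the exact template string, so the function below assumes the stock Alpaca template:

```python
def format_alpaca(example: dict) -> str:
    """Render an Alpaca record into the standard prompt template.

    Assumes the stock Alpaca template; the card only confirms the
    instruction/input/output structure.
    """
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )
```

At inference time only the instruction (and optional input) portion is supplied, ending at `### Response:`, as in the Usage example above.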
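
### NaN-safe generation

The card does not specify how its NaN clamp protection was implemented. If you want a comparable guard at inference time, `transformers` ships a built-in `InfNanRemoveLogitsProcessor` that zeroes NaN logits and clamps infinities before the next token is chosen; the snippet below continues from the `model`, `tokenizer`, and `inputs` objects defined in the Usage example:

```python
from transformers import InfNanRemoveLogitsProcessor, LogitsProcessorList

# Replace NaN logits with 0 and +inf logits with the dtype's max value
# at every generation step, guarding against numerically unstable steps.
safe_processors = LogitsProcessorList([InfNanRemoveLogitsProcessor()])

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=False,
        logits_processor=safe_processors,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
```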