# Phi-3 Grown Chat Model (Continual LoRA Adaptation)

![Phi-3 Mini](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct/resolve/main/thumbnail.png)

**A custom continual-learning chat model based on Phi-3-mini-4k-instruct**

Trained with sequential LoRA adapters to simulate "growing new neuron connections" in each learning phase. The base weights and earlier adapters stay frozen, which is intended to limit **catastrophic forgetting**.

- **Base Model**: [unsloth/Phi-3-mini-4k-instruct](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct) (3.82B parameters)
- **Total Effective Size**: ~4.18B parameters (base + ~360M from 3 stacked LoRA adapters)
- **Dataset**: [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), a filtered, high-quality multi-turn conversation dataset
- **Training Method**: Continual learning via sequential LoRA (each phase adds new trainable connections while previously learned weights stay frozen)
- **Phases**:
  1. General Chat
  2. Reasoning & Q&A
  3. Roleplay & Long Context

This model is tuned for natural conversation, reasoning, creative roleplay, and instruction following. Loaded in 4-bit, it is efficient enough to run on consumer GPUs.
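To make the "new connections per phase" idea concrete, here is a toy sketch of stacked LoRA on a single linear layer. It is a conceptual illustration in plain Python, not the training code used for this model; the class name `StackedLoRALinear` and its methods are hypothetical. Each phase appends a low-rank pair `(A, B)` and freezes the ones before it, and because `B` starts at zero, adding a phase does not change the layer's output until that phase is trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class StackedLoRALinear:
    """Toy layer: y = W x + sum_i scale * B_i (A_i x), with W always frozen."""

    def __init__(self, weight, rank, scale=1.0):
        self.weight = weight      # frozen base weight, shape (d_out, d_in)
        self.rank = rank
        self.scale = scale
        self.adapters = []        # one entry per learning phase

    def add_phase(self, seed=0):
        """Freeze all existing adapters, then append a fresh trainable one."""
        for adapter in self.adapters:
            adapter["trainable"] = False
        d_out, d_in = len(self.weight), len(self.weight[0])
        rng = random.Random(seed)
        a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(self.rank)]
        b = [[0.0] * self.rank for _ in range(d_out)]  # zero init: output unchanged
        self.adapters.append({"A": a, "B": b, "trainable": True})

    def forward(self, x):
        y = matvec(self.weight, x)
        for adapter in self.adapters:
            delta = matvec(adapter["B"], matvec(adapter["A"], x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y

    def trainable_params(self):
        """Count parameters in adapters that are still trainable."""
        d_out, d_in = len(self.weight), len(self.weight[0])
        return sum(
            self.rank * (d_in + d_out)
            for adapter in self.adapters
            if adapter["trainable"]
        )


# Demo: adding a phase leaves the output untouched until that phase is trained.
demo = StackedLoRALinear([[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]], rank=2)
before = demo.forward([1.0, 1.0, 1.0])
demo.add_phase(seed=1)
print("unchanged after adding phase 1:", demo.forward([1.0, 1.0, 1.0]) == before)
```

In a real setup the frozen adapters would simply be excluded from the optimizer; the `trainable` flag here stands in for that.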
## Quick Start / Inference

### Installation (One-Time Setup)

```bash
# Install Unsloth (fast path for Phi-3 + LoRA inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
```

### Run Inference (Chat with the Model)

```python
from unsloth import FastLanguageModel
import torch

# Load the model (4-bit for efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    "yourusername/phi3-grown-chat",  # Replace with your HF repo (or local path: "./phi3-grown-chat-model")
    dtype = None,                    # Auto-detect (float16/bf16)
    load_in_4bit = True,             # Saves VRAM
)

# Enable fast inference
FastLanguageModel.for_inference(model)

# Chat loop example (each turn is sent on its own; no history is kept)
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    messages = [{"role": "user", "content": user_input}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        top_p=0.95,
    )

    # Decode only the newly generated tokens. Splitting on "<|assistant|>"
    # does not work here because skip_special_tokens=True strips that marker.
    new_tokens = outputs[0][inputs.shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print("Assistant:", response.strip())
```

### Example Prompts to Test

- "Hello! Tell me a fun fact about space."
- "Explain quantum computing like I'm 10 years old."
- "You are a pirate captain. Tell me about your greatest adventure."
- "Write a Python function to check if a number is prime."
- Long context: paste a paragraph and ask questions about it.

## Training Details (How It Was Built)

This model uses continual learning with stacked LoRA adapters:

- The base model is frozen.
- Each phase adds a new LoRA adapter (r=64, ~119M trainable params per phase).
- Phases are trained sequentially on a three-way split of UltraChat_200k (~69k examples per phase).
- Tooling: Unsloth + TRL SFTTrainer (Unsloth advertises roughly 2x faster training than a standard setup).
- Quick demo: 60 steps per phase (~30 min total on a T4 GPU).
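As a sanity check on the ~119M per-phase figure, the arithmetic can be sketched in a few lines. This assumes Phi-3-mini's published dimensions (hidden size 3072, MLP size 8192, 32 decoder layers) and assumes the adapter targets all seven projection matrices in an unfused layout (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`). The target modules are not stated in this README, so treat that list as an assumption rather than the actual training config.

```python
# Assumed model dimensions and LoRA targets (not stated in this README).
HIDDEN = 3072   # hidden size
MLP = 8192      # feed-forward intermediate size
LAYERS = 32     # decoder layers
RANK = 64       # LoRA rank r, from the training details

# (d_in, d_out) of each assumed target projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted matrix."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
per_phase = per_layer * LAYERS
three_phases = 3 * per_phase

print(f"per phase:    {per_phase:,}")     # 119,537,664
print(f"three phases: {three_phases:,}")  # 358,612,992
```

Under these assumptions one adapter comes to about 119.5M parameters and three adapters to about 359M, which is consistent with the ~119M and ~360M figures quoted in this card.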
For stronger results, increase `max_steps` to 300-500 per phase.

Full training code (Colab-ready) is available in the repo files or the original notebook.

## Limitations

- Short training demo: good but not SOTA (responses may sometimes repeat).
- Text-only (no vision or multimodal input).
- Primarily English (UltraChat is mostly English).

## How to Improve / Extend

Want to grow it more?

- Add Phase 4: train a new LoRA adapter on a coding dataset.
- Retrain with a higher `max_steps` or a larger rank (`r=128`) for more connections.
- Merge the LoRAs fully with `model.merge_and_unload()` for a single-file upload.

## License

Same as base Phi-3: MIT License (permits research and commercial use).

Made with ❤️ by Mark as a continual learning experiment!

If you use or fork this, star the repo! 🚀