# Phi-3 Grown Chat Model (Continual LoRA Adaptation)

**A custom continual-learning chat model based on Phi-3-mini-4k-instruct**

Trained with sequential LoRA adapters to simulate "growing new neuron connections" for each learning phase. Earlier adapters stay frozen, so later phases cannot overwrite them (**no catastrophic forgetting**).

- **Base Model**: [unsloth/Phi-3-mini-4k-instruct](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct) (3.82B parameters)
- **Total Effective Size**: ~4.2B parameters (3.82B base + ~0.36B from 3 stacked LoRA adapters)
- **Dataset**: [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), a high-quality multi-turn conversation dataset
- **Training Method**: Continual learning via sequential LoRA (adds new trainable connections per phase while freezing previous knowledge)
- **Phases**:
  1. General Chat
  2. Reasoning & Q&A
  3. Roleplay & Long Context

This model is tuned for natural conversation, reasoning, creative roleplay, and instruction following. It is loaded in 4-bit quantization, which keeps VRAM use low enough to run on consumer GPUs.
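
The size figures above can be sanity-checked with a little arithmetic. The sketch below assumes each adapter targets all seven linear projections per layer (q, k, v, o, gate, up, down) in the unfused layout. That target list is an assumption, not something this README states. The layer dimensions are those of Phi-3-mini, and the rank is the r=64 given under Training Details.

```python
# Rough LoRA size check for a Phi-3-mini-sized model.
# Assumed, not stated in this README: adapters target all seven linear
# projections per layer (q, k, v, o, gate, up, down) in the unfused layout.

HIDDEN = 3072         # Phi-3-mini hidden size
INTERMEDIATE = 8192   # Phi-3-mini MLP intermediate size
LAYERS = 32           # Phi-3-mini decoder layers
RANK = 64             # LoRA rank r from Training Details
BASE_PARAMS = 3.82e9  # base model size quoted in the list above


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so one pair adds r * (d_in + d_out)."""
    return r * (d_in + d_out)


# Four attention projections, all hidden -> hidden.
attention = 4 * lora_params(HIDDEN, HIDDEN, RANK)
# gate and up are hidden -> intermediate, down is intermediate -> hidden.
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE, RANK) + lora_params(INTERMEDIATE, HIDDEN, RANK)

per_adapter = LAYERS * (attention + mlp)
total = BASE_PARAMS + 3 * per_adapter

print(f"per adapter:     {per_adapter / 1e6:.1f}M")
print(f"three adapters:  {3 * per_adapter / 1e6:.1f}M")
print(f"base + adapters: {total / 1e9:.2f}B")
```

Under these assumptions each adapter comes to about 119.5M parameters, which agrees with the ~119M per phase quoted under Training Details.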

## Quick Start / Inference

### Installation (One-Time Setup)

```bash
# Install Unsloth (fastest for Phi-3 + LoRA inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
```

### Run Inference (Chat with the Model)

```python
from unsloth import FastLanguageModel

# Load the model (4-bit for efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "yourusername/phi3-grown-chat",  # Replace with your HF repo (or local path: "./phi3-grown-chat-model")
    dtype = None,         # Auto-detect (float16/bf16)
    load_in_4bit = True,  # Saves VRAM
)

# Enable fast inference
FastLanguageModel.for_inference(model)

# Chat loop example (single-turn: each prompt is sent without earlier history)
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ["exit", "quit"]:
        break

    messages = [{"role": "user", "content": user_input}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        top_p=0.95,
    )

    # Decode only the newly generated tokens instead of splitting on the
    # "<|assistant|>" marker, which skip_special_tokens=True normally removes.
    new_tokens = outputs[0][inputs.shape[-1]:]
    print("Assistant:", tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```
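
The chat loop above builds a fresh `messages` list on every turn, so the model never sees earlier exchanges. A minimal sketch of keeping a running history follows. The names `chat_turn`, `echo_reply`, and `max_messages` are hypothetical, and the stand-in reply function only exists so the example runs without a GPU. In real use, `generate_reply` would wrap the `apply_chat_template` and `model.generate` calls shown above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_input: str,
    generate_reply: Callable[[List[Message]], str],
    max_messages: int = 20,
) -> str:
    """Append the user turn, get a reply, record it, and trim old turns."""
    history.append({"role": "user", "content": user_input})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    # Drop the oldest user/assistant pair until the history is short enough.
    # Counting messages is only a crude proxy for the 4k-token context limit.
    while len(history) > max_messages:
        del history[:2]
    return reply


def echo_reply(messages: List[Message]) -> str:
    """Stand-in for the model call (hypothetical, for illustration only)."""
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(history, "Hello!", echo_reply)
chat_turn(history, "What did I just say?", echo_reply)
print([m["role"] for m in history])
```

Trimming whole user/assistant pairs keeps the roles alternating, which the chat template expects. A token-based limit would be more precise than a message count.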

### Example Prompts to Test

- "Hello! Tell me a fun fact about space."
- "Explain quantum computing like I'm 10 years old."
- "You are a pirate captain. Tell me about your greatest adventure."
- "Write a Python function to check if a number is prime."
- Long context: paste a paragraph and ask questions about it.

## Training Details (How It Was Built)

This model uses continual learning with stacked LoRA adapters:

- The base model is frozen.
- Each phase adds a new LoRA adapter (r=64, ~119M trainable parameters per phase).
- Phases are trained sequentially on a three-way split of UltraChat 200k (~69k examples per phase).
- Tooling: Unsloth + TRL `SFTTrainer` (about 2x faster than standard training, per Unsloth).
- Quick demo: 60 steps per phase (~30 min total on a T4 GPU).
- For stronger results, increase `max_steps` to 300-500 per phase.

Full training code (Colab-ready) is available in the repo files or the original notebook.
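
The README does not say how the dataset was divided between phases. One simple possibility, sketched here, is three equal contiguous index ranges. The helper `phase_ranges` is hypothetical, and the row count is an assumed size for the UltraChat 200k SFT training split that should be checked against the dataset card.

```python
def phase_ranges(num_examples: int, num_phases: int = 3):
    """Split row indices into equal contiguous ranges, one per phase."""
    per_phase = num_examples // num_phases
    return [range(i * per_phase, (i + 1) * per_phase) for i in range(num_phases)]


# Assumed size of the UltraChat 200k SFT training split; check the dataset card.
TRAIN_ROWS = 207_865
PHASES = ["General Chat", "Reasoning & Q&A", "Roleplay & Long Context"]

for name, rows in zip(PHASES, phase_ranges(TRAIN_ROWS)):
    print(f"{name}: rows {rows.start}-{rows.stop - 1} ({len(rows)} examples)")
```

With that assumed row count, each phase gets 69,288 examples, in line with the ~69k per phase quoted above. Any remainder rows are left unused.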

## Limitations

- Short training demo: output quality is good but not state of the art, and responses may sometimes repeat.
- Text-only (no vision or multimodal input).
- Primarily English (UltraChat is mostly English).

## How to Improve / Extend

Want to grow it more?

- Add Phase 4: fine-tune on a coding dataset (e.g., add a new LoRA adapter for programming).
- Retrain with a higher `max_steps` or a larger rank (r=128) for more connections.
- Merge the LoRA adapters fully with `model.merge_and_unload()` for a single-file upload.
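
Merging is possible because a LoRA adapter is an additive low-rank update: the adapted layer computes `W x + (alpha / r) * B (A x)`, which equals `(W + (alpha / r) * B A) x`. The toy example below checks that identity on tiny hand-written matrices. It is a sketch of the arithmetic only, not PEFT's implementation, and all values and helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]


# Toy sizes: 2x2 base weight, rank-1 adapter. Values are illustrative only.
W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (d_out x d_in)
B = [[0.5], [-1.0]]           # LoRA B (d_out x r)
A = [[2.0, 1.0]]              # LoRA A (r x d_in)
ALPHA, R = 2.0, 1             # LoRA scaling factor is alpha / r
x = [[1.0], [1.0]]            # input column vector

# Unmerged: base path plus scaled adapter path.
unmerged = add(matmul(W, x), scale(matmul(B, matmul(A, x)), ALPHA / R))

# Merged: fold the adapter into the weight once, then do a single matmul.
W_merged = add(W, scale(matmul(B, A), ALPHA / R))
merged = matmul(W_merged, x)

print(unmerged, merged)
```

Both paths give the same output, so a merged model needs no adapter files at inference time. With several stacked adapters, each one contributes its own additive term to the merged weight.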

## License

Same as base Phi-3: MIT License (permits research and commercial use).

Made with ❤️ by Mark. A continual learning experiment!

If you use or fork this, star the repo!