Phi-3 Grown Chat Model (Continual LoRA Adaptation)
A custom continual-learning chat model based on Phi-3-mini-4k-instruct.
Trained with sequential LoRA adapters to simulate "growing new neuron connections" for each learning phase. Earlier adapters stay frozen, which is intended to prevent catastrophic forgetting.
- Base Model: unsloth/Phi-3-mini-4k-instruct (3.82B parameters)
- Total Effective Size: ~4.2B parameters (3.82B base + ~360M from 3 stacked LoRA adapters)
- Dataset: HuggingFaceH4/ultrachat_200k, a high-quality multi-turn conversation dataset
- Training Method: Continual learning via sequential LoRA (adds new trainable connections per phase while freezing previous knowledge)
- Phases:
- General Chat
- Reasoning & Q&A
- Roleplay & Long Context
This model excels at natural conversation, reasoning, creative roleplay, and following instructions. It's efficient (4-bit quantized) and runs fast even on consumer GPUs.
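The adapter sizes quoted in this card can be sanity-checked with a little arithmetic. The sketch below assumes Phi-3-mini's published dimensions (hidden size 3072, MLP size 8192, 32 layers) and assumes LoRA is attached to the seven projection matrices (q, k, v, o, gate, up, down); the card does not state the target modules, so treat that choice as an assumption. The rank r=64 is the value given under Training Details.

```python
# Rough LoRA parameter count for one adapter (dimensions and targets are assumptions).
HIDDEN = 3072        # assumed Phi-3-mini hidden size
INTERMEDIATE = 8192  # assumed MLP intermediate size
LAYERS = 32          # assumed number of transformer layers
RANK = 64            # LoRA rank quoted under Training Details
BASE_PARAMS = 3.82e9 # base model size quoted in this card


def lora_params(in_features, out_features, rank):
    """A LoRA pair adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


# Assumed target modules as (in_features, out_features).
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
per_adapter = per_layer * LAYERS
three_adapters = 3 * per_adapter
total = BASE_PARAMS + three_adapters

print(f"per adapter:    {per_adapter / 1e6:.1f}M")     # 119.5M
print(f"three adapters: {three_adapters / 1e6:.1f}M")  # 358.6M
print(f"base + adapters: {total / 1e9:.2f}B")          # 4.18B
```

Under these assumptions one adapter comes to about 119.5M parameters, which agrees with the ~119M per phase and ~360M overall figures quoted in this card.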
Quick Start / Inference
Installation (One-Time Setup)
# Install Unsloth (fastest for Phi-3 + LoRA inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
Run Inference (Chat with the Model)
from unsloth import FastLanguageModel
import torch

# Load the model (4-bit for efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    "yourusername/phi3-grown-chat",  # Replace with your HF repo (or local path: "./phi3-grown-chat-model")
    dtype = None,                    # Auto-detect (float16/bf16)
    load_in_4bit = True,             # Saves VRAM
)

# Enable fast inference
FastLanguageModel.for_inference(model)
# Chat loop example (each turn is sent on its own, without earlier history)
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break
    messages = [{"role": "user", "content": user_input}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        top_p=0.95,
    )
    # Decode only the newly generated tokens. Splitting on "<|assistant|>" does not
    # work here because skip_special_tokens=True strips that marker from the text.
    new_tokens = outputs[0][inputs.shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print("Assistant:", response.strip())
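The loop above sends only the latest message, so the model has no memory of earlier turns. One way to keep context is to accumulate the message list and pass the whole list to apply_chat_template on every turn. The sketch below shows only that bookkeeping: `generate_reply` is a hypothetical callable standing in for the tokenizer and model.generate calls, and `max_turns` is an illustrative cap that does not come from the original notebook.

```python
def chat_turn(history, user_input, generate_reply, max_turns=8):
    """Run one user/assistant exchange and return the trimmed history.

    history: list of {"role": ..., "content": ...} dicts (chat-template format).
    generate_reply: hypothetical callable that takes the message list and
        returns the assistant's text (it would wrap apply_chat_template
        and model.generate in a real script).
    max_turns: keep at most this many recent exchanges, a crude
        message-count cap to stay inside the 4k-token context window.
    """
    messages = history + [{"role": "user", "content": user_input}]
    reply = generate_reply(messages)
    messages.append({"role": "assistant", "content": reply})
    # Each exchange is two messages, so keep the last 2 * max_turns entries.
    return messages[-2 * max_turns:]


# Stand-in generator that only reports how many messages it received.
def fake_generate(messages):
    return f"(saw {len(messages)} messages)"


history = []
history = chat_turn(history, "Hello!", fake_generate)
history = chat_turn(history, "Tell me more.", fake_generate)
print(history[-1]["content"])  # (saw 3 messages)
```

Trimming by message count is the simplest option; counting tokens with the tokenizer would track the context limit more precisely.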
Example Prompts to Test
"Hello! Tell me a fun fact about space."
"Explain quantum computing like I'm 10 years old."
"You are a pirate captain. Tell me about your greatest adventure."
"Write a Python function to check if a number is prime."
Long context: Paste a paragraph and ask questions about it.
Training Details (How It Was Built)
This model uses continual learning with stacked LoRA adapters:
Base model frozen.
Each phase adds a new LoRA (r=64, ~119M trainable params per phase).
Trained sequentially on split UltraChat_200k (69k examples per phase).
Tooling: Unsloth + TRL SFTTrainer (Unsloth reports roughly 2x faster training than a standard setup).
Quick demo: 60 steps per phase (~30 min total on a T4 GPU).
For stronger results: increase max_steps to 300-500 per phase.
Full training code (Colab-ready) available in the repo files or original notebook.
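To make the stacking idea concrete, here is a toy, dependency-free sketch of a linear layer whose output is the frozen base projection plus the sum of several low-rank updates, with only the newest adapter marked trainable. It illustrates the concept described above under simplifying assumptions. The class and method names are hypothetical, and this is not the training code from the notebook.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class StackedLoRALinear:
    """Toy linear layer: y = W x + sum_i scale_i * B_i (A_i x)."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # frozen for every phase
        self.adapters = []

    def add_adapter(self, lora_a, lora_b, scale=1.0):
        """Start a new phase: freeze earlier adapters, add a trainable one."""
        for adapter in self.adapters:
            adapter["trainable"] = False
        self.adapters.append(
            {"A": lora_a, "B": lora_b, "scale": scale, "trainable": True}
        )

    def trainable_adapters(self):
        """Indices of adapters an optimizer would be allowed to update."""
        return [i for i, a in enumerate(self.adapters) if a["trainable"]]

    def forward(self, x):
        out = matvec(self.base_weight, x)
        for adapter in self.adapters:
            # Project down to rank r with A, then back up with B.
            delta = matvec(adapter["B"], matvec(adapter["A"], x))
            out = [o + adapter["scale"] * d for o, d in zip(out, delta)]
        return out


layer = StackedLoRALinear([[1.0, 0.0], [0.0, 1.0]])  # 2x2 identity base
layer.add_adapter([[1.0, 0.0]], [[0.0], [1.0]])      # phase 1, rank 1
layer.add_adapter([[0.0, 1.0]], [[1.0], [0.0]])      # phase 2, rank 1
print(layer.forward([2.0, 3.0]))   # [5.0, 5.0]
print(layer.trainable_adapters())  # [1]
```

Because the phase 1 matrices are never updated again, whatever they learned is preserved exactly; later phases can only add to the output.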
Limitations
Short training demo: good but not state of the art (responses may occasionally repeat).
Text-only (no vision/multimodal).
Primarily English (UltraChat is mostly English).
How to Improve / Extend
Want to grow it more?
Add Phase 4: fine-tune on a coding dataset (e.g., add a new LoRA for programming).
Retrain with a higher max_steps or a larger rank (r=128) for more connections.
Merge LoRAs fully: model.merge_and_unload() for single-file upload.
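Merging folds each low-rank update into the base weight, W' = W + s * B * A, so inference no longer needs separate adapter matrices. The toy sketch below shows that arithmetic on plain Python lists. The function names are hypothetical; in a real workflow PEFT's merge_and_unload() performs the equivalent step for every adapted layer.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(base, lora_a, lora_b, scale=1.0):
    """Return base + scale * (B @ A) as a new matrix."""
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 0.0]]    # rank 1, in_features 2
lora_b = [[0.0], [1.0]]  # out_features 2, rank 1
merged = merge_lora(base, lora_a, lora_b, scale=0.5)
print(merged)  # [[1.0, 0.0], [0.5, 1.0]]
```

With several stacked adapters, the same function can be applied once per adapter, feeding each merged result in as the next base.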
License
Same as base Phi-3: MIT License (permits research and commercial use).
Made with ❤️ by Mark as a continual learning experiment!
If you use or fork this, star the repo!