Phi-3 Grown Chat Model (Continual LoRA Adaptation)

A custom continual-learning chat model based on Phi-3-mini-4k-instruct.
Trained with sequential LoRA adapters that simulate "growing new neuron connections" in each learning phase. Earlier weights stay frozen, which is intended to limit catastrophic forgetting.

Base Model: unsloth/Phi-3-mini-4k-instruct (3.82B parameters)
Total Effective Size: ~4.18B parameters (3.82B base + ~360M from 3 stacked LoRA adapters)
Dataset: HuggingFaceH4/ultrachat_200k – a filtered, multi-turn conversation dataset
Training Method: Continual learning via sequential LoRA (adds new trainable connections per phase while freezing previous knowledge)
Phases:
1. General Chat
2. Reasoning & Q&A
3. Roleplay & Long Context
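
The size figures above can be checked with a little arithmetic. The sketch below assumes shapes this README does not state: 32 decoder layers, hidden size 3072, MLP intermediate size 8192, and LoRA (r=64, as listed under Training Details) applied to all seven projection matrices of a Llama-style layer. Treat it as a plausibility check, not a description of the exact training config.

```python
# Assumed Phi-3-mini shapes (not stated in this README).
HIDDEN, INTERMEDIATE, LAYERS, RANK = 3072, 8192, 32, 64
BASE_PARAMS = 3.82e9   # base model size quoted in this README
NUM_ADAPTERS = 3       # one LoRA adapter per phase

# (in_features, out_features) of each adapted projection in one layer.
# Assumption: all seven projections of a Llama-style layer carry an adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, layers):
    """A LoRA pair A (rank x in) and B (out x rank) adds rank * (in + out) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes)
    return per_layer * layers


per_adapter = lora_params(RANK, PROJECTIONS.values(), LAYERS)
total = BASE_PARAMS + NUM_ADAPTERS * per_adapter

print(f"per adapter: {per_adapter / 1e6:.1f}M")      # per adapter: 119.5M
print(f"base + 3 adapters: {total / 1e9:.2f}B")      # base + 3 adapters: 4.18B
```

Under these assumptions the per-adapter count agrees with the ~119M figure quoted for each phase.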

This model targets natural conversation, reasoning, creative roleplay, and instruction following. It loads in 4-bit quantization, so it runs on consumer GPUs.

Quick Start / Inference

Installation (One-Time Setup)

# Install Unsloth (fastest for Phi-3 + LoRA inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes

Run Inference (Chat with the Model)

from unsloth import FastLanguageModel
import torch

# Load the model (4-bit for efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    "yourusername/phi3-grown-chat",  # Replace with your HF repo (or local path: "./phi3-grown-chat-model")
    dtype = None,                    # Auto-detect (float16/bf16)
    load_in_4bit = True,             # Saves VRAM
)

# Enable fast inference
FastLanguageModel.for_inference(model)

# Chat loop example
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]: 
        break
    
    messages = [{"role": "user", "content": user_input}]
    inputs = tokenizer.apply_chat_template(
        messages, 
        tokenize=True, 
        add_generation_prompt=True, 
        return_tensors="pt"
    ).to("cuda")
    
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        top_p=0.95,
    )
    
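    # Decode only the newly generated tokens. With skip_special_tokens=True the
    # "<|assistant|>" marker is typically stripped, so splitting on it would not
    # match and the prompt would be echoed back with the reply.
    new_tokens = outputs[0][inputs.shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    print("Assistant:", response.strip())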
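
The loop above sends only the latest user message, so the model sees no earlier turns. A minimal sketch of keeping conversation history follows. The helper name `build_messages` and the `max_turns` cap are hypothetical and not part of this repo; the returned list has the same shape as the `messages` list passed to `tokenizer.apply_chat_template`.

```python
def build_messages(history, user_input, max_turns=4):
    """Return the message list for the next turn.

    history:   list of (user_text, assistant_text) pairs from earlier turns.
    max_turns: hypothetical cap on how many past pairs are kept, so the
               prompt stays within the model's 4k context window.
    """
    messages = []
    # history[-0:] would return the whole list, so handle zero explicitly.
    recent = history[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages


history = [("Hi", "Hello! How can I help?")]
messages = build_messages(history, "Tell me a fun fact about space.")
```

In the chat loop, the decoded reply would be appended to `history` as `(user_input, response)` after each turn.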

Example Prompts to Test

"Hello! Tell me a fun fact about space."
"Explain quantum computing like I'm 10 years old."
"You are a pirate captain. Tell me about your greatest adventure."
"Write a Python function to check if a number is prime."
Long context: Paste a paragraph and ask questions about it.

Training Details (How It Was Built)
This model uses continual learning with stacked LoRA adapters:

Base model frozen.
Each phase adds a new LoRA (r=64, ~119M trainable params per phase).
Trained sequentially on split UltraChat_200k (69k examples per phase).
Tool: Unsloth + TRL SFTTrainer (Unsloth advertises roughly 2x faster training than a standard Hugging Face setup).
Quick demo: 60 steps per phase (~30 min total on T4 GPU).
For stronger results: increase max_steps to 300-500 per phase.
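
The stacking idea in the list above can be illustrated with a toy linear layer in plain Python. This is a conceptual sketch only: the class `StackedLoRALinear` is hypothetical, and this README does not say whether earlier adapters were kept as separate frozen adapters (as shown here) or merged into the base before the next phase.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class StackedLoRALinear:
    """Toy layer: y = W x + scale * sum_k B_k (A_k x), with W always frozen."""

    def __init__(self, base_weight, scale=1.0):
        self.base_weight = base_weight   # frozen base weights
        self.adapters = []               # one low-rank (A, B) pair per phase
        self.scale = scale

    def add_adapter(self, A, B):
        # Starting a new phase freezes every earlier adapter.
        for adapter in self.adapters:
            adapter["trainable"] = False
        self.adapters.append({"A": A, "B": B, "trainable": True})

    def forward(self, x):
        y = matvec(self.base_weight, x)
        for adapter in self.adapters:
            delta = matvec(adapter["B"], matvec(adapter["A"], x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


layer = StackedLoRALinear(base_weight=[[1, 0], [0, 1]])
layer.add_adapter(A=[[1, 0]], B=[[0], [1]])   # phase 1, rank 1
layer.add_adapter(A=[[0, 1]], B=[[1], [0]])   # phase 2, rank 1
output = layer.forward([2, 3])
flags = [adapter["trainable"] for adapter in layer.adapters]
```

Only the most recent adapter is marked trainable, which mirrors the "freeze what was learned, add new capacity" idea. In real LoRA training, B is usually initialized to zero so that a new adapter starts out as a no-op.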

Full training code (Colab-ready) is available in the repo files or the original notebook.

Limitations

Short demo training run: output quality is good but not state of the art, and responses may occasionally repeat.
Text-only (no vision/multimodal).
Primarily English (UltraChat is mostly English).

How to Improve / Extend
Want to grow it more?

Add Phase 4: Fine-tune on coding dataset (e.g., add new LoRA for programming).
Retrain with higher max_steps or larger r=128 for more connections.
Merge LoRAs fully: model.merge_and_unload() for single-file upload.
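
As a rough sketch of the merge step, the function below loads a base model in half precision, attaches one saved LoRA adapter with PEFT, and folds it into the base weights. The function name `merge_adapter` and all paths are hypothetical. It assumes the adapter was saved in PEFT format and handles a single adapter; a stack of adapters would need this repeated per adapter.

```python
def merge_adapter(base_id, adapter_dir, output_dir):
    """Merge one LoRA adapter into its base model and save the result.

    base_id, adapter_dir, and output_dir are placeholders supplied by the caller.
    Heavy dependencies are imported lazily so the function can be defined
    without a GPU environment.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base in fp16 rather than 4-bit; merging into quantized
    # weights is lossy.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_dir)

    # Fold the low-rank update into the base weights and drop the adapter wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

A call might look like `merge_adapter("unsloth/Phi-3-mini-4k-instruct", "./phase3-adapter", "./phi3-grown-merged")`, where both local paths are placeholders.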

License
Same as base Phi-3: MIT License (permits research and commercial use).
Made with ❤️ by Mark — continual learning experiment!
If you use/fork this, star the repo! 🚀