
# Phi-3 Grown Chat Model (Continual LoRA Adaptation)

![Phi-3 Mini](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct/resolve/main/thumbnail.png)

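A custom continual-learning chat model based on Phi-3-mini-4k-instruct, trained with sequential LoRA adapters that simulate "growing new neuron connections" for each learning phase. Because each phase trains a new adapter while previously learned weights stay frozen, the approach is designed to avoid catastrophic forgetting.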

- Base Model: [unsloth/Phi-3-mini-4k-instruct](https://huggingface.co/unsloth/Phi-3-mini-4k-instruct) (3.82B parameters)
- Total Effective Size: ~4.2B parameters (base + ~360M from 3 stacked LoRA adapters)
- Dataset: [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), a heavily filtered, high-quality multi-turn conversation dataset
- Training Method: Continual learning via sequential LoRA (each phase adds new trainable connections while previously learned weights stay frozen)
- Phases:
  1. General Chat
  2. Reasoning & Q&A
  3. Roleplay & Long Context
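The LoRA sizes quoted above follow from simple arithmetic. A LoRA adapter of rank `r` on a weight of shape `(d_out, d_in)` adds two matrices, `A` (`r x d_in`) and `B` (`d_out x r`), i.e. `r * (d_in + d_out)` trainable parameters. The sketch below uses Phi-3-mini's published dimensions (hidden size 3072, MLP size 8192, 32 layers) and the rank r=64 given under Training Details; it **assumes** each adapter targets all seven projections per layer (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, as in Unsloth's example notebooks), which this card does not state explicitly.

```python
# Phi-3-mini-4k-instruct dimensions (from its model config).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 32
BASE_PARAMS = 3.82e9  # base model size quoted in this README


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def adapter_params(r: int) -> int:
    """Parameters of one adapter, ASSUMING it targets all 7 projections in every layer."""
    per_layer = (
        4 * lora_params(HIDDEN, HIDDEN, r)            # q, k, v, o (all 3072 x 3072 in Phi-3-mini)
        + 2 * lora_params(HIDDEN, INTERMEDIATE, r)    # gate and up projections
        + lora_params(INTERMEDIATE, HIDDEN, r)        # down projection
    )
    return NUM_LAYERS * per_layer


per_phase = adapter_params(r=64)
total = BASE_PARAMS + 3 * per_phase
print(f"per phase: {per_phase / 1e6:.1f}M")        # per phase: 119.5M
print(f"3 adapters: {3 * per_phase / 1e6:.1f}M")   # 3 adapters: 358.6M
print(f"total: {total / 1e9:.2f}B")                # total: 4.18B
```

Because the count is linear in `r`, the larger r=128 suggested under "How to Improve / Extend" would double each adapter to roughly 239M parameters.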

This model is designed for natural conversation, reasoning, creative roleplay, and following instructions. Loaded in 4-bit, it is memory-efficient and runs on consumer GPUs.
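As a rough guide to why 4-bit loading matters, weight storage scales with the number of bits per parameter. The sketch below is a back-of-the-envelope lower bound only: it assumes every parameter is stored at the given width and ignores the KV cache, activations, CUDA overhead, any layers (such as embeddings) that bitsandbytes keeps in higher precision, and the LoRA adapter weights.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Lower-bound weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


BASE_PARAMS = 3.82e9  # Phi-3-mini parameter count quoted above

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(BASE_PARAMS, bits):.2f} GB")
# 16-bit: ~7.64 GB
#  8-bit: ~3.82 GB
#  4-bit: ~1.91 GB
```

Real usage will be higher than these figures, but the 4x reduction from 16-bit to 4-bit is what lets the model fit comfortably on a 16 GB card such as the T4 mentioned under Training Details.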

## Quick Start / Inference

### Installation (One-Time Setup)

```bash
# Install Unsloth (fastest for Phi-3 + LoRA inference)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
```
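Before loading the model, one quick way to confirm that the packages installed above (plus `torch`, which the inference code imports) are available is the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it. This is a convenience sketch; the helper name `check_installed` is hypothetical.

```python
import importlib.util

# Module names for the packages installed above, plus torch (used by the inference code).
REQUIRED = ["unsloth", "xformers", "trl", "peft", "accelerate", "bitsandbytes", "torch"]


def check_installed(modules):
    """Return {module_name: True/False}; find_spec does not import top-level modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installed(REQUIRED).items():
        print(f"{name:<13} {'OK' if ok else 'MISSING'}")
```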









### Run Inference (Chat with the Model)

```python
from unsloth import FastLanguageModel
import torch

# Load the model (4-bit for efficiency)
model, tokenizer = FastLanguageModel.from_pretrained(
    "yourusername/phi3-grown-chat",  # Replace with your HF repo (or local path: "./phi3-grown-chat-model")
    dtype=None,          # Auto-detect (float16/bf16)
    load_in_4bit=True,   # Saves VRAM
)

# Enable fast inference
FastLanguageModel.for_inference(model)

# Chat loop example (each question is sent on its own; no history is kept)
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    messages = [{"role": "user", "content": user_input}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        top_p=0.95,
    )

    # Decode only the newly generated tokens (everything after the prompt),
    # so the prompt and chat-template markers are not echoed back.
    response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    print("Assistant:", response.strip())
```
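The loop above sends each question on its own, so the model never sees earlier turns. A minimal sketch of keeping a conversation, assuming you reuse one `messages` list in the role/content format that `apply_chat_template` accepts; the helper `add_turn` and its `max_turns` cap are hypothetical choices, not part of this model's code. Capping turns helps keep the prompt inside Phi-3-mini-4k's 4k-token context window.

```python
def add_turn(history, role, content, max_turns=None):
    """Return a new history with one message appended.

    `history` is a list of {"role": ..., "content": ...} dicts, the format the
    chat loop passes to tokenizer.apply_chat_template. If `max_turns` is set,
    only the messages from the last `max_turns` user turns onward are kept,
    so the trimmed history always starts with a user message.
    """
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    if max_turns is not None and max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    history = history + [{"role": role, "content": content}]
    if max_turns is not None:
        user_positions = [i for i, m in enumerate(history) if m["role"] == "user"]
        if len(user_positions) > max_turns:
            history = history[user_positions[-max_turns]:]
    return history


# Demo with placeholder text: keep at most 2 user turns of context.
history = []
for role, text in [
    ("user", "Hi!"),
    ("assistant", "Hello! How can I help?"),
    ("user", "Tell me a fun fact about space."),
    ("assistant", "A day on Venus is longer than its year."),
    ("user", "Why is that?"),
]:
    history = add_turn(history, role, text, max_turns=2)

print([m["content"] for m in history])
# ['Tell me a fun fact about space.', 'A day on Venus is longer than its year.', 'Why is that?']
```

To use it in the chat loop, create `messages = []` once before `while True:`, replace the per-turn `messages = [...]` line with `messages = add_turn(messages, "user", user_input, max_turns=5)`, and after decoding append the reply with `messages = add_turn(messages, "assistant", reply_text)`, where `reply_text` is the decoded assistant text.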





### Example Prompts to Test

- "Hello! Tell me a fun fact about space."
- "Explain quantum computing like I'm 10 years old."
- "You are a pirate captain. Tell me about your greatest adventure."
- "Write a Python function to check if a number is prime."
- Long context: paste a paragraph and ask questions about it.

## Training Details (How It Was Built)

This model uses continual learning with stacked LoRA adapters:

- The base model is frozen.
- Each phase adds a new LoRA adapter (r=64, ~119M trainable parameters per phase).
- Phases are trained sequentially on a three-way split of UltraChat 200k (~69k examples per phase).
- Tooling: Unsloth + TRL `SFTTrainer` (Unsloth reports roughly 2x faster training than a standard setup).
- Quick demo: 60 steps per phase (~30 min total on a T4 GPU).
- For stronger results, increase `max_steps` to 300-500 per phase.
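The per-phase data split can be reproduced with contiguous, near-equal slices. The sketch below follows the same remainder rule as Hugging Face `datasets` contiguous sharding (`Dataset.shard(num_shards=3, index=i, contiguous=True)`), but in plain Python so the boundaries are easy to inspect. Whether the original notebook split the data this way is an assumption, `phase_ranges` is a hypothetical helper, and the 207,000-row size in the demo is a placeholder for `len(dataset)`.

```python
def phase_ranges(num_examples: int, num_phases: int = 3):
    """Split indices 0..num_examples-1 into contiguous, near-equal (start, end) ranges.

    `end` is exclusive. The first `num_examples % num_phases` phases get one extra example.
    """
    base, extra = divmod(num_examples, num_phases)
    ranges = []
    start = 0
    for i in range(num_phases):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


# Demo with a placeholder dataset size (use len(dataset) in practice).
for phase, (start, end) in enumerate(phase_ranges(207_000), start=1):
    print(f"phase {phase}: rows {start}-{end - 1} ({end - start} examples)")
# phase 1: rows 0-68999 (69000 examples)
# phase 2: rows 69000-137999 (69000 examples)
# phase 3: rows 138000-206999 (69000 examples)
```

With a `datasets.Dataset`, each phase's slice can then be taken with `dataset.select(range(start, end))` and trained with its own fresh adapter.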

The full training code (Colab-ready) is available in the repo files or the original notebook.

## Limitations

- Short demo training: results are good but not state of the art, and responses may sometimes repeat.
- Text-only (no vision/multimodal input).
- Primarily English (UltraChat is mostly English).

## How to Improve / Extend

Want to grow it more?

- Add Phase 4: fine-tune a new LoRA adapter on a coding dataset.
- Retrain with a higher `max_steps`, or a larger rank (r=128) for more connections.
- Merge the LoRAs fully with `model.merge_and_unload()` (it returns the merged model) for a single-model upload.
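What merging does can be written out. In standard LoRA, an adapter on a frozen weight `W_0` (shape `d_out x d_in`) learns `B` (`d_out x r`) and `A` (`r x d_in`), scaled by `alpha / r` (PEFT's `lora_alpha / r`). Assuming all three phase adapters are applied on top of the same base weights (the exact stacking mechanism is not spelled out in this card), merging folds every adapter into one dense matrix per layer:

```latex
W_{\text{merged}} = W_0 + \sum_{i=1}^{3} \frac{\alpha_i}{r_i}\, B_i A_i,
\qquad B_i \in \mathbb{R}^{d_{\text{out}} \times r_i},\;
A_i \in \mathbb{R}^{r_i \times d_{\text{in}}},\;
B_i A_i \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}
```

Since each `B_i A_i` has the same shape as `W_0`, the merged model has the base model's parameter count. With a base loaded in 4-bit, PEFT has to dequantize, add the update, and re-quantize, which can introduce small rounding differences, so merging into 16-bit weights is the safer choice for an upload.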

## License

Same as the base Phi-3 model: MIT License (permits both research and commercial use).

Made with ❤️ by Mark as a continual-learning experiment!
If you use or fork this, star the repo! 🚀