Use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="VladHong/Qwen3-4B-Instruct-Lewis",
	filename="Qwen3-4B-Instruct-Lewis-Q5_K_M.gguf",
)
response = llm.create_chat_completion(
	messages = [
		# Any chat-style prompt works; this one is taken from the example conversation below.
		{"role": "user", "content": "What should I do with a talking rabbit?"}
	]
)
print(response["choices"][0]["message"]["content"])

Qwen3-4B Instruct Lewis

⚠️ Toy model — not intended for serious or production use. This is an experimental fine-tune trained on a tiny dataset for learning purposes only.

Fine-tuned from Unsloth/Qwen3-4B-Instruct-2507 using QLoRA + Unsloth on the VladHong/Lewis_Instruct dataset.

Example Conversation

User: What should I do with a talking rabbit?

qwen3-4b-lewis: I don't know, but I think it's time to go.

User: Why?

qwen3-4b-lewis: Because I'm afraid the rabbit will tell the Queen about us!
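
A multi-turn exchange like this can be reproduced by passing the running history as a list of role/content messages. The following is a minimal sketch, assuming the default OpenAI-style dict returned by create_chat_completion:

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="VladHong/Qwen3-4B-Instruct-Lewis",
	filename="Qwen3-4B-Instruct-Lewis-Q5_K_M.gguf",
)

# First turn.
messages = [{"role": "user", "content": "What should I do with a talking rabbit?"}]
reply = llm.create_chat_completion(messages=messages)
messages.append(reply["choices"][0]["message"])

# Follow-up turn reuses the accumulated history.
messages.append({"role": "user", "content": "Why?"})
reply = llm.create_chat_completion(messages=messages)
print(reply["choices"][0]["message"]["content"])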

Training Data

Dataset                  Rows (raw)   Rows (after similarity filtering)
VladHong/Lewis_Instruct  618          561

Similarity filtering used a 0.3 Jaccard threshold. <think> blocks were stripped from all assistant turns before training.
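
The filtering script is not part of the card; the sketch below shows one plausible reading of a 0.3 Jaccard near-duplicate filter plus <think> stripping. The whitespace tokenisation, the greedy keep-first pass, and the strip_think/jaccard/filter_similar helper names are illustrative assumptions.

import re

def strip_think(text: str) -> str:
	# Remove <think>...</think> blocks from an assistant turn.
	return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def jaccard(a: str, b: str) -> float:
	# Jaccard similarity over whitespace-delimited token sets.
	sa, sb = set(a.lower().split()), set(b.lower().split())
	return len(sa & sb) / len(sa | sb) if sa and sb else 0.0

def filter_similar(rows: list[str], threshold: float = 0.3) -> list[str]:
	# Greedy pass: keep a row only if no already-kept row is too similar.
	kept: list[str] = []
	for row in rows:
		if all(jaccard(row, k) < threshold for k in kept):
			kept.append(row)
	return kept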

Training Details

Parameter         Value
Method            QLoRA (4-bit NF4) + Unsloth
LoRA rank         16
LoRA alpha        16
Epochs            1
Steps             71
Batch size        2 per device × 4 gradient accumulation = 8 effective
Learning rate     1e-4 (cosine schedule)
Max seq length    2048
Optimizer         AdamW 8-bit
Hardware          Tesla T4 (14.56 GB VRAM)
Training time     ~39.85 min
Trainable params  33M / 4.05B (0.81%)
Peak VRAM         ~4.18 GB

Training used train_on_responses_only — loss computed on assistant completions only.
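
The training script itself is not published; below is a minimal Unsloth + TRL sketch consistent with the table above. The LoRA target modules, the Qwen chat-template marker strings passed to train_on_responses_only, and the assumption that the dataset is already rendered to a chat-templated text column are illustrative, not confirmed settings.

from unsloth import FastLanguageModel
from unsloth.chat_templates import train_on_responses_only
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 base model (QLoRA), max sequence length 2048.
model, tokenizer = FastLanguageModel.from_pretrained(
	model_name="Unsloth/Qwen3-4B-Instruct-2507",
	max_seq_length=2048,
	load_in_4bit=True,
)

# LoRA adapters: rank 16, alpha 16 (~33M trainable parameters).
model = FastLanguageModel.get_peft_model(
	model,
	r=16,
	lora_alpha=16,
	target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
	                "gate_proj", "up_proj", "down_proj"],  # assumed module list
)

dataset = load_dataset("VladHong/Lewis_Instruct", split="train")  # assumed pre-rendered "text" column

trainer = SFTTrainer(
	model=model,
	tokenizer=tokenizer,
	train_dataset=dataset,
	args=SFTConfig(
		per_device_train_batch_size=2,   # x 4 accumulation = 8 effective
		gradient_accumulation_steps=4,
		num_train_epochs=1,
		learning_rate=1e-4,
		lr_scheduler_type="cosine",
		optim="adamw_8bit",
		output_dir="outputs",
	),
)

# Mask prompt tokens so the loss is computed on assistant completions only.
trainer = train_on_responses_only(
	trainer,
	instruction_part="<|im_start|>user\n",       # assumed Qwen chat markers
	response_part="<|im_start|>assistant\n",
)
trainer.train()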

License Note

Base model is Apache 2.0. Review upstream dataset terms before any use beyond personal experimentation.
