---
license: apache-2.0
datasets:
  - VladHong/Lewis_Instruct
base_model:
  - Unsloth/Qwen3-4B-Instruct-2507
tags:
  - qwen3
  - qlora
  - unsloth
  - toy
language:
  - en
---

# Qwen3-4B Instruct Lewis

> ⚠️ **Toy model** — not intended for serious or production use. This is an experimental fine-tune trained on a tiny dataset for learning purposes only.

Fine-tuned from `Unsloth/Qwen3-4B-Instruct-2507` using QLoRA + Unsloth on the `VladHong/Lewis_Instruct` dataset.

## Example Conversation

**User:** What should I do with a talking rabbit?

**qwen3-4b-lewis:** I don't know, but I think it's time to go.

**User:** Why?

**qwen3-4b-lewis:** Because I'm afraid the rabbit will tell the Queen about us!

## Training Data

| Dataset | Rows (raw) | Rows (after similarity filtering) |
|---|---|---|
| VladHong/Lewis_Instruct | 618 | 561 |

Similarity filtering used a Jaccard threshold of 0.3. `<think>` blocks were stripped from all assistant turns before training.
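The actual preprocessing script is not included in this card; the following is a minimal sketch of both steps, assuming word-level Jaccard similarity and a greedy keep-or-drop pass over row texts (the tokenization and comparison field are assumptions):

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from an assistant turn."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def filter_similar(rows: list[str], threshold: float = 0.3) -> list[str]:
    """Greedily keep a row only if its similarity to every
    previously kept row is below the threshold."""
    kept: list[str] = []
    for row in rows:
        if all(jaccard(row, k) < threshold for k in kept):
            kept.append(row)
    return kept
```

With a threshold of 0.3, a greedy pass like this would reduce near-duplicate instructions while keeping genuinely distinct rows, which is consistent with the 618 → 561 reduction above.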

## Training Details

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) + Unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Epochs | 1 |
| Steps | 71 |
| Batch size | 2 per device × 4 gradient accumulation = 8 effective |
| Learning rate | 1e-4 (cosine schedule) |
| Max seq length | 2048 |
| Optimizer | AdamW 8-bit |
| Hardware | Tesla T4 (14.56 GB VRAM) |
| Training time | ~39.85 min |
| Trainable params | 33M / 4.05B (0.81%) |
| Peak VRAM | ~4.18 GB |
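The hyperparameters above correspond roughly to the following PEFT/transformers configuration. This is a sketch, not the actual training script: the target modules and dtype choices are assumptions (a T4 has no bf16 support, so fp16 is used here).

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 lacks bf16 support
)

# LoRA adapters: rank 16, alpha 16
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # 2 × 4 = 8 effective
    num_train_epochs=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    fp16=True,
)
```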

Training used Unsloth's `train_on_responses_only`, so the loss was computed on assistant completions only.
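Conceptually, response-only training masks every token outside the assistant completion with the label `-100`, which PyTorch's cross-entropy loss ignores. A simplified illustration (not Unsloth's actual implementation, and assuming the response start offset is already known):

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy

def mask_non_response_labels(input_ids: list[int],
                             response_start: int) -> list[int]:
    """Copy input_ids into labels, replacing every token before the
    assistant response with IGNORE_INDEX so it contributes no loss."""
    return [tok if i >= response_start else IGNORE_INDEX
            for i, tok in enumerate(input_ids)]
```

In practice the helper locates the assistant spans from the chat template's role markers rather than taking an explicit offset.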

## License Note

The base model is licensed under Apache 2.0. Review the upstream dataset's terms before any use beyond personal experimentation.