---
license: apache-2.0
datasets:
  - VladHong/Lewis_Instruct
base_model:
  - Unsloth/Qwen3-4B-Instruct-2507
tags:
  - qwen3
  - qlora
  - unsloth
  - toy
language:
  - en
---

# Qwen3-4B Instruct Lewis

> ⚠️ **Toy model** — not intended for serious or production use. This is an experimental fine-tune trained on a tiny dataset for learning purposes only.

Fine-tuned from `Unsloth/Qwen3-4B-Instruct-2507` using QLoRA + Unsloth on the `VladHong/Lewis_Instruct` dataset.

## Example Conversation

**User:** What should I do with a talking rabbit?

**qwen3-4b-lewis:** I don't know, but I think it's time to go.

**User:** Why?

**qwen3-4b-lewis:** Because I'm afraid the rabbit will tell the Queen about us!

## Training Data

| Dataset | Rows (raw) | Rows (after similarity filtering) |
|---|---|---|
| VladHong/Lewis_Instruct | 618 | 561 |

Similarity filtering used a Jaccard threshold of 0.3. `<think>` blocks were stripped from all assistant turns before training.
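The actual preprocessing script is not included in this card; the following is a minimal sketch of both steps, assuming word-level Jaccard similarity and a greedy keep-or-drop pass over row texts (the tokenization and comparison field are assumptions):

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from an assistant turn."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def filter_similar(rows: list[str], threshold: float = 0.3) -> list[str]:
    """Greedily keep a row only if its similarity to every
    previously kept row is below the threshold."""
    kept: list[str] = []
    for row in rows:
        if all(jaccard(row, k) < threshold for k in kept):
            kept.append(row)
    return kept
```

With a threshold of 0.3, a greedy pass like this would reduce near-duplicate instructions while keeping genuinely distinct rows, which is consistent with the 618 → 561 reduction above.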

## Training Details

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) + Unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Epochs | 1 |
| Steps | 71 |
| Batch size | 2 per device × 4 gradient accumulation = 8 effective |
| Learning rate | 1e-4 (cosine schedule) |
| Max seq length | 2048 |
| Optimizer | AdamW 8-bit |
| Hardware | Tesla T4 (14.56 GB VRAM) |
| Training time | ~39.85 min |
| Trainable params | 33M / 4.05B (0.81%) |
| Peak VRAM | ~4.18 GB |
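The hyperparameters above correspond roughly to the following PEFT/transformers configuration. This is a sketch, not the actual training script: the target modules and dtype choices are assumptions (a T4 has no bf16 support, so fp16 is used here).

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 lacks bf16 support
)

# LoRA adapters: rank 16, alpha 16
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # 2 × 4 = 8 effective
    num_train_epochs=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    fp16=True,
)
```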

Training used Unsloth's `train_on_responses_only`, so the loss was computed on assistant completions only.
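Conceptually, response-only training masks every token outside the assistant completion with the label `-100`, which PyTorch's cross-entropy loss ignores. A simplified illustration (not Unsloth's actual implementation, and assuming the response start offset is already known):

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy

def mask_non_response_labels(input_ids: list[int],
                             response_start: int) -> list[int]:
    """Copy input_ids into labels, replacing every token before the
    assistant response with IGNORE_INDEX so it contributes no loss."""
    return [tok if i >= response_start else IGNORE_INDEX
            for i, tok in enumerate(input_ids)]
```

In practice the helper locates the assistant spans from the chat template's role markers rather than taking an explicit offset.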

## License Note

The base model is licensed under Apache 2.0. Review the upstream dataset's terms before any use beyond personal experimentation.