Update README.md
---
license: apache-2.0
datasets:
- VladHong/Lewis_Instruct
base_model:
- Unsloth/Qwen3-4B-Instruct-2507
tags:
- qwen3
- qlora
- unsloth
- toy
language:
- en
---
# Qwen3-4B Instruct Lewis

> ⚠️ **Toy model — not intended for serious or production use.** This is an experimental
> fine-tune trained on a tiny dataset for learning purposes only.

Fine-tuned from [Unsloth/Qwen3-4B-Instruct-2507](https://huggingface.co/Unsloth/Qwen3-4B-Instruct-2507)
using QLoRA + Unsloth on the [VladHong/Lewis_Instruct](https://huggingface.co/datasets/VladHong/Lewis_Instruct) dataset.

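The model can be queried like any other Qwen3 chat model. Below is a minimal inference sketch using 🤗 Transformers; the repo id is a placeholder, and it assumes merged full weights rather than a standalone LoRA adapter:

```python
# Minimal inference sketch. The repo id is a placeholder; substitute the actual
# model repo. Assumes merged weights, not a standalone LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VladHong/qwen3-4b-lewis"  # hypothetical id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What should I do with a talking rabbit?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
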
## Example Conversation

> **User:** What should I do with a talking rabbit?
>
> **qwen3-4b-lewis:** I don't know, but I think it's time to go.
>
> **User:** Why?
>
> **qwen3-4b-lewis:** Because I'm afraid the rabbit will tell the Queen about us!

## Training Data

| Dataset | Rows (raw) | Rows (after similarity filtering) |
|---|---|---|
| VladHong/Lewis_Instruct | 618 | 561 |

Similarity filtering used a 0.3 Jaccard threshold. `<think>` blocks were stripped from all assistant turns before training.

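For reference, a sketch of what this preprocessing could look like is below. The exact tokenization, pairing strategy, and deduplication order are not documented, so treat the word-level Jaccard measure and the greedy keep-first filter here as assumptions:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks from an assistant turn."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-tokenized word sets (an assumption)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def filter_similar(rows: list[str], threshold: float = 0.3) -> list[str]:
    """Greedily keep a row only if it is below the threshold against all kept rows."""
    kept: list[str] = []
    for row in rows:
        if all(jaccard(row, k) < threshold for k in kept):
            kept.append(row)
    return kept
```
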
## Training Details

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit NF4) + Unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Epochs | 1 |
| Steps | 71 |
| Batch size | 2 per device × 4 gradient accumulation = 8 effective |
| Learning rate | 1e-4 (cosine schedule) |
| Max seq length | 2048 |
| Optimizer | AdamW 8-bit |
| Hardware | Tesla T4 (14.56 GB VRAM) |
| Training time | ~39.85 min |
| Trainable params | 33M / 4.05B (0.81%) |
| Peak VRAM | ~4.18 GB |

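A configuration sketch matching the table's settings is shown below. It follows the public Unsloth and TRL APIs, but the exact training script is not published, so the target modules and dataset formatting are assumptions:

```python
from unsloth import FastLanguageModel  # import unsloth before transformers/trl
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 base model via Unsloth (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Unsloth/Qwen3-4B-Instruct-2507",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank 16, alpha 16); target modules are an assumption
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumes the rows have been rendered to a "text" column with the chat template
dataset = load_dataset("VladHong/Lewis_Instruct", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # 2 per device x 4 accumulation = 8 effective
        num_train_epochs=1,
        learning_rate=1e-4,
        lr_scheduler_type="cosine",
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
```
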
Training used Unsloth's `train_on_responses_only` helper, so the loss was computed on assistant completions only.

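Unsloth ships this as a helper in `unsloth.chat_templates`. A sketch of how it would be applied to the trainer above, assuming Qwen3's ChatML-style turn markers, is:

```python
# Mask everything except assistant completions; the marker strings below
# assume Qwen3's ChatML-style chat template.
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
)
trainer.train()
```
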
## License Note

The base model is licensed under Apache 2.0. Review the upstream dataset's terms before any use beyond personal experimentation.