---
base_model: unsloth/Qwen3-14B-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- trl
license: apache-2.0
language:
- en
---

I edited this `README.md` locally, but `unsloth` overwrote it on upload. That's not good.

A LoRA adapter fine-tuned from `unsloth/Qwen3-14B-unsloth-bnb-4bit` with `unsloth`.

Based on [this tutorial](https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune).

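This card doesn't record the LoRA configuration; the tutorial's defaults look roughly like the following sketch, which is not necessarily the exact setup used for this adapter:

```python
from unsloth import FastLanguageModel

# Tutorial-default LoRA setup; the ranks and target modules below are
# assumptions, not settings recorded in this repo.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```
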
## Data

The training data consists of 237 scenarios about dependent eligibility under [26 U.S.C. § 152(a)-(d)](https://www.law.cornell.edu/uscode/text/26/152), generated with `gemini-2.5-pro-preview-03-25` but **not** checked for correctness.

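Each scenario pairs a question with a reference answer, which is then rendered into the Alpaca-style prompt shown under Usage. The exact schema isn't reproduced here; a hypothetical record might look like:

```python
# Hypothetical record; field names and content are illustrative only,
# not taken from the actual dataset.
example = {
    "input": "Can I claim my 19 year old daughter? She is a full-time student.",
    "output": "Likely yes. Under 26 U.S.C. 152(c), a full-time student "
              "under age 24 can be a qualifying child, provided the "
              "residency and support tests are also met.",
}
```
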
Training arguments, run on an A100 (40GB):

```python
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size: 8 * 4 = 32
    num_train_epochs=16,
    warmup_steps=16,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),  # fall back to fp16 on pre-Ampere GPUs
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,  # https://arxiv.org/abs/2109.08203
    output_dir="outputs",
    report_to="none",
)
```

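In the Unsloth workflow these arguments are passed to a `trl` `SFTTrainer`. A minimal sketch, assuming a `dataset` whose `"text"` column already contains prompts rendered with the template below (the dataset wiring is an assumption, and the API details vary across `trl` versions):

```python
from trl import SFTTrainer

# Sketch only: `dataset` and its "text" column are assumptions.
trainer = SFTTrainer(
    model=model,                # the LoRA model returned by unsloth
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column holding fully formatted prompts
    max_seq_length=2048,
    args=training_args,         # the TrainingArguments above
)
trainer.train()
```
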
## Usage

To use:

```python
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None         # autodetect: bfloat16 on Ampere+, else float16
load_in_4bit = True  # load the base model in 4-bit to save memory

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="doabell/dependent-qlora",  # this adapter
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # enable faster inference
```

Template (the second slot is left empty at inference so the model writes the response):

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an experienced lawyer in dealing with dependents for US tax purposes.

### Input:
{}

### Response:
{}"""
```

Streaming:

```python
from transformers import TextStreamer

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Can I claim my 7 year old son? He is an instagram influencer and earned $505 last year.",
            "",  # response left blank for the model to fill in
        )
    ],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=256)
```

No streaming:

```python
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Can I claim my 7 year old son? He is an instagram influencer and earned $5050 last year.",
            "",  # response left blank for the model to fill in
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
tokenizer.batch_decode(outputs)
```

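`batch_decode` returns the prompt plus the generated text. To keep only the response, one option (an assumption, not from the tutorial) is to slice off the prompt tokens before decoding:

```python
# Drop the prompt tokens so only the newly generated text remains.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.batch_decode(
    outputs[:, prompt_length:], skip_special_tokens=True
)[0]
print(response)
```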