---
base_model: unsloth/Qwen3-14B-unsloth-bnb-4bit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen3
  - trl
license: apache-2.0
language:
  - en
---

# dependent-qlora

I edited my README.md locally, but unsloth hijacked it. That's not good.

A LoRA adapter fine-tuned from `unsloth/Qwen3-14B-unsloth-bnb-4bit` using unsloth.

Based on this tutorial.

## Data

The training data is 237 scenarios covering dependent eligibility under 26 U.S.C. § 152(a)-(d), generated with gemini-2.5-pro-preview-03-25 but not checked for correctness.
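For illustration, a record might look like the following (a hypothetical example I wrote for this README, not drawn from the actual dataset):

```python
# Hypothetical scenario record -- not taken from the actual dataset.
scenario = {
    "input": (
        "My daughter is 19, a full-time student, and lived with me all year. "
        "She earned $3,000 from a part-time job. Can I claim her?"
    ),
    "output": (
        "Likely yes, as a qualifying child under 26 U.S.C. § 152(c): she is "
        "a full-time student under age 24, lived with you for more than half "
        "the year, and presumably did not provide over half of her own support."
    ),
}
```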

Training arguments, on an A100 (40 GB):

```python
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

args = TrainingArguments(
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size of 32
    num_train_epochs=16,
    warmup_steps=16,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,  # https://arxiv.org/abs/2109.08203
    output_dir="outputs",
    report_to="none",
)
```
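These arguments would typically be handed to trl's `SFTTrainer`, as in the unsloth tutorials. A minimal sketch, assuming `model`, `tokenizer`, and a `dataset` whose `"text"` column holds prompts rendered with the template shown under Usage (the variable names are mine, and newer trl releases fold these options into `SFTConfig`):

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,            # PEFT-wrapped model from FastLanguageModel
    tokenizer=tokenizer,
    train_dataset=dataset,  # assumed to carry a "text" column
    dataset_text_field="text",
    max_seq_length=2048,
    args=args,              # the TrainingArguments above
)
trainer.train()
```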

## Usage

To load the adapter for inference:

```python
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None         # None = auto-detect (bfloat16 on Ampere and newer)
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="doabell/dependent-qlora",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # switch to inference mode
```
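The adapter should also load with plain transformers + peft instead of unsloth; a sketch, assuming the repo carries the usual `adapter_config.json`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-14B-unsloth-bnb-4bit",  # 4-bit quantized base model
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("doabell/dependent-qlora")
model = PeftModel.from_pretrained(base, "doabell/dependent-qlora")
```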

Template:

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an experienced lawyer in dealing with dependents for US tax purposes.

### Input:
{}

### Response:
{}"""
```
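At training time, each scenario would be rendered through this same template and terminated with the EOS token so the model learns to stop, producing the `"text"` column assumed in the trainer sketch above. A sketch, where the `input`/`output` column names are my assumption:

```python
EOS_TOKEN = tokenizer.eos_token  # without this, generation never learns to end

def formatting_prompts_func(examples):
    # "input"/"output" are assumed column names; adjust to the real schema.
    texts = [
        alpaca_prompt.format(inp, out) + EOS_TOKEN
        for inp, out in zip(examples["input"], examples["output"])
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
```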

Streaming:

```python
from transformers import TextStreamer

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Can I claim my 7 year old son? He is an instagram influencer and earned $505 last year.",
            "",  # leave the response slot empty for generation
        )
    ],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=256)
```

No streaming:

```python
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Can I claim my 7 year old son? He is an instagram influencer and earned $5050 last year.",
            "",
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.batch_decode(outputs)[0])
```
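`batch_decode` returns the prompt together with the completion; to show only the model's answer, slice off the prompt tokens first:

```python
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.batch_decode(
    outputs[:, prompt_len:],  # keep only the newly generated tokens
    skip_special_tokens=True,
)[0]
print(response)
```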