---
base_model: unsloth/Qwen3-14B-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
- trl
license: apache-2.0
language:
- en
---

I edited my `README.md` locally, but `unsloth` hijacked it. That's not good.

A LoRA adapter fine-tuned from `unsloth/Qwen3-14B-unsloth-bnb-4bit` with `unsloth`, following [this tutorial](https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune).

## Data

The training data consists of 237 scenarios covering dependent eligibility under [26 U.S.C. § 152(a)-(d)](https://www.law.cornell.edu/uscode/text/26/152), generated with `gemini-2.5-pro-preview-03-25` but **not** checked for correctness. A sketch of how the scenarios were likely formatted for training appears at the end of this card.

Training arguments on a single A100 (40 GB):

```python
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=16,
    warmup_steps=16,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,  # https://arxiv.org/abs/2109.08203
    output_dir="outputs",
    report_to="none",
)
```

## Usage

Load the adapter with `unsloth` and switch the model into inference mode:

```python
from unsloth import FastLanguageModel

max_seq_length = 2048
dtype = None  # auto-detect
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="doabell/dependent-qlora",  # this adapter repo
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)
```

Prompt template (Alpaca-style, matching the prompt used during training):

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
You are an experienced lawyer in dealing with dependents for US tax purposes.

### Input:
{}

### Response:
{}"""
```

Both generation snippets below reuse `model`, `tokenizer`, and `alpaca_prompt` from above.

Streaming generation:

```python
from transformers import TextStreamer

inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Can I claim my 7 year old son? He is an instagram influencer and earned $505 last year.",
            "",  # leave the response blank for generation
        )
    ],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=256)
```

Without streaming:

```python
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "Can I claim my 7 year old son? He is an instagram influencer and earned $5050 last year.",
            "",  # leave the response blank for generation
        )
    ],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=256, use_cache=True)
tokenizer.batch_decode(outputs)
```
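Note that `batch_decode` returns the prompt and the completion together. A minimal sketch for keeping only the generated response, assuming the single-prompt `inputs` from the snippet above:

```python
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.batch_decode(
    outputs[:, prompt_len:], skip_special_tokens=True
)[0]
print(response)
```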
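## Training sketch

The exact data-preparation code was lost when `unsloth` overwrote this card. Below is a minimal sketch of how the 237 scenarios were likely mapped into the Alpaca template, following the linked tutorial; the `input`/`output` column names and the `dataset` object are assumptions, not the original code.

```python
# Sketch, not the exact training code: format each scenario with the
# Alpaca template above. Column names "input" and "output" are assumed.
EOS_TOKEN = tokenizer.eos_token  # appended so the model learns to stop

def formatting_prompts_func(examples):
    texts = [
        alpaca_prompt.format(inp, out) + EOS_TOKEN
        for inp, out in zip(examples["input"], examples["output"])
    ]
    return {"text": texts}

# dataset = dataset.map(formatting_prompts_func, batched=True)
```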
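The `TrainingArguments` above would then be wired into `trl`'s `SFTTrainer`, as in the tutorial. Again a sketch, assuming a `dataset` with the `"text"` field produced above; depending on your `trl` version, `dataset_text_field` and `max_seq_length` may need to go in an `SFTConfig` instead.

```python
from trl import SFTTrainer

# Sketch: pass the formatted dataset and the TrainingArguments above.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=training_args,
)
trainer.train()
```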