Paper: [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
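
The snippet below loads the 4-bit quantized model with Unsloth's `FastVisionModel` and streams a LaTeX transcription for a sample image from the `unsloth/LaTeX_OCR` dataset: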

```python
from datasets import load_dataset
from unsloth import FastVisionModel

# Load the 4-bit quantized model and its processor.
model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "MMoshtaghi/Qwen2-VL-7B-Instruct-LoRAAdpt-MathOCR",
    load_in_4bit = True,
)
FastVisionModel.for_inference(model)  # Enable for inference!

# Grab a sample image from the LaTeX OCR dataset.
dataset = load_dataset("unsloth/LaTeX_OCR", split = "train")
image = dataset[0]["image"]

instruction = "Write the LaTeX representation for this image."
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction},
    ]},
]

# Build the chat prompt and tokenize it together with the image.
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens = False,
    return_tensors = "pt",
).to("cuda")

# Stream the generated LaTeX to stdout as it is produced.
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```
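
If you prefer to capture the generated LaTeX as a string rather than streaming it to stdout, a minimal variant (not part of the original card, reusing `inputs` from above) looks like this:

```python
# Generate without streaming, then decode only the newly generated tokens.
output_ids = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens = True))
```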

This vision-language model was trained 2x faster with Unsloth and Hugging Face's TRL library.
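
For reference on the training side, a QLoRA finetuning loop with Unsloth and TRL typically looks like the sketch below. This is not this model's actual training script: the base model name, LoRA rank, dataset conversion, and hyperparameters are assumptions modeled on Unsloth's public vision finetuning examples.

```python
# Illustrative QLoRA-style setup, NOT this model's actual training script.
from datasets import load_dataset
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-7B-Instruct",          # assumed base model
    load_in_4bit = True,                     # QLoRA: 4-bit quantized base weights
    use_gradient_checkpointing = "unsloth",
)
# Attach trainable LoRA adapters on top of the frozen quantized base.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers   = True,
    finetune_language_layers = True,
    r = 16,                                  # assumed LoRA rank
    lora_alpha = 16,
)
FastVisionModel.for_training(model)

# Convert the image/LaTeX pairs into the chat format used at inference time.
def to_conversation(sample):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "image", "image": sample["image"]},
            {"type": "text", "text": "Write the LaTeX representation for this image."},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["text"]},
        ]},
    ]}

train_dataset = [to_conversation(s) for s in load_dataset("unsloth/LaTeX_OCR", split = "train")]

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),  # handles image inputs
    train_dataset = train_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        learning_rate = 2e-4,
        max_steps = 30,                      # assumed; tune for your data
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        optim = "adamw_8bit",
        output_dir = "outputs",
        remove_unused_columns = False,       # keep image columns for the collator
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_seq_length = 2048,
    ),
)
trainer.train()
```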