This repo contains a 4-bit quantized version (using bitsandbytes) of Technology Innovation Institute's tiiuae/falcon-7b-instruct.
- Paper: QLoRA: Efficient Finetuning of Quantized LLMs (arXiv: 2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
- bitsandbytes GitHub repo: https://github.com/TimDettmers/bitsandbytes
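
For background, here is a minimal sketch of how a base model such as tiiuae/falcon-7b-instruct can be quantized to 4-bit NF4 with a bitsandbytes `BitsAndBytesConfig`, in the style described in the QLoRA paper and the blog post above. The exact settings used to produce this repo are an assumption and shown only for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config in the QLoRA style
# (values are illustrative, not necessarily the exact ones used for this repo)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```
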
Use the code below to get started with the model.
```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git  # install latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id_falcon = "alokabhishek/falcon-7b-instruct-bnb-4bit"

# Load the tokenizer and the pre-quantized 4-bit model
tokenizer_falcon = AutoTokenizer.from_pretrained(model_id_falcon, use_fast=True)
model_falcon = AutoModelForCausalLM.from_pretrained(
    model_id_falcon,
    device_map="auto",
)

# Build a text-generation pipeline and run a prompt
pipe_falcon = pipeline(task="text-generation", model=model_falcon, tokenizer=tokenizer_falcon)

prompt_falcon = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
output_falcon = pipe_falcon(prompt_falcon, max_new_tokens=512)
print(output_falcon[0]["generated_text"])
```
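
Generation settings such as sampling can be passed directly through the pipeline call and are forwarded to `generate`. The values below are an illustrative sketch, not recommendations from this repo:

```python
# Sampling-based generation (parameter values are illustrative)
output_falcon = pipe_falcon(
    prompt_falcon,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    eos_token_id=tokenizer_falcon.eos_token_id,
)
print(output_falcon[0]["generated_text"])
```
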