mlabonne/guanaco-llama2-1k
Viewer • Updated • 1k • 2.31k • 162
How to use harshagale/llm-upload with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base_model, "harshagale/llm-upload")This model is a fine-tuned version of NousResearch/Llama-2-7b-chat-hf using Parameter-Efficient Fine-Tuning (PEFT) via QLoRA (4-bit quantization). It was trained on the mlabonne/guanaco-llama2-1k dataset.
Note: This repository contains only the adapter weights. To use this model, you need to load the base model (
NousResearch/Llama-2-7b-chat-hf) and apply these LoRA adapters on top of it.
NousResearch/Llama-2-7b-chat-hfThe model was trained using the following configuration:
nf4) with double quantizationfloat16q_proj, v_projpaged_adamw_8bitThis project was created to learn and experiment with:
You can easily load this model and its adapters using the transformers and peft libraries:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
model_id = "NousResearch/Llama-2-7b-chat-hf"
adapter_id = "harshagale/llm-upload"
# 1. You must use the same 4-bit config to load the base model
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True
)
# 2. Load the base tokenizer and configure the padding token
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
# 3. Load the quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto"
)
# 4. Merge the PEFT adapter weights onto the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
# 5. Quick inference test
prompt = "Human: Tell me a joke.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Base model
NousResearch/Llama-2-7b-chat-hf