How to use estradax/gpt2-medium-alpaca-4bit with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2-medium")
model = PeftModel.from_pretrained(base_model, "estradax/gpt2-medium-alpaca-4bit")
This model is a fine-tuned version of openai-community/gpt2-medium on the tatsu-lab/alpaca dataset.
It was trained using QLoRA (4-bit quantization + LoRA) to follow instructions.
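To give a rough sense of why LoRA keeps fine-tuning cheap, the sketch below counts trainable parameters for a single weight matrix with and without a low-rank adapter. The shape used is the fused attention projection of GPT-2 medium (1024 inputs, 3072 outputs). The rank of 8 and the helper name `lora_param_counts` are assumptions for illustration only; the actual adapter settings used to train this model are not stated here.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # parameters in the frozen weight W
    lora = rank * (d_out + d_in)   # parameters in B (d_out x rank) plus A (rank x d_in)
    return full, lora


# Fused attention projection in GPT-2 medium: 1024 -> 3 * 1024.
# rank=8 is an assumed value, not taken from this model's adapter config.
full, lora = lora_param_counts(d_out=3072, d_in=1024, rank=8)
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.2%}")
```

Under these assumptions the adapter trains about 1% of the parameters of the matrix it wraps, while the 4-bit base weights stay frozen.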
To use this model with bitsandbytes and peft, load the base GPT-2 model in 4-bit precision and then attach the trained LoRA adapters.
pip install transformers torch peft bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
# 1. Load the Base Model (GPT-2) with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
base_model_id = "openai-community/gpt2-medium"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# 2. Load the LoRA Adapters
peft_model_id = "estradax/gpt2-medium-alpaca-4bit"
model = PeftModel.from_pretrained(model, peft_model_id)
# 3. Run Inference
text = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
# Move inputs to wherever device_map="auto" placed the model instead of hard-coding "cuda"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
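The prompt in step 3 follows the Alpaca no-input template. A small helper can build that prompt from any instruction and strip the echoed prompt from the decoded output. This is a minimal sketch: `build_prompt`, `extract_response`, and the two constants are hypothetical names introduced here, and it assumes the decoded text still contains the `### Response:` marker.

```python
# Alpaca no-input template, identical to the string used in step 3.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the Alpaca prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def extract_response(generated: str) -> str:
    """Return the text after the first response marker ("" if the marker is missing)."""
    _, _, answer = generated.partition(RESPONSE_MARKER)
    return answer.strip()
```

With these helpers, `text = build_prompt("What is the capital of France?")` reproduces the prompt above, and `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))` returns only the model's answer.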