metadata
license: mit
library_name: peft
base_model: openai-community/gpt2
tags:
  - generated_from_trainer
  - gpt2
  - alpaca
  - qlora
  - 4-bit
datasets:
  - tatsu-lab/alpaca
language:
  - en

GPT2-Alpaca-4bit

This model is a LoRA adapter for openai-community/gpt2, fine-tuned on the tatsu-lab/alpaca dataset.

It was trained with QLoRA (LoRA adapters trained on top of a 4-bit quantized base model) to follow instructions.

Model Description

  • Model Type: Causal Language Model
  • Base Model: GPT-2
  • Dataset: Alpaca (Instruction Tuning)
  • Language: English
  • Training Method: QLoRA (4-bit quantization via bitsandbytes + peft)
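As a rough illustration of what 4-bit loading saves, the sketch below estimates weight storage for GPT-2 at several precisions. It assumes the commonly cited figure of about 124 million parameters for the base GPT-2 checkpoint, treats every weight as quantized, and ignores quantization constants, activations, and the LoRA adapters themselves, so the numbers are ballpark estimates and not measurements of this model.

```python
def weight_megabytes(num_params: int, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


# Assumption: roughly 124M parameters for the base GPT-2 checkpoint.
GPT2_PARAMS = 124_000_000

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: ~{weight_megabytes(GPT2_PARAMS, bits):.0f} MB")
```

In practice the real footprint is somewhat higher than the 4-bit figure, since some layers typically stay in higher precision.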

How to Use

To use this model, load the base GPT-2 model in 4-bit precision, then attach the trained LoRA adapters.
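Conceptually, attaching LoRA adapters means each adapted layer keeps its frozen weight W and adds a low-rank update, giving an effective weight W + (alpha / r) * B @ A. The pure-Python sketch below shows that arithmetic on toy matrices. The shapes, rank r, and alpha are made up for illustration and are not this adapter's actual hyperparameters; with a 4-bit base model the library adds the adapter output at runtime instead of materializing the merged weight.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A for a single layer."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (hypothetical values): out=2, in=3, rank r=1.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen base weight
lora_a = [[1.0, 2.0, 3.0]]               # r x in
lora_b = [[0.5], [0.0]]                  # out x r
w_eff = lora_effective_weight(w, lora_a, lora_b, alpha=2, r=1)
print(w_eff)
```

Only A and B are stored in the adapter repository, which is why the base model must be loaded separately.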

Installation

pip install transformers torch peft bitsandbytes accelerate
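A quick way to confirm the installation worked is to check that each package can be located before running the inference code. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the check only confirms the packages are discoverable, not that a compatible GPU or CUDA setup is available.

```python
from importlib.util import find_spec

# Package names taken from the pip command above.
REQUIRED = ["transformers", "torch", "peft", "bitsandbytes", "accelerate"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```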

Inference Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the Base Model (GPT-2) with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model_id = "openai-community/gpt2"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 2. Load the LoRA Adapters
peft_model_id = "estradax/gpt2-alpaca-4bit"
model = PeftModel.from_pretrained(model, peft_model_id)

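# 3. Run Inference (the prompt follows the Alpaca instruction template)
text = (
    "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat is the capital of France?\n\n"
    "### Response:\n"
)
# Send inputs to the device picked by device_map="auto" instead of hardcoding "cuda"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# GPT-2 has no pad token, so reuse EOS as the pad token during generation
outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))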
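The decoded output above includes the full prompt, so it can help to build prompts from a template and strip everything except the answer. The helpers below are a sketch and not part of the original model card: `build_prompt` and `extract_response` are hypothetical names, the template mirrors the prompt string used in the inference code, and cutting at a new "### Instruction:" block is a heuristic for small models that keep generating past the answer.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Fill the instruction-only Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def extract_response(generated_text: str) -> str:
    """Keep only the text after the first '### Response:' marker."""
    head, sep, tail = generated_text.partition("### Response:")
    response = tail if sep else head  # fall back to the full text if no marker
    # Heuristic: drop anything after the model starts a new instruction block.
    response = response.split("### Instruction:")[0]
    return response.strip()


prompt = build_prompt("What is the capital of France?")
print(prompt)
```

With the inference code above, `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))` would return only the generated answer.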