GPT2-Medium-Alpaca-4bit

This model is a fine-tuned version of openai-community/gpt2-medium on the tatsu-lab/alpaca dataset.

It was trained using QLoRA (4-bit quantization + LoRA) to follow instructions.
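The LoRA half of QLoRA can be illustrated with a single linear layer that carries a low-rank update. The frozen weight `W` is never modified, and only the small matrices `A` and `B` would be trained. The sizes, rank `r`, and scaling `alpha` below are hypothetical values chosen for the example (1024 matches GPT-2 Medium's hidden size). They are not the settings used to train this adapter, which this card does not state.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """Linear layer output with a LoRA update.

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = x @ W.T                  # frozen base path
    update = (x @ A.T) @ B.T        # low-rank path, rank at most r
    return base + (alpha / r) * update

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 1024, 1024, 8, 16   # hypothetical sizes and hyperparameters
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                    # B starts at zero, so the adapter is a no-op at init
x = rng.standard_normal((2, d_in))

y = lora_forward(x, W, A, B, alpha, r)
print("matches base output:", np.allclose(y, x @ W.T))
print(A.size + B.size, "trainable vs", W.size, "frozen")
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly. After training, the update `(alpha / r) * B @ A` is equivalent to adding a low-rank matrix to `W`.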

Model Description

Model Type: Causal Language Model
Base Model: GPT-2 Medium
Dataset: Alpaca (Instruction Tuning)
Language: English
Training Method: QLoRA (4-bit quantization via bitsandbytes + peft)
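To make "4-bit quantization" concrete, here is a simplified blockwise absmax quantizer: each block of weights shares one floating-point scale, and every weight is stored as a small integer code. This is a stand-in for illustration, not bitsandbytes' NF4. NF4 uses non-uniform levels derived from a normal distribution, and double quantization (`bnb_4bit_use_double_quant`) additionally quantizes the per-block scales. The block size of 64 and the function names are illustrative assumptions.

```python
import numpy as np

def quantize_blockwise_4bit(w, block_size=64):
    """Simplified symmetric absmax 4-bit quantization (NOT bitsandbytes' NF4).

    Each block of `block_size` weights shares one float scale; each weight becomes
    a signed integer code in [-7, 7], which fits in 4 bits. Codes are kept in int8
    here for simplicity instead of being packed two per byte.
    """
    blocks = w.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                  # avoid division by zero for all-zero blocks
    codes = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return codes, scales

def dequantize_blockwise_4bit(codes, scales, shape):
    """Recover approximate float weights from the codes and per-block scales."""
    return (codes.astype(np.float32) * scales).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)  # stand-in for one weight matrix
codes, scales = quantize_blockwise_4bit(w)
w_hat = dequantize_blockwise_4bit(codes, scales, w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Rounding to the nearest code bounds the error of each weight by half of its block's scale, which is why smaller blocks (more scales) give better accuracy at the cost of extra storage for the scales.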

How to Use

To use this model, load the GPT-2 Medium base model in 4-bit precision and then attach the trained LoRA adapters. Loading in 4-bit with bitsandbytes typically requires a CUDA-capable GPU.
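A back-of-the-envelope estimate shows why 4-bit loading helps. The sketch assumes roughly 355 million parameters for GPT-2 Medium and counts weight storage only. It ignores activations, the LoRA adapter, quantization constants, and layers (such as embeddings) that are typically kept in higher precision, so real memory use will be higher than these figures.

```python
def weight_memory_mb(num_params, bits_per_param):
    """Approximate storage for model weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6

GPT2_MEDIUM_PARAMS = 355_000_000  # approximate parameter count of GPT-2 Medium

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>9}: ~{weight_memory_mb(GPT2_MEDIUM_PARAMS, bits):,.0f} MB")
```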

Installation

pip install transformers torch peft bitsandbytes accelerate
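Before running the inference code, a quick check like the one below can confirm that the packages from the install command are importable. It is a minimal sketch using only the standard library (`missing_packages` is a hypothetical helper name); the CUDA check runs only if `torch` is present.

```python
import importlib.util

REQUIRED = ["transformers", "torch", "peft", "bitsandbytes", "accelerate"]

def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("missing packages:", missing or "none")

# Only probe the GPU if torch is installed, so the check never crashes on import.
if importlib.util.find_spec("torch") is not None:
    import torch
    print("CUDA available:", torch.cuda.is_available())
```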

Inference Code

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1. Load the base model (GPT-2 Medium) with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "openai-community/gpt2-medium"
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 2. Attach the trained LoRA adapters on top of the quantized base model
peft_model_id = "estradax/gpt2-medium-alpaca-4bit"
model = PeftModel.from_pretrained(model, peft_model_id)
model.eval()

# 3. Run inference using the Alpaca prompt format
text = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
# Move inputs to whatever device device_map="auto" placed the model on
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
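The prompt string in the inference code follows the Alpaca instruction template. The helpers below are a hypothetical convenience layer (`build_prompt` and `extract_response` are not part of any library) for building that prompt from an instruction and for keeping only the generated answer. The with-input variant is the standard Alpaca template for records with a non-empty `input` field; whether this adapter handles it as well as the instruction-only form is an assumption. Cutting the answer at the next `###` is a heuristic, since GPT-2 may continue into a new `### Instruction:` block.

```python
# Instruction-only template, identical to the prompt in the inference code.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
# Assumption: standard Alpaca variant for examples that include extra context.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)

def build_prompt(instruction, input_text=""):
    """Format an instruction (and optional input) with the Alpaca template."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)

def extract_response(decoded):
    """Keep the text after the last '### Response:' marker, cut at any later '###'."""
    response = decoded.split("### Response:")[-1]
    return response.split("###")[0].strip()
```

For example, pass `build_prompt("What is the capital of France?")` as `text` in the inference code, then call `extract_response` on the decoded output to drop the echoed prompt.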
