🦙 Fine-Tuning LLaMA 2 7B on a Custom Dataset using Unsloth + LoRA + Quantization

This project demonstrates how to fine-tune the LLaMA 2 7B model on a custom raw-text dataset using Unsloth, LoRA, and 4-bit quantization (QLoRA). The workflow is modular and can be adapted to any other dataset.


Dataset

We use two datasets in this tutorial:

  1. FineTome-100k — an instruction dataset used to teach chat-style (GPT-like) behavior.
  2. Hawaiian Wildfire — a plain-text dataset, demonstrating how to use your own raw data.

Libraries Used

Install the required libraries:

pip install torch transformers datasets peft accelerate bitsandbytes GPUtil

Key libraries:

  • transformers (HuggingFace)
  • datasets (HuggingFace)
  • bitsandbytes (for quantization)
  • peft (Parameter-Efficient Fine-Tuning)
  • GPUtil (to monitor GPU usage)

Fine-Tuning Setup

✅ GPU Check

We check GPU availability and set CUDA configurations:

import os

# Set device selection BEFORE anything initializes CUDA —
# these variables have no effect once torch.cuda is touched.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch, GPUtil
GPUtil.showUtilization()

if torch.cuda.is_available():
    print("✅ GPU Available")
else:
    print("❌ Using CPU")

Load Base Model with Quantization

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-2-7b",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-2-7b")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default

Apply LoRA using PEFT

from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
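
To see why this is parameter-efficient, we can estimate how many trainable parameters the LoRA config above adds. The back-of-envelope sketch below assumes LLaMA 2 7B's published shape (32 decoder layers, hidden size 4096, q/k/v/o projections each 4096×4096):

```python
# Rough estimate of LoRA-added parameters for the config above.
num_layers = 32
d_in = d_out = 4096
r = 8
target_modules = ["q_proj", "v_proj", "k_proj", "o_proj"]

# Each adapted weight W (d_out x d_in) gains two low-rank factors,
# A (r x d_in) and B (d_out x r) -> r * (d_in + d_out) extra params.
params_per_module = r * (d_in + d_out)
lora_params = num_layers * len(target_modules) * params_per_module

print(f"LoRA trainable params: {lora_params:,}")  # 8,388,608
print(f"Fraction of base model: {lora_params / 6.7e9:.4%}")  # roughly 0.13%
```

In practice you don't need to compute this by hand: PEFT reports it directly via `model.print_trainable_parameters()`.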

Load Your Custom Dataset

Here we show how to use a simple text file (hawaiian_wildfire.txt) as your dataset.

from datasets import Dataset

with open("hawaiian_wildfire.txt", "r") as f:
    data = f.read()

dataset = Dataset.from_dict({"text": [data]})
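
Note that loading the whole file as one record gives the Trainer only a single training example. A common alternative is to split the raw text into smaller pieces so each becomes its own sample; here is a pure-Python sketch, assuming paragraph-separated text and an illustrative chunk size:

```python
def chunk_text(text, max_chars=2000):
    """Split raw text into chunks of up to max_chars characters,
    breaking on paragraph boundaries where possible."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk if adding this paragraph would overflow.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# Usage with the dataset above:
# dataset = Dataset.from_dict({"text": chunk_text(data)})
```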

Tokenization and Formatting

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

tokenized_dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
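
Padding every example to `max_length` wastes compute on pad tokens. A common alternative for causal LM training is to concatenate all token ids and pack them into fixed-size blocks; a pure-Python sketch over already-tokenized id lists:

```python
def group_into_blocks(ids_lists, block_size=512):
    """Concatenate tokenized sequences and split into fixed-size blocks,
    dropping the trailing remainder (standard causal-LM packing)."""
    flat = [tok for ids in ids_lists for tok in ids]
    n_full = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, n_full, block_size)]

# Usage (illustrative):
# blocks = group_into_blocks(tokenized_dataset["input_ids"], block_size=512)
```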

Training

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="llama-custom-lora",
    per_device_train_batch_size=1,
    num_train_epochs=2,
    logging_steps=10,
    save_steps=100,
    save_total_limit=2,
    fp16=True,
    optim="paged_adamw_8bit",
)

from transformers import DataCollatorForLanguageModeling

# For causal LM training the collator copies input_ids into labels;
# without it, the Trainer has no loss targets.
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()

Inference

model.eval()
input_text = "The wildfire in Hawaii caused"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Theory: LoRA, QLoRA & PEFT

  • LoRA: Freezes the original weights and trains small low-rank update matrices added to selected layers.
  • QLoRA: Quantizes the frozen base model to 4-bit and applies LoRA adapters on top.
  • PEFT: Hugging Face framework for parameter-efficient fine-tuning methods (e.g., LoRA, Prompt Tuning).
  • Quantization:
    • Reduces weight precision (float32 → int8, or 4-bit formats such as NF4).
    • Saves memory and speeds up training.
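
A quick back-of-envelope calculation shows why 4-bit quantization matters for a 7B model. The figures below count weights only, ignoring activations, optimizer state, and quantization overhead:

```python
params = 7_000_000_000  # ~7B weights

# Approximate storage per parameter at each precision.
bytes_per_param = {"float32": 4, "float16": 2, "int8": 1, "4-bit (NF4)": 0.5}

for dtype, nbytes in bytes_per_param.items():
    print(f"{dtype:>12}: ~{params * nbytes / 1e9:.1f} GB")
# float32 ~28.0 GB, float16 ~14.0 GB, int8 ~7.0 GB, 4-bit ~3.5 GB
```

This is why the 4-bit model fits comfortably on a single consumer GPU, while full-precision weights alone would not.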

Tools Like Unsloth & LLaMA-Factory

Instead of the manual setup above, you can use higher-level tools that wrap the same workflow:

  • Unsloth — provides optimized kernels that make LoRA/QLoRA fine-tuning faster and more memory-efficient.
  • LLaMA-Factory — a CLI and web UI for fine-tuning many open models with LoRA/QLoRA.