# QLoRA Instruction Tuning on Pythia-1B

This repository provides a **Hugging Face–compatible LoRA adapter** trained via **QLoRA (4-bit quantization + LoRA adapters)** on the **EleutherAI Pythia-1B-deduped** base model.

The project focuses on **producing and publishing a reusable LoRA adapter** using a modern, memory-efficient instruction-tuning pipeline built with Hugging Face Transformers, PEFT, and BitsAndBytes. It is designed for **learning, experimentation, and small-GPU environments (e.g. Colab)**.

---

## ✨ Key Features (Adapter-Centric)

* 🔒 **Frozen base model**: Pythia-1B-deduped (not included in this repository)
* 🧠 **QLoRA training** with 4-bit NF4 quantization
* 🧩 **LoRA adapters only** are trainable (<1% of parameters)
* 💾 Optimized for **low GPU memory usage**
* 📚 Clear, minimal pipeline for understanding instruction tuning

---

## 🧠 What This Adapter Represents

This adapter demonstrates how to:

* Load a **4-bit quantized causal language model**
* Prepare it for k-bit training
* Apply **LoRA adapters** for parameter-efficient fine-tuning
* Perform **instruction tuning** using causal LM loss
* Train using the Hugging Face `Trainer` API

Formally, training follows:

```
Frozen Base Model (4-bit)
+ Trainable LoRA ฮ”W
→ Instruction-following behavior
```
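
Equivalently, each adapted weight matrix is a frozen matrix plus a learned low-rank update. This is the standard LoRA reparameterization, stated generically rather than taken from this repo's code:

```latex
% Standard LoRA reparameterization (generic, not repo-specific)
W_{\text{eff}} = W_{\text{frozen}} + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only $A$ and $B$ receive gradients; $W_{\text{frozen}}$ stays in its 4-bit quantized form throughout training.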

---

## ๐Ÿ—๏ธ Model & Training Setup

### Base Model

* **Model**: `EleutherAI/pythia-1B-deduped`
* **Architecture**: Decoder-only Transformer
* **Quantization**: 4-bit NF4 (BitsAndBytes)

### LoRA Configuration

| Parameter      | Value       | Description                      |
| -------------- | ----------- | -------------------------------- |
| `r`            | 32          | LoRA rank (expressiveness)       |
| `lora_alpha`   | 32          | Scaling factor                   |
| `lora_dropout` | 0.05        | Regularization                   |
| `bias`         | `none`      | Only LoRA parameters are trained |
| `task_type`    | `CAUSAL_LM` | Causal language modeling         |

Only **LoRA parameters** are trainable; all base model weights remain frozen.
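
The table above maps directly onto a PEFT `LoraConfig`. A sketch follows; note the `target_modules` list is an assumption for GPT-NeoX-style models such as Pythia, not something confirmed by this repo's training script:

```python
from peft import LoraConfig

# Values from the table above. target_modules is an assumed choice for
# GPT-NeoX-style attention and MLP layers (Pythia), not repo-confirmed.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
)
```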

---

## 📦 Dataset

* **Type**: Instruction-formatted text dataset
* **Format**: Each example contains a `text` field
* **Tokenization**:

  * Max length: 512
  * Padding: `max_length`
  * Truncation enabled

Loss is computed using **standard causal language modeling**, meaning the model learns to predict the full sequence (instruction + response).
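
Concretely, "standard causal language modeling" means the training targets are the input ids shifted by one position, so every token contributes to the loss, instruction tokens included. A minimal, framework-free sketch of the target construction:

```python
def next_token_targets(token_ids):
    """Pair each position with the token it must predict (the next one).

    With causal LM loss over the full sequence, the instruction tokens
    contribute to the loss too -- there is no response-only masking here.
    """
    inputs = token_ids[:-1]   # model sees tokens 0..n-2
    targets = token_ids[1:]   # and predicts tokens 1..n-1
    return list(zip(inputs, targets))

# Toy example with made-up token ids:
pairs = next_token_targets([5, 6, 7, 8])
# -> [(5, 6), (6, 7), (7, 8)]
```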

---

## 🚀 Adapter Training & Usage Pipeline

### 1. Load tokenizer and model

* Load Pythia tokenizer
* Set `pad_token = eos_token`
* Load model with 4-bit quantization

### 2. Prepare for QLoRA training

* Enable gradient checkpointing
* Cast critical layers for numerical stability
* Freeze base model parameters

### 3. Apply LoRA adapters

* Inject LoRA modules into attention and MLP layers
* Print trainable parameter count
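
The trainable-parameter printout reduces to a sum over `requires_grad` flags; PEFT's `print_trainable_parameters()` reports the equivalent numbers. A framework-free sketch with illustrative (not measured) parameter counts:

```python
def trainable_stats(params):
    """params: iterable of (num_elements, requires_grad) pairs."""
    total = sum(n for n, _ in params)
    trainable = sum(n for n, grad in params if grad)
    return trainable, total, 100.0 * trainable / total

# Toy numbers: a 1B frozen base plus ~8M LoRA parameters (illustrative only)
trainable, total, pct = trainable_stats([(1_000_000_000, False), (8_000_000, True)])
# pct comes out well under 1%, matching the "<1% parameters" claim above
```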

### 4. Training configuration

| Setting               | Value              |
| --------------------- | ------------------ |
| Epochs                | 3                  |
| Batch size            | 6                  |
| Gradient accumulation | 4                  |
| Effective batch size  | 24                 |
| Learning rate         | 2e-4               |
| Optimizer             | `paged_adamw_8bit` |
| Precision             | FP16               |
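
The effective batch size row is just the product of the two rows above it. A quick sanity check, with the optimizer-steps-per-epoch formula shown for a hypothetical dataset size (not taken from this repo):

```python
per_device_batch = 6
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps  # 24, as in the table

# Optimizer steps per epoch for an assumed dataset size (illustrative only)
dataset_size = 12_000
steps_per_epoch = dataset_size // effective_batch
```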

### 5. Load the trained adapter for inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "EleutherAI/pythia-1B-deduped"
lora_repo = "BEncoderRT/Pythia-QLoRA-Instruction-Tuning"

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token  # Pythia defines no pad token

# Load the frozen base model. This uses bfloat16 weights for inference; to
# reproduce the 4-bit setup from training, pass a BitsAndBytesConfig via
# quantization_config instead.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Attach the trained LoRA adapters on top of the frozen base model
model = PeftModel.from_pretrained(base_model, lora_repo)
```

```python
import torch

# Ensure the model is in evaluation mode
model.eval()

# Function to format prompts consistently with training data
def format_prompt(instruction, context=None):
    if context:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"

# Define a few test prompts
test_prompts = [
    {
        "instruction": "Explain the concept of photosynthesis in simple terms.",
        "context": None
    },
    {
        "instruction": "What is the capital of France?",
        "context": None
    },
    {
        "instruction": "Summarize the main idea of the following text:",
        "context": "The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram."
    },
    {
        "instruction": "List three benefits of regular exercise.",
        "context": None
    }
]

# Additional test prompts covering harder reasoning and creative tasks
new_test_prompts = [
    {
        "instruction": "Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.",
        "context": None
    },
    {
        "instruction": "If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).",
        "context": None
    },
    {
        "instruction": "What is the capital of Australia?",
        "context": None
    },
    {
        "instruction": "Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.",
        "context": None
    },
    {
        "instruction": "Summarize the following passage:",
        "context": "The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests."
    }
]
test_prompts.extend(new_test_prompts)

# Generate responses for each test prompt
print("\n--- Generating Responses ---\n")
with torch.no_grad():
    for i, prompt_data in enumerate(test_prompts):
        instruction = prompt_data["instruction"]
        context = prompt_data["context"]

        formatted_input = format_prompt(instruction, context)

        # Tokenize the input prompt
        inputs = tokenizer(formatted_input, return_tensors="pt").to(model.device)

        # Generate response, explicitly using both eos_token_id and pad_token_id
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,  # Limit the length of the generated response
            do_sample=True,      # Enable sampling for more diverse responses
            temperature=0.7,     # Control randomness (lower means less random)
            top_k=50,            # Consider only top 50 probable tokens
            top_p=0.95,          # Nucleus sampling
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id # Explicitly set pad_token_id
        )

        # Decode the generated tokens
        # Exclude the input prompt from the decoded text to get only the model's response
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = generated_text[len(formatted_input):].strip()

        print(f"### Test Prompt {i+1}: ###")
        print(f"Instruction: {instruction}")
        if context:
            print(f"Context: {context}")
        print(f"Model Response: {response}\n")

```
---
```text

--- Generating Responses ---

### Test Prompt 1: ###
Instruction: Explain the concept of photosynthesis in simple terms.
Model Response: Photosynthesis is a process in which green plants absorb sunlight and convert it to energy to be used by the living plant. The process of photosynthesis is the process of converting the energy from the sun's rays into an organic compound called a molecule. The process of photosynthesis occurs when the energy from the sun is transformed into chemical energy in the form of energy-rich molecules called pigments. The pigments of the plant are the photosynthetic products and they are absorbed by the plant. The process of photosynthesis is the process of converting the energy from the sun into organic compounds called pigments and absorbing them. In plants, the process of photosynthesis is the process of converting the energy from the sun into organic compounds and converting them into pigments.

### Test Prompt 2: ###
Instruction: What is the capital of France?
Model Response: Paris is the capital of France. The city is located on the Mediterranean coast of France and is considered to be the most densely populated city in the world. It has an estimated population of about 8.3 million people. It is the fourth most populous city in Europe and the second most populous city in North America. The city's economy is based on tourism, with most of the tourists coming from around the world. It is also a major international hub for finance and technology. The city has hosted many notable people, including the Pope, who is the longest serving Catholic leader in the world.

The city is the seat of the region of Paris, which has its own government and a parliament called the National Assembly. It is the largest

### Test Prompt 3: ###
Instruction: Summarize the main idea of the following text:
Context: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.
Model Response: The quick brown fox jumps over the lazy dog. This sentence is often used to display all letters of the English alphabet, making it a pangram.

In a pangram, the letters of the alphabet are grouped in a set of groups called "pangrams". The word "pangram" means "a pangram" or "a pangram-like structure".

In the sentence "The quick brown fox jumps over the lazy dog" there are four groups of characters:

The quick brown fox jumps over the lazy dog.

The lazy dog is lazy.

The quick brown fox jumps over the lazy dog.

The lazy dog is not lazy.

The quick brown

### Test Prompt 4: ###
Instruction: List three benefits of regular exercise.
Model Response: 1. It helps you stay fit.
2. It helps you relax.
3. It helps you lose weight.
4. It helps you lose your bad habits.
5. It helps you stay healthy.
6. It helps you improve your mood.
7. It helps you sleep better.
8. It helps you lose your extra weight.
9. It helps you lose your bad habits.
10. It helps you live longer.
11. It helps you live a healthier lifestyle.
12. It helps you live a healthier lifestyle.
13. It helps you live a healthier lifestyle.
14. It helps you live a healthier lifestyle.
15. It helps you live a healthier lifestyle.

### Test Prompt 5: ###
Instruction: Write a short, imaginative story about a cat who discovers a secret portal to another dimension under its owner's bed.
Model Response: Cat, a cat, was sleeping in her bed when she heard a loud noise. She woke up to find her owner, Mr. J, who was out of bed. He said that he heard a loud noise and wondered where the noise was coming from. He then saw the cat's paw print and realized that he was sleeping under his bed. The cat asked him why he was sleeping there and he explained that he had a secret portal to another dimension where he could escape. The cat then told him that he would have to share this secret with him. Mr. J agreed to share this secret with the cat. The cat explained to Mr. J that he would have to find a portal to get to this dimension. Mr. J looked for

### Test Prompt 6: ###
Instruction: If a train leaves New York at 10 AM traveling at 60 mph and another train leaves Chicago at 11 AM traveling at 50 mph, and the cities are 800 miles apart, at what time do they meet? (Assume they are traveling towards each other on the same track).
Model Response: The time of arrival is the same. The time of departure is different because the train leaving New York is traveling faster than the train leaving Chicago. If the train leaving New York is at 10 AM and the train leaving Chicago is at 11 AM, then the time of arrival is at 10:00 AM and the time of departure is at 11:00 AM. If the train leaving New York is at 10 AM and the train leaving Chicago is at 11 AM, then the time of arrival is at 10:00 AM and the time of departure is at 11:00 AM. So, the time of arrival is at 10:00 AM and the time of departure is at 11:00 AM.

If the trains are traveling toward each other on

### Test Prompt 7: ###
Instruction: What is the capital of Australia?
Model Response: The capital of Australia is Sydney.

Sydney is a city in Australia. It is a major city in Australia. It is the most populous city in Australia.

Sydney is a major financial centre. It is the world's most populous city by a wide margin. It is the world's most populous city by a wide margin.

Sydney is home to the Sydney Opera House and the National Library of Australia.

Sydney is home to the Sydney Harbour Bridge and the Sydney Opera House.

Sydney is home to the Sydney Cricket Ground and the Sydney Football Stadium.

Sydney is home to the Sydney Tower and the Sydney Aquarium.

### Test Prompt 8: ###
Instruction: Explain the difference between supervised and unsupervised learning in machine learning, and provide an example of when each would be used.
Model Response: In machine learning, supervised learning requires a dataset that has been labelled by a human. This is often done by manually creating a labelled dataset that has to be annotated with a set of labels that a machine can understand. For example, a machine might be given a dataset of people's names and asked to categorize them. It might be possible for a machine to learn a person's name and then use it to classify the rest of the person's name. It's very easy to train a machine that can do this. However, there are a number of limitations with this approach. For example, a machine that can categorize a person's name may be unable to understand the person's ethnicity, nationality, or any other characteristics that a person might have

### Test Prompt 9: ###
Instruction: Summarize the following passage:
Context: The advent of artificial intelligence has brought forth a new era of technological advancement, impacting various sectors from healthcare to finance. While AI promises increased efficiency and innovative solutions, it also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.
Model Response: Artificial intelligence promises increased efficiency and innovative solutions, but also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape a future where AI serves humanity's best interests.

Artificial intelligence promises increased efficiency and innovative solutions, but also raises ethical concerns regarding job displacement, privacy, and bias in algorithms. Societies worldwide are grappling with how to regulate and integrate AI responsibly, balancing progress with human values. This calls for a multidisciplinary approach involving policymakers, technologists, ethicists, and the public to shape

```
---

## 📊 Why QLoRA?

Compared to full fine-tuning:

* ✅ Roughly 10× lower GPU memory usage
* ✅ Faster experimentation
* ✅ Reduced risk of catastrophic forgetting (the base weights stay frozen)
* ✅ Easy adapter reuse and sharing

This approach mirrors how many modern instruction-tuned LLMs are trained at scale.

---

## 📈 Expected Behavior When Using This Adapter

After training, the model should:

* Follow instructions more directly
* Produce more structured and task-aligned responses
* Show clear behavioral differences **with vs without** LoRA adapters

Adapter ablation (disabling LoRA) should revert behavior close to the base model.

---

## 🔮 Possible Extensions

* Mask loss to train **response-only instruction tuning**
* Train multiple LoRA adapters for different tasks
* Merge or switch adapters at inference time
* Combine with evaluation datasets
* Compare different LoRA ranks (`r=8`, `r=16`, `r=32`)

---

## 🛠️ Requirements

* Python 3.9+
* PyTorch
* transformers
* peft
* bitsandbytes
* accelerate
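
A one-line environment setup consistent with the list above (package names are the standard PyPI ones; you may need to pin versions for bitsandbytes/CUDA compatibility):

```shell
pip install torch transformers peft bitsandbytes accelerate
```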

---

## 📜 License & Usage Notes

This repository publishes **only LoRA adapter weights** and configuration files. The base model must be obtained separately under its original license.

This adapter is intended for **research, experimentation, and non-production use** unless further evaluated.

---

This repository provides a **clean, minimal reference implementation** of QLoRA-based instruction tuning on a 1B-scale language model.