---
license: mit
datasets:
- databricks/databricks-dolly-15k
language:
- en
base_model:
- EleutherAI/pythia-1b-deduped
pipeline_tag: text-generation
tags:
- peft
- Lora
- Instruction-Tuning
---

# QLoRA Instruction Tuning on Pythia-1B

This repository provides a **Hugging Face-compatible LoRA adapter** trained with **QLoRA (4-bit quantization + LoRA adapters)** on the **EleutherAI Pythia-1B-deduped** base model.

The project focuses on **producing and publishing a reusable LoRA adapter** with a modern, memory-efficient instruction-tuning pipeline built on Hugging Face Transformers, PEFT, and BitsAndBytes. It is designed for **learning, experimentation, and small-GPU environments (e.g., Colab)**.

---
## ✨ Key Features (Adapter-Centric)

* 🔒 **Frozen base model**: Pythia-1B-deduped (not included in this repository)
* 🧠 **QLoRA training** with 4-bit NF4 quantization
* 🧩 **Only the LoRA adapters are trainable**: a small fraction of all parameters (about 1.6% when `r=32` adapters are attached to every attention and MLP projection)
* 💾 Optimized for **low GPU memory usage**
* 📚 A clear, minimal pipeline for understanding instruction tuning

---
## 🧠 What This Adapter Represents

This adapter demonstrates how to:

* Load a **4-bit quantized causal language model**
* Prepare it for k-bit training
* Apply **LoRA adapters** for parameter-efficient fine-tuning
* Perform **instruction tuning** with the causal LM loss
* Train with the Hugging Face `Trainer` API

Conceptually, training combines:

```text
Frozen Base Model (4-bit)
+ Trainable LoRA ΔW
→ Instruction-following behavior
```
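For reference, the ΔW in the diagram above is the standard low-rank LoRA update, written out below in LaTeX. With `r = 32` and `lora_alpha = 32` from the configuration table, the scaling factor α/r equals 1. In QLoRA, W₀ is stored in 4-bit NF4 and dequantized on the fly for the matrix product; only A and B receive gradients, and PEFT initializes B to zero by default, so ΔW = 0 at the start of training.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
  && A \in \mathbb{R}^{r \times d_{\mathrm{in}}},\;
     B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\;
     r \ll \min(d_{\mathrm{in}}, d_{\mathrm{out}}) \\
\lvert \theta_{\mathrm{LoRA}} \rvert &= r \, (d_{\mathrm{in}} + d_{\mathrm{out}})
  && \text{trainable parameters per adapted matrix}
\end{aligned}
```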
---

## 🏗️ Model & Training Setup

### Base Model

* **Model**: `EleutherAI/pythia-1b-deduped`
* **Architecture**: Decoder-only Transformer (GPT-NeoX)
* **Quantization**: 4-bit NF4 (BitsAndBytes)

### LoRA Configuration

| Parameter      | Value       | Description                                            |
| -------------- | ----------- | ------------------------------------------------------ |
| `r`            | 32          | LoRA rank (capacity of the low-rank update)            |
| `lora_alpha`   | 32          | Scaling factor (the update is scaled by `lora_alpha / r`) |
| `lora_dropout` | 0.05        | Dropout on the LoRA branch (regularization)            |
| `bias`         | `none`      | Bias terms are not trained                             |
| `task_type`    | `CAUSAL_LM` | Causal language modeling                               |

Only the **LoRA parameters** are trainable; all base model weights remain frozen.
---

## 📦 Dataset

* **Type**: Instruction-formatted text dataset
* **Format**: Each example contains a `text` field
* **Tokenization**:
  * Max length: 512
  * Padding: `max_length`
  * Truncation: enabled

Loss is computed with **standard causal language modeling**: the model learns to predict every token of the sequence (instruction + response), not only the response.
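A minimal sketch of these tokenization settings and the resulting causal LM labels, assuming a tokenizer that follows the Hugging Face call interface; `tokenize_batch` and `causal_lm_labels` are hypothetical helper names. Labels are plain copies of `input_ids` (Hugging Face causal LM models shift them by one position internally), with padded positions set to `-100`, the index PyTorch's cross-entropy ignores by default. Masking by `attention_mask` rather than by token id matters here: because `pad_token = eos_token`, masking every `pad_token_id` (which is what `DataCollatorForLanguageModeling(mlm=False)` does) would also hide any genuine end-of-text token from the loss.

```python
MAX_LENGTH = 512
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def tokenize_batch(batch, tokenizer):
    """Tokenize the `text` field with the settings listed above."""
    return tokenizer(
        batch["text"],
        max_length=MAX_LENGTH,
        padding="max_length",
        truncation=True,
    )


def causal_lm_labels(input_ids, attention_mask):
    """Full-sequence causal LM labels: every real token is a target, padding is ignored."""
    return [
        token if mask == 1 else IGNORE_INDEX
        for token, mask in zip(input_ids, attention_mask)
    ]


# Toy example: a real token, a genuine EOS (id 0 here), then two padding slots (also id 0).
print(causal_lm_labels([17, 0, 0, 0], [1, 1, 0, 0]))
```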
---

## 🚀 Adapter Training & Usage Pipeline

### 1. Load tokenizer and model

* Load the Pythia tokenizer
* Set `pad_token = eos_token`
* Load the model with 4-bit quantization
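Step 1 as a minimal sketch with the Transformers API. `load_tokenizer_and_model` is a hypothetical helper, and `bnb_4bit_compute_dtype=torch.float16` is an assumption chosen to match the FP16 training setting in step 4; other BitsAndBytes options (such as double quantization) are left at their defaults because this card does not specify them. Actually loading the model requires a CUDA GPU with bitsandbytes installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL_ID = "EleutherAI/pythia-1b-deduped"


def load_tokenizer_and_model(model_id: str = BASE_MODEL_ID):
    """Load the Pythia tokenizer and a 4-bit NF4-quantized copy of the base model."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS for padding, as listed above

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,  # assumption: matches FP16 training
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # place the quantized weights on the available GPU
    )
    return tokenizer, model
```

Usage: `tokenizer, model = load_tokenizer_and_model()`.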
### 2. Prepare for QLoRA training

* Enable gradient checkpointing
* Upcast non-quantized layers (such as layer norms) to FP32 for numerical stability
* Freeze the base model parameters
### 3. Apply LoRA adapters

* Inject LoRA modules into the attention and MLP layers
* Print the trainable parameter count
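To sanity-check the printed count, the LoRA parameter total can be computed by hand. This sketch assumes adapters on all four GPT-NeoX projections in every layer and Pythia-1B's published shape (16 layers, hidden size 2048, MLP size 8192); each adapted `Linear(d_in, d_out)` gains `r * (d_in + d_out)` parameters. `BASE_PARAMS` is the model's total parameter count, including its untied input and output embeddings. PEFT's `print_trainable_parameters()` should report a figure close to this, though its total may be computed slightly differently.

```python
HIDDEN = 2048                # Pythia-1B hidden size
INTERMEDIATE = 8192          # MLP inner size (4 x hidden)
NUM_LAYERS = 16
BASE_PARAMS = 1_011_781_632  # total Pythia-1B parameters, embeddings included


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


def pythia_1b_lora_params(r: int) -> int:
    """LoRA parameters when every attention and MLP projection is adapted."""
    per_layer = (
        lora_params(HIDDEN, 3 * HIDDEN, r)      # attention.query_key_value
        + lora_params(HIDDEN, HIDDEN, r)        # attention.dense
        + lora_params(HIDDEN, INTERMEDIATE, r)  # mlp.dense_h_to_4h
        + lora_params(INTERMEDIATE, HIDDEN, r)  # mlp.dense_4h_to_h
    )
    return NUM_LAYERS * per_layer


for r in (8, 16, 32):
    trainable = pythia_1b_lora_params(r)
    share = 100 * trainable / (BASE_PARAMS + trainable)
    print(f"r={r:>2}: {trainable:,} trainable ({share:.2f}% of all parameters)")
```

At `r=32` this gives 16,777,216 trainable parameters, about 1.6% of the total; halving the rank halves the count.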
### 4. Training configuration

| Setting                 | Value                  |
| ----------------------- | ---------------------- |
| Epochs                  | 3                      |
| Batch size (per device) | 6                      |
| Gradient accumulation   | 4                      |
| Effective batch size    | 24 (6 × 4, single GPU) |
| Learning rate           | 2e-4                   |
| Optimizer               | `paged_adamw_8bit`     |
| Precision               | FP16                   |
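The table maps onto Transformers' `TrainingArguments`. A minimal sketch; `make_training_args` and its output directory are hypothetical, and the effective batch size assumes a single GPU (it scales with the number of devices). Because `fp16=True` requires a CUDA device, the arguments are built inside a function rather than at import time.

```python
from transformers import TrainingArguments

PER_DEVICE_BATCH_SIZE = 6
GRAD_ACCUM_STEPS = 4


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples that contribute to each optimizer step."""
    return per_device * accum_steps * num_devices


def make_training_args(output_dir: str = "pythia-1b-qlora") -> TrainingArguments:  # hypothetical directory
    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=PER_DEVICE_BATCH_SIZE,
        gradient_accumulation_steps=GRAD_ACCUM_STEPS,
        learning_rate=2e-4,
        optim="paged_adamw_8bit",  # 8-bit AdamW (bitsandbytes) with paged optimizer state
        fp16=True,
    )
```

`effective_batch_size(6, 4)` returns the 24 listed in the table.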
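### 5. Start training

A minimal sketch, assuming `model`, `tokenizer`, `training_args`, and `train_dataset` were produced by steps 1-4 (these variable names are illustrative):

```python
from transformers import DataCollatorForLanguageModeling, Trainer

trainer = Trainer(
    model=model,                  # 4-bit base model with LoRA adapters
    args=training_args,           # settings from the table above
    train_dataset=train_dataset,  # tokenized `text` examples
    # mlm=False -> causal LM labels (copies of input_ids, padding masked out)
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("pythia-1b-qlora-adapter")  # hypothetical path; saves only the adapter weights and config
```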
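---

## 📊 Why QLoRA?

Compared to full fine-tuning:

* ✅ Much lower GPU memory usage: the base weights are stored in 4-bit, and gradients and optimizer state exist only for the adapter
* ✅ Faster experimentation
* ✅ The base weights are never modified, which reduces the risk of catastrophic forgetting; disabling the adapter recovers the original model
* ✅ Easy adapter reuse and sharing (the adapter is a small file, separate from the base model)

This parameter-efficient approach is widely used to adapt open LLMs on limited hardware.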
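A back-of-the-envelope comparison of weight, gradient, and optimizer memory makes the savings concrete. Assumptions (estimates, not measurements): about 1.0e9 base parameters; full fine-tuning keeps FP32 weights, FP32 gradients, and two FP32 Adam moments (16 bytes per parameter); QLoRA stores the frozen weights at 4 bits (0.5 bytes, ignoring quantization constants) and spends the same 16 bytes only on the roughly 16.8M parameters of an `r=32` adapter on all attention and MLP projections. Activations, which depend on batch size and sequence length, are excluded, and the 8-bit paged optimizer used here would shrink the adapter's share further.

```python
BASE_PARAMS = 1.0e9       # assumption: ~1B parameters
LORA_PARAMS = 16_777_216  # assumption: r=32 on all attention and MLP projections
BYTES_FULL = 16           # FP32 weight + gradient + two Adam moments
BYTES_4BIT = 0.5          # NF4 weight, quantization constants ignored


def memory_gb(frozen_params: float, frozen_bytes: float,
              trainable_params: float, trainable_bytes: float = BYTES_FULL) -> float:
    """Weights + gradients + optimizer state in decimal GB (activations excluded)."""
    return (frozen_params * frozen_bytes + trainable_params * trainable_bytes) / 1e9


full_ft_gb = memory_gb(0, 0, BASE_PARAMS)                   # every parameter trainable
qlora_gb = memory_gb(BASE_PARAMS, BYTES_4BIT, LORA_PARAMS)  # frozen 4-bit base + LoRA
print(f"full fine-tuning ~{full_ft_gb:.1f} GB, QLoRA ~{qlora_gb:.2f} GB")
```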
---

## 📈 Expected Behavior When Using This Adapter

After training, the model should:

* Follow instructions more directly
* Produce more structured, task-aligned responses
* Show clear behavioral differences **with vs. without** the LoRA adapter

Adapter ablation (disabling LoRA) should bring behavior back close to that of the base model.
---

## 🔮 Possible Extensions

* Mask the loss so that only response tokens are trained (**response-only instruction tuning**)
* Train multiple LoRA adapters for different tasks
* Merge or switch adapters at inference time
* Add evaluation datasets
* Compare different LoRA ranks (`r=8`, `r=16`, `r=32`)
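For the first extension, response-only tuning keeps the same causal LM loss but sets the labels of prompt tokens to `-100` so they are ignored. A minimal sketch: `mask_prompt_tokens` is a hypothetical helper, and it assumes `prompt_length` comes from tokenizing the instruction part on its own, which is only exact when the prompt's tokenization is a prefix of the full text's tokenization (worth checking at the prompt/response boundary).

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_prompt_tokens(
    input_ids: List[int],
    prompt_length: int,
    attention_mask: Optional[List[int]] = None,
) -> List[int]:
    """Copy input_ids as labels, hiding the prompt (and any padding) from the loss."""
    labels = list(input_ids)
    for i in range(min(prompt_length, len(labels))):
        labels[i] = IGNORE_INDEX  # instruction tokens: context only, no loss
    if attention_mask is not None:
        labels = [
            label if mask == 1 else IGNORE_INDEX
            for label, mask in zip(labels, attention_mask)
        ]
    return labels


# Toy sequence: 3 prompt tokens, 2 response tokens, 1 padding slot.
print(mask_prompt_tokens([11, 12, 13, 21, 22, 0], 3, [1, 1, 1, 1, 1, 0]))
```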
---

## 🛠️ Requirements

* Python 3.9+
* PyTorch
* transformers
* peft
* bitsandbytes
* accelerate
---

## 📜 License & Usage Notes

This repository publishes **only LoRA adapter weights** and configuration files. The base model must be obtained separately under its original license.

This adapter is intended for **research, experimentation, and non-production use** until it has been further evaluated.

---

This repository provides a **clean, minimal reference implementation** of QLoRA-based instruction tuning on a 1B-scale language model.