BEncoderRT committed
Commit 8e9784f · verified · 1 Parent(s): 87045bf

Update README.md

Files changed (1)
  1. README.md +167 -13
README.md CHANGED
@@ -1,14 +1,168 @@
  ---
- license: mit
- datasets:
- - databricks/databricks-dolly-15k
- language:
- - en
- base_model:
- - EleutherAI/pythia-1b-deduped
- pipeline_tag: text-generation
- tags:
- - peft
- - Lora
- - Instruction-Tuning
- ---
+ # QLoRA Instruction Tuning on Pythia-1B
+
+ This repository provides a **Hugging Face–compatible LoRA adapter** trained via **QLoRA (4-bit quantization + LoRA adapters)** on the **EleutherAI Pythia-1B-deduped** base model.
+
+ The project focuses on **producing and publishing a reusable LoRA adapter** using a modern, memory-efficient instruction-tuning pipeline built with Hugging Face Transformers, PEFT, and BitsAndBytes. It is designed for **learning, experimentation, and small-GPU environments (e.g. Colab)**.
+
  ---
+
+ ## ✨ Key Features (Adapter-Centric)
+
+ * 🔒 **Frozen base model**: Pythia-1B-deduped (not included in this repository)
+ * 🧠 **QLoRA training** with 4-bit NF4 quantization
+ * 🧩 Only the **LoRA adapters** are trainable (<1% of parameters)
+ * 💾 Optimized for **low GPU memory usage**
+ * 📚 Clear, minimal pipeline for understanding instruction tuning
+
+ ---
+
+ ## 🧠 What This Adapter Represents
+
+ This adapter demonstrates how to:
+
+ * Load a **4-bit quantized causal language model**
+ * Prepare it for k-bit training
+ * Apply **LoRA adapters** for parameter-efficient fine-tuning
+ * Perform **instruction tuning** using causal LM loss
+ * Train using the Hugging Face `Trainer` API
+
+ Schematically, training follows:
+
+ ```
+ Frozen Base Model (4-bit)
+ + Trainable LoRA ΔW
+ → Instruction-following behavior
+ ```
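As background, the schematic above can be written in the usual LoRA notation. This is the standard formulation of the technique, not something taken from this repository's code, and the symbols $d$ and $k$ (the output and input dimensions of an adapted layer) are introduced here for illustration:

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here $W_0$ is the frozen (4-bit) base weight and only $A$ and $B$ receive gradients. With `r = 32` and `lora_alpha = 32`, the scale $\alpha / r$ equals 1.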
+
+ ---
+
+ ## 🏗️ Model & Training Setup
+
+ ### Base Model
+
+ * **Model**: `EleutherAI/pythia-1b-deduped`
+ * **Architecture**: Decoder-only Transformer
+ * **Quantization**: 4-bit NF4 (BitsAndBytes)
+
+ ### LoRA Configuration
+
+ | Parameter      | Value       | Description                     |
+ | -------------- | ----------- | ------------------------------- |
+ | `r`            | 32          | LoRA rank (expressiveness)      |
+ | `lora_alpha`   | 32          | Scaling factor                  |
+ | `lora_dropout` | 0.05        | Regularization                  |
+ | `bias`         | `none`      | No bias terms are trained       |
+ | `task_type`    | `CAUSAL_LM` | Causal language modeling        |
+
+ Only **LoRA parameters** are trainable; all base model weights remain frozen.
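The table can be expressed as a `peft.LoraConfig`. The sketch below is one possible rendering, not the author's script: the `target_modules` names are an assumption (GPT-NeoX-style names for the attention and MLP projections), and the 2048 × 2048 layer shape is illustrative only. The `peft` import is deferred inside the function so the settings can be inspected without the library installed.

```python
# LoRA settings copied from the table above.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    # Assumption: GPT-NeoX module names for the attention and MLP projections.
    "target_modules": ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
}


def build_lora_config():
    """Create a peft.LoraConfig from LORA_KWARGS (requires `peft`)."""
    from peft import LoraConfig  # deferred import

    return LoraConfig(**LORA_KWARGS)


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one (d_out x d_in) linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]  # alpha / r
added = lora_param_count(2048, 2048, LORA_KWARGS["r"])  # illustrative layer shape
print(scaling, added)
```

The overall trainable fraction depends on which modules are adapted, so `print_trainable_parameters()` on the wrapped model is the reliable way to check it.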
+
+ ---
+
+ ## 📦 Dataset
+
+ * **Type**: Instruction-formatted text dataset
+ * **Format**: Each example contains a `text` field
+ * **Tokenization**:
+   * Max length: 512 tokens
+   * Padding: `max_length`
+   * Truncation enabled
+
+ Loss is computed using **standard causal language modeling**, so the model learns to predict every token of the full sequence (instruction + response).
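To make "full-sequence" loss concrete, here is a small sketch of how labels are commonly built for causal LM training. It is an assumption about the preprocessing, not the author's code: labels copy the input ids, and padding positions are set to `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) so they do not contribute. The token ids are made up.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def build_full_sequence_labels(input_ids, attention_mask):
    """Copy input ids as labels, ignoring positions where the attention mask is 0 (padding)."""
    return [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(input_ids, attention_mask)]


# Toy example: 4 real tokens followed by 2 padding positions.
input_ids = [11, 12, 13, 14, 0, 0]
attention_mask = [1, 1, 1, 1, 0, 0]
labels = build_full_sequence_labels(input_ids, attention_mask)
print(labels)  # [11, 12, 13, 14, -100, -100]
```

Hugging Face causal LM heads shift the labels by one position internally, so no manual shifting is needed here.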
+
+ ---
+
+ ## 🚀 Adapter Training & Usage Pipeline
+
+ ### 1. Load tokenizer and model
+
+ * Load Pythia tokenizer
+ * Set `pad_token = eos_token`
+ * Load model with 4-bit quantization
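A minimal sketch of step 1, assuming `torch`, `transformers`, and `bitsandbytes` are installed and a CUDA GPU is available. The FP16 compute dtype and `device_map="auto"` are assumptions, not settings confirmed by this README. Imports are deferred inside the function so that defining it does not require the GPU stack.

```python
MODEL_ID = "EleutherAI/pythia-1b-deduped"


def load_tokenizer_and_model(model_id: str = MODEL_ID):
    """Load the tokenizer and a 4-bit NF4-quantized base model."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,  # assumption: matches the FP16 training precision
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",  # assumption: let accelerate place the weights
    )
    return tokenizer, model
```

Calling `load_tokenizer_and_model()` downloads the base model on first use.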
+
+ ### 2. Prepare for QLoRA training
+
+ * Enable gradient checkpointing
+ * Cast critical layers for numerical stability
+ * Freeze base model parameters
+
+ ### 3. Apply LoRA adapters
+
+ * Inject LoRA modules into attention and MLP layers
+ * Print trainable parameter count
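Steps 2 and 3 map naturally onto two PEFT helpers. The following is a sketch under the assumption that the author used `prepare_model_for_kbit_training` and `get_peft_model`; the function name `add_lora_adapters` is hypothetical, and `lora_config` is expected to be a `peft.LoraConfig` built from the table in the LoRA Configuration section.

```python
def add_lora_adapters(model, lora_config):
    """Prepare a quantized model for k-bit training and wrap it with LoRA adapters."""
    from peft import get_peft_model, prepare_model_for_kbit_training  # deferred import

    # Freezes the base weights, upcasts non-quantized layers for stability,
    # and enables gradient checkpointing.
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # Injects the LoRA modules named in lora_config.target_modules.
    model = get_peft_model(model, lora_config)

    # Prints trainable vs. total parameter counts.
    model.print_trainable_parameters()
    return model
```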
+
+ ### 4. Training configuration
+
+ | Setting               | Value              |
+ | --------------------- | ------------------ |
+ | Epochs                | 3                  |
+ | Batch size            | 6                  |
+ | Gradient accumulation | 4                  |
+ | Effective batch size  | 24 (6 × 4)         |
+ | Learning rate         | 2e-4               |
+ | Optimizer             | `paged_adamw_8bit` |
+ | Precision             | FP16               |
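The effective batch size in the table is the per-step batch size times the number of accumulation steps. The short calculation below assumes a single GPU; the dataset size of 12,000 examples is a hypothetical figure used only to show how the number of optimizer updates per epoch follows (the exact count in `Trainer` can differ slightly by version).

```python
import math

per_device_batch_size = 6
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-GPU training (e.g. Colab)

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices


def optimizer_steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Approximate number of optimizer updates needed to see every example once."""
    return math.ceil(num_examples / batch_size)


steps = optimizer_steps_per_epoch(12_000, effective_batch_size)  # hypothetical dataset size
print(effective_batch_size, steps)  # 24 500
```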
+
+ ### 5. Start training
+
+ ```python
+
+ ```
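The code block above is empty in this commit. As a stand-in, here is a hedged sketch of what starting training could look like with the settings from the training configuration table. It is not the author's script: the function name `train_adapter`, the `output_dir` value, and the choice of `DataCollatorForLanguageModeling` are assumptions, and `train_dataset` is expected to be an already tokenized dataset.

```python
# Settings copied from the training configuration table.
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 6,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "optim": "paged_adamw_8bit",
    "fp16": True,
}


def train_adapter(model, tokenizer, train_dataset, output_dir="qlora-pythia-1b-adapter"):
    """Run Trainer on a LoRA-wrapped model and save the adapter weights."""
    from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        # mlm=False selects causal LM labels (copies of input_ids, padding ignored).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    # For a PEFT-wrapped model this writes only the adapter weights and config.
    model.save_pretrained(output_dir)
    return trainer
```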
+
+ ---
+
+ ## 📊 Why QLoRA?
+
+ Compared to full fine-tuning:
+
+ * ✅ Roughly 10× lower GPU memory usage (approximate)
+ * ✅ Faster experimentation
+ * ✅ Lower risk of catastrophic forgetting, since the base weights stay frozen
+ * ✅ Easy adapter reuse and sharing
+
+ Adapter-based fine-tuning of this kind is also widely used for instruction-tuning much larger LLMs.
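A rough back-of-the-envelope calculation shows where part of the memory saving comes from. The sketch below counts weight storage only and uses a round 1,000,000,000 parameters as an assumption; it ignores quantization constants, activations, gradients, and optimizer state, so it does not by itself reproduce the overall figure quoted above.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


num_params = 1_000_000_000  # assumption: round figure for a 1B-parameter model
fp16_gib = weight_memory_gib(num_params, 16)
nf4_gib = weight_memory_gib(num_params, 4)
print(f"fp16: {fp16_gib:.2f} GiB, 4-bit: {nf4_gib:.2f} GiB")  # fp16: 1.86 GiB, 4-bit: 0.47 GiB
```

Full fine-tuning additionally keeps gradients and optimizer state for every parameter, whereas QLoRA keeps them only for the small set of LoRA parameters.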
+
+ ---
+
+ ## 📈 Expected Behavior When Using This Adapter
+
+ After training, the model should:
+
+ * Follow instructions more directly
+ * Produce more structured and task-aligned responses
+ * Show clear behavioral differences **with vs without** LoRA adapters
+
+ Disabling the adapter (ablation) should return behavior close to that of the base model.
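One way to check this is to generate from the same prompt with the adapter active and then inside PEFT's `disable_adapter()` context. The sketch below assumes a base model and tokenizer that are already loaded; `ADAPTER_ID` is a hypothetical placeholder to be replaced with this repository's id, and the function name is illustrative.

```python
ADAPTER_ID = "your-username/your-adapter-repo"  # hypothetical placeholder


def compare_with_and_without_adapter(base_model, tokenizer, prompt,
                                     adapter_id=ADAPTER_ID, max_new_tokens=64):
    """Return (text with adapter, text with adapter disabled) for one prompt."""
    import torch
    from peft import PeftModel

    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        tuned_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Temporarily bypass the LoRA layers to recover base-model behavior.
        with model.disable_adapter():
            base_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    def decode(ids):
        return tokenizer.decode(ids[0], skip_special_tokens=True)

    return decode(tuned_ids), decode(base_ids)
```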
+
+ ---
+
+ ## 🔮 Possible Extensions
+
+ * Mask the prompt tokens in the loss for **response-only instruction tuning**
+ * Train multiple LoRA adapters for different tasks
+ * Merge or switch adapters at inference time
+ * Combine with evaluation datasets
+ * Compare different LoRA ranks (`r=8`, `r=16`, `r=32`)
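As an illustration of the first extension, response-only tuning is usually done by setting the labels of the prompt tokens to `-100` so that only response tokens contribute to the loss. This is a sketch of the general technique with made-up token ids; `mask_prompt_labels` and `prompt_len` are hypothetical names, and a real pipeline would need to determine the prompt length from the tokenized instruction.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_prompt_labels(input_ids, prompt_len):
    """Ignore the first prompt_len tokens; keep the response tokens as labels."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy example: 3 prompt tokens followed by 2 response tokens.
input_ids = [11, 12, 13, 21, 22]
labels = mask_prompt_labels(input_ids, prompt_len=3)
print(labels)  # [-100, -100, -100, 21, 22]
```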
+
+ ---
+
+ ## 🛠️ Requirements
+
+ * Python 3.9+
+ * PyTorch
+ * transformers
+ * peft
+ * bitsandbytes
+ * accelerate
+
+ ---
+
+ ## 📜 License & Usage Notes
+
+ This repository publishes **only LoRA adapter weights** and configuration files. The base model must be obtained separately under its original license.
+
+ This adapter is intended for **research, experimentation, and non-production use** until it has been evaluated further.
+
+ ---
+
+ This repository provides a **clean, minimal reference implementation** of QLoRA-based instruction tuning on a 1B-scale language model.