# Gemma-2B Dolly QLoRA (LoRA Adapter)
This repository contains LoRA/QLoRA adapter weights fine-tuned from google/gemma-2b on a subset of Databricks Dolly 15k.
It is not a full model checkpoint — you load it on top of the base Gemma-2B model.
> **Important (Gemma is gated/restricted):** You must accept Google’s Gemma terms to access the base model on Hugging Face, and downstream use/distribution must comply with those terms.
## Model Details
- Adapter type: LoRA (PEFT)
- Base model: `google/gemma-2b` (gated)
- Training method: QLoRA (4-bit NF4) + LoRA adapters
- Language: English
- Files in this repo: `adapter_model.safetensors`, `adapter_config.json`, tokenizer files, `run_metadata.json`
## Intended Uses
### Direct use
- Lightweight instruction-following adaptation for general assistant-style tasks (email writing, summarization, extraction, checklists).
## Training Data
- Dataset: `databricks/databricks-dolly-15k`
- Filtering: only rows with an empty `context` field (no-context subset)
- Split: `test_size=0.05`
- Subsampled for a Colab-friendly run: `max_train_samples=2000`, `max_eval_samples=200`
## Training Procedure
### Prompt format
Each example is formatted as:

```text
Instruction:
{instruction}

Response:
{response}
```
An EOS token is appended during preprocessing.
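The preprocessing can be sketched as a small helper (the function name is hypothetical, and `<eos>` stands in for the tokenizer's actual EOS token):

```python
def format_example(instruction: str, response: str, eos_token: str = "<eos>") -> str:
    """Render one Dolly row into the prompt template, appending the EOS token."""
    return f"Instruction:\n{instruction}\n\nResponse:\n{response}{eos_token}"

text = format_example("List three colors.", "Red, green, blue.")
print(text)
```

In practice you would pass `tokenizer.eos_token` instead of the literal `"<eos>"` default shown here.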
### Quantization (QLoRA)
- 4-bit quantization: NF4
- Double quantization: enabled
- Compute dtype: bf16 if supported, else fp16
### LoRA configuration
LoRA is applied to the common attention and MLP projection modules:
`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
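A sketch of the corresponding PEFT config, reconstructed from the hyperparameters below (the authoritative values live in `adapter_config.json` in this repo):

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config used for this run.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
```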
### Hyperparameters
From `run_metadata.json`:
| Setting | Value |
|---|---|
| max_seq_length | 512 |
| num_train_epochs | 1.0 |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 4 |
| learning_rate | 2e-4 |
| warmup_steps | 10 |
| logging_steps | 10 |
| save_steps | 100 |
| LoRA r | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| optimizer | paged_adamw_8bit |
| seed | 42 |
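With gradients accumulated over 4 micro-batches of size 1, the effective batch size per optimizer step is 4; a quick sanity check on the resulting step count (variable names are illustrative):

```python
# Settings from the hyperparameter table above
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_train_samples = 2000

# Gradients from 4 micro-batches are accumulated before each optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
optimizer_steps_per_epoch = max_train_samples // effective_batch_size
print(effective_batch_size, optimizer_steps_per_epoch)  # → 4 500
```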
## Evaluation
### Quantitative
From `run_metadata.json`:
- Train loss: 1.8468
- Eval loss: 1.8447
- Perplexity: 6.3264
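The reported perplexity is simply the exponential of the eval loss, which you can verify (any discrepancy in the last digits comes from rounding the logged loss):

```python
import math

eval_loss = 1.8447                # from run_metadata.json
perplexity = math.exp(eval_loss)  # perplexity = exp(mean cross-entropy loss)
print(round(perplexity, 2))  # → 6.33
```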
### Qualitative (prompt suite)
A small “before vs. after” prompt suite is stored in `run_metadata.json` (`baseline_outputs` and `after_outputs`).
In general, the adapter improves:
- Instruction adherence and cleaner formatting for short assistant tasks (emails, lists, extraction).
Known limitation observed in the same suite:
- May regress on coding-style prompts (always validate code outputs).
## How to Use
### Load base model + adapter
```python
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL_ID = "google/gemma-2b"
ADAPTER_ID = "ash001/gemma-2b-dolly-qlora-adapter"
HF_TOKEN = os.environ.get("HF_TOKEN")  # required for gated Gemma

# Match the training setup: 4-bit NF4 quantization with double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=(
        torch.bfloat16
        if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
        else torch.float16
    ),
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, token=HF_TOKEN)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Quantize only when a GPU is available; fall back to a full-precision CPU load.
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config if torch.cuda.is_available() else None,
    device_map="auto" if torch.cuda.is_available() else None,
    token=HF_TOKEN,
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

prompt = (
    "Instruction:\n"
    "Extract action items from: 'Finalize the agenda, book the room, "
    "share notes by Friday.'\n\n"
    "Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # do_sample=True is needed for temperature/top_p to take effect
    out = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## License / Terms
- Base model: `google/gemma-2b` is gated and subject to Google’s Gemma Terms of Use.
- Dataset: Databricks Dolly 15k is licensed under CC BY-SA 3.0 (see the dataset card).
- Adapter weights: provided under the Gemma license umbrella as a derivative intended for use with the base model.
## Links
- Training project (notebooks): https://github.com/sparklerz/Gemma-2B-QLoRA-Adapter-Fine-Tuning
- W&B run: https://wandb.ai/kannansarat9/gemma-qlora
- Base model: https://huggingface.co/google/gemma-2b
- Dataset: https://huggingface.co/datasets/databricks/databricks-dolly-15k