Gemma-2B Dolly QLoRA (LoRA Adapter)

This repository contains LoRA/QLoRA adapter weights fine-tuned from google/gemma-2b on a subset of Databricks Dolly 15k.
It is not a full model checkpoint — you load it on top of the base Gemma-2B model.

Important (Gemma is gated / restricted): You must accept Google’s Gemma terms to access the base model on Hugging Face, and downstream use/distribution must comply with those terms.

Model Details

  • Adapter type: LoRA (PEFT)
  • Base model: google/gemma-2b (gated)
  • Training method: QLoRA (4-bit NF4) + LoRA adapters
  • Language: English
  • Files in this repo: adapter_model.safetensors, adapter_config.json, tokenizer files, run_metadata.json

Intended Uses

Direct use

  • Lightweight instruction-following adaptation for general assistant-style tasks (email writing, summarization, extraction, checklists).

Training Data

  • Dataset: databricks/databricks-dolly-15k
  • Filtering: only rows with empty context (no-context subset)
  • Split: train/eval split with test_size=0.05 (5% held out for evaluation)
  • Subsampled for Colab-friendly run: max_train_samples=2000, max_eval_samples=200
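The no-context filter and subsampling can be illustrated without downloading the dataset. The toy rows below only mimic the Dolly schema (`instruction`, `context`, `response` fields); the actual run used `databricks/databricks-dolly-15k`:

```python
import random

# Toy rows mimicking the databricks-dolly-15k schema
rows = [
    {"instruction": "Summarize X.", "context": "", "response": "..."},
    {"instruction": "Answer from passage.", "context": "Some passage.", "response": "..."},
    {"instruction": "Write an email.", "context": "", "response": "..."},
]

# Keep only rows with empty context (the "no-context subset")
no_context = [r for r in rows if not r["context"]]

# Shuffle with a fixed seed and cap at max_train_samples, as in the run above
random.seed(42)
random.shuffle(no_context)
max_train_samples = 2
subset = no_context[:max_train_samples]
```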

Training Procedure

Prompt format

Each example is formatted as:

Instruction:
{instruction}

Response:
{response}

An EOS token is appended during preprocessing.
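The template above can be sketched as a small helper. `format_example` is a hypothetical name, and `"<eos>"` stands in for the tokenizer's actual EOS token:

```python
def format_example(instruction: str, response: str, eos_token: str) -> str:
    # Mirrors the prompt template above; the EOS token is appended
    # so the model learns where a response should stop.
    return f"Instruction:\n{instruction}\n\nResponse:\n{response}{eos_token}"

text = format_example("List three colors.", "Red, green, blue.", "<eos>")
```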

Quantization (QLoRA)

  • 4-bit quantization: NF4
  • Double quantization: enabled
  • Compute dtype: bf16 if supported, else fp16

LoRA configuration

LoRA is applied to common attention + MLP projection modules:

  • q_proj, k_proj, v_proj, o_proj
  • gate_proj, up_proj, down_proj
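A `LoraConfig` consistent with this module list and the r/alpha/dropout values listed under Hyperparameters might look like the following. This is a sketch, not the exact training script:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
```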

Hyperparameters

From run_metadata.json:

  • max_seq_length: 512
  • num_train_epochs: 1.0
  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-4
  • warmup_steps: 10
  • logging_steps: 10
  • save_steps: 100
  • LoRA r: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • optimizer: paged_adamw_8bit
  • seed: 42
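Under these settings, the effective batch size and optimizer steps per epoch follow directly from the sample cap:

```python
max_train_samples = 2000
per_device_train_batch_size = 1
gradient_accumulation_steps = 4

# Gradients accumulate over 4 micro-batches before each optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = max_train_samples // effective_batch_size  # 500 optimizer steps
```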

Evaluation

Quantitative

From run_metadata.json:

  • Train loss: 1.8468
  • Eval loss: 1.8447
  • Perplexity: 6.3264
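The reported perplexity is consistent with exponentiating the eval loss:

```python
import math

eval_loss = 1.8447
perplexity = math.exp(eval_loss)  # ≈ 6.3264, matching the reported value
```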

Qualitative (prompt suite)

A small “before vs after” prompt suite is stored in:

  • run_metadata.json (baseline_outputs and after_outputs)

In general, the adapter improves:

  • Instruction adherence and cleaner formatting for short assistant tasks (emails, lists, extraction).

Known limitation observed in the same suite:

  • May regress on coding-style prompts (always validate code outputs).

How to Use

Load base model + adapter

import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL_ID = "google/gemma-2b"
ADAPTER_ID = "ash001/gemma-2b-dolly-qlora-adapter"
HF_TOKEN = os.environ.get("HF_TOKEN")  # required for gated Gemma

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    # bf16 if the GPU supports it, else fp16 (matches the training setup above)
    bnb_4bit_compute_dtype=(
        torch.bfloat16
        if torch.cuda.is_available() and torch.cuda.is_bf16_supported()
        else torch.float16
    ),
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, token=HF_TOKEN)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    quantization_config=bnb_config if torch.cuda.is_available() else None,  # 4-bit on GPU only
    device_map="auto" if torch.cuda.is_available() else None,
    token=HF_TOKEN,
)

# Attach the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

prompt = "Instruction:\nExtract action items from: 'Finalize the agenda, book the room, share notes by Friday.'\n\nResponse:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,  # temperature/top_p are ignored without sampling
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))

License / Terms

  • Base model: google/gemma-2b is gated and subject to Google’s Gemma Terms of Use.
  • Dataset: Databricks Dolly 15k is licensed under CC BY-SA 3.0 (see dataset card).
  • Adapter weights: released as a derivative intended for use with the base model, and therefore subject to the Gemma Terms of Use.
